# Workshop Schedule: Machine Learning-Assisted Sampling for Scientific Computing – Applications in Physics

Historically, statistical mechanics has strongly influenced probabilistic modeling in machine learning (e.g. energy-based models and diffusion-based models). The goal of this workshop is to explore how the impressive abilities of deep generative modelling can help inference in large probabilistic models (e.g. by building maps from simple to non-trivial distributions). This covers many applications: speeding up Bayesian inference, accelerating MCMC, characterizing phase transitions, manipulating wave functions, etc.

This week's schedule includes two days of invited lectures, which are open to all, followed by a colloquium and discussions between experts. Each day's program, with speakers and topics, is detailed below.

Schedule | Monday | Tuesday |
---|---|---|
| | Gabriel Peyré - Presentation of the AISSAI Center (9:00) |
9:00-9:30 | Pierre Monmarché | Marylou Gabrié (9:15) |
9:45-10:15 | Jutta Rogal | Alain Durmus (10:00) |
10:30-11:00 | Coffee Break | Coffee Break (10:45) |
11:00-11:30 | Wei Zhang | Colloquium: Freddy Bouchet |
11:45-12:30 | Maria Cameron | |
12:30-14:00 | Lunch Break | Lunch Break |
14:00-14:30 | Jonathan Weare | Stéphane Mallat |
14:45-15:15 | David Aristoff | Arnaud Doucet |
15:30-16:00 | Coffee Break | Coffee Break |
16:00-16:30 | Peter Bolhuis | Martin Weigt |
16:45-17:30 | Gabriel Stoltz | Ryan Abbott |

__Monday October 3rd 2022__

**9:00-9:30**

Pierre Monmarché (Université Pierre-et-Marie-Curie)

*An application of adaptive reaction coordinates to the SARS-CoV-2 main protease*

To tackle the problem of metastability when sampling high-dimensional molecular systems, many importance sampling schemes rely on the knowledge of good reaction coordinates (RC), i.e. a low-dimensional representation of the system. On the other hand, as can be seen in the rest of this workshop, the question of learning good RC given a sample of the system has recently drawn much interest. Hence, it is natural to combine these two parts and to learn RC on the fly while sampling with dynamics biased through these adaptive RC. We will present how a very basic instance of this method yielded good results in sampling a key part of the SARS-CoV-2 virus in 2020.

**9:45-10:05**

Jutta Rogal (New York University)

*Pathways in classification space - Machine learning collective variables for enhanced sampling of structural transformations*

Microscopic processes governing polymorphic transitions can be highly complex and are non-trivial to sample with molecular simulations. Due to the complexity of the transformation mechanisms, it is often difficult to suggest suitable collective variables that can be used in enhanced sampling methods. Here, we derive collective variables based on a machine learning classification approach of local structural environments. This local information is combined into global classifiers that are used in the enhanced sampling, which allows us to drive global phase transformations through changes in local structural motifs. One key advantage of this approach is that training the classification model requires only information within the stable states, not information about the transition itself. We exemplify our approach by sampling the migration of a phase boundary during a polymorphic transition.

**10:30-11:00**

Coffee Break

**11:00-11:30**

Wei Zhang (Zuse Institute Berlin)

*A deep learning-based algorithm for solving PDE eigenvalue problems of metastable diffusion processes*

Understanding the dynamics of metastable systems, e.g., biomolecules, on large timescales is a challenging task. In this talk, I discuss an approach that helps tackle this task in the case of overdamped Langevin dynamics by studying the PDE eigenvalue problem of the infinitesimal generator of the process. In analogy to the Koopman operator approach, where eigenfunctions are special in that they lead to optimal linear Koopman models, here we show that the eigenfunctions of the generator lead to the optimal effective dynamics that preserves the corresponding timescales. I will present a deep learning-based algorithm for computing the leading eigenvalues and eigenfunctions of the PDE eigenvalue problem. The capability of the algorithm will be demonstrated on concrete examples. The possible extension of the algorithm to transfer operators and its connection to popular methods such as state-free reversible VAMPnets will be discussed.
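
For reference, the eigenvalue problem mentioned in this abstract can be written out under the standard conventions for overdamped Langevin dynamics. The notation below (potential $V$, inverse temperature $\beta$, eigenpairs $(\lambda_i, \varphi_i)$) is assumed here for illustration and is not taken from the talk itself.

$$
\mathrm{d}X_t = -\nabla V(X_t)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}W_t,
\qquad
\mathcal{L}f = -\nabla V \cdot \nabla f + \beta^{-1}\Delta f,
$$

$$
-\mathcal{L}\varphi_i = \lambda_i \varphi_i,
\qquad
0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots
$$

In this setting, and assuming a confining potential, the small nonzero eigenvalues are associated with the slow timescales $1/\lambda_i$ of the metastable dynamics.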

**11:45-12:30**

Maria Cameron (University of Maryland)

*Quantifying rare events with the aid of diffusion maps*

The study of phenomena such as protein folding and conformational changes in molecules is a central theme in chemical physics. Molecular dynamics (MD) simulation is the primary tool for the study of transition processes in biomolecules, but it is hampered by a huge timescale gap between the processes of interest and atomic vibrations which dictate the time step size. Therefore, it is imperative to combine MD simulations with other techniques in order to quantify the transition processes taking place on large timescales.

The diffusion map algorithm, introduced by Coifman and Lafon in 2006 as a nonlinear dimensionality reduction tool with proven theoretical guarantees, has an important ability to approximate differential operators on point clouds. We show that by changing the kernel function inherent in diffusion maps and using renormalizations one can approximate the Backward Kolmogorov Operator for the stochastic differential equation governing the dynamics of biomolecules or atomic clusters described in collective variables: time-reversible dynamics with position-dependent and anisotropic diffusion. Moreover, the point cloud used as an input does not need to be sampled from the invariant density but can be generated by any standard enhanced sampling algorithm such as metadynamics. Using the solution to the Backward Kolmogorov PDE on the point cloud with appropriate boundary conditions, one can identify reaction channels and calculate the transition rate between metastable states of interest following the framework of Transition Path Theory (E and Vanden-Eijnden, 2006).

We test the proposed approach on a number of benchmark examples and apply it to alanine dipeptide in a collective variable space consisting of four dihedral angles. The transition rate that we find is in good agreement with the ground truth rate found by running an extremely long unbiased trajectory (Vani et al., 2022).

**12:30-14:00**

Lunch Break

**14:00-14:30**

Jonathan Weare (New York University)

*Forecasting long timescale events using short trajectory data*

Events that occur on long timescales, such as the most extreme weather and climate events or large scale conformational changes in biomolecules, are often the most interesting and impactful features of a dynamical system. Because they occur only very infrequently, they are difficult to analyse using either direct experimental observation or direct model simulation. Building on two decades of effort in the molecular dynamics community, we have developed effective tools for making long time predictions of the behavior of a dynamical system using only a data set of short trajectories. The cumulative length of the short trajectories in the data set can be much shorter than the return time of the event itself. I will explain the basic data analysis problem and present our attempts to solve it in the context of several applications.

**14:45-15:15**

David Aristoff (Colorado State University)

*Sampling mean first passage times using weighted ensemble*

**15:30-16:00**

Coffee Break

**16:00-16:30**

Peter Bolhuis (University of Amsterdam)

*Learning the reaction coordinates for activated processes in complex molecular systems*

The reaction coordinate (RC) is the principal collective variable or feature that determines the progress along an activated or reactive process. In a molecular simulation using enhanced sampling, a good description of the RC is crucial for generating sufficient statistics. Moreover, the RC provides invaluable atomistic insight into the process under study.

The optimal RC is the committor, which represents the likelihood of a system to evolve towards a given state based on the coordinates of all its particles. The committor can be computed with brute force MD, or more efficiently by e.g. Transition Path Sampling. Novel schemes for transition path sampling using reinforcement learning can now effectively map the committor function. However, the interpretability of the committor, being a high-dimensional function, is by definition still low. Applying dimensionality reduction can reveal the RC in terms of low-dimensional, human-understandable molecular collective variables (CVs) or order parameters. While several methods can perform this dimensionality reduction, such as likelihood maximization or symbolic regression, they usually require a preselection of these low-dimensional CVs. Here we apply an extended auto-encoder that maps the input (many CVs) onto a lower-dimensional latent space, which is subsequently used for the reconstruction of the input as well as the prediction of the committor. As a consequence, the latent space is optimized for both reconstruction and committor prediction, and is likely to yield the best nonlinear low-dimensional representation of the committor.

I illustrate the method on simple but nontrivial toy systems, as well as extensive molecular simulation data of methane hydrate nucleation. The extended autoencoder model can effectively extract the underlying mechanism of a reaction, make reliable predictions about the committor of a given configuration, and potentially even generate new paths representative of a reaction.
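
For readers unfamiliar with the term, the committor referred to in this abstract is usually defined as follows. The symbols are assumptions made here for illustration: $A$ and $B$ denote the reactant and product states, and $\tau_A$, $\tau_B$ the first times a trajectory $X_t$ started at configuration $x$ reaches each of them.

$$
q(x) = \mathbb{P}\left(\tau_B < \tau_A \mid X_0 = x\right)
$$

With this convention $q = 0$ on $A$ and $q = 1$ on $B$, and level sets of $q$ group configurations that are equally likely to complete the transition.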

**16:45-17:30**

Gabriel Stoltz (Ecole Nationale des Ponts et Chaussées, Paris)

*Removing the mini-batching error in Bayesian inference using Adaptive Langevin dynamics*

The computational cost of standard Monte Carlo methods for sampling posterior distributions in Bayesian inference scales linearly with the number of data points. One option to reduce it to a fraction of this cost is to resort to mini-batching to estimate the gradient. However, this introduces additional noise in the dynamics and hence a bias in the invariant measure sampled by the Markov chain.

We advocate using the so-called Adaptive Langevin dynamics, which is a modification of standard inertial Langevin dynamics with a dynamical friction which automatically corrects for the increased noise arising from mini-batching. We show using techniques from hypocoercivity that the law of Adaptive Langevin dynamics converges exponentially fast to equilibrium, with a rate which can be quantified in terms of the key parameters of the dynamics (mass of the extra variable and magnitude of the fluctuation in the Langevin dynamics). This allows us in particular to obtain a Central Limit Theorem on time averages along realizations of the dynamics.

We also investigate the practical relevance of the assumptions underpinning Adaptive Langevin (constant covariance for the estimation of the gradient), which are not satisfied in typical models of Bayesian inference; and show how to extend the approach to more general situations. Applications and extensions to Bayesian Neural Networks will also be discussed.
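
As a rough illustration of the mechanism described in this abstract, the following is a minimal sketch and not the speaker's implementation. It assumes a one-dimensional standard Gaussian target, replaces mini-batching with artificial Gaussian noise of constant variance on the gradient, and uses a simple first-order discretization. The function name `adaptive_langevin` and the parameters `h`, `nu`, `sigma_a` and `grad_noise` are all hypothetical choices made for this example.

```python
import numpy as np

def adaptive_langevin(n_steps, h=0.01, nu=1.0, sigma_a=1.0, grad_noise=5.0, seed=0):
    """Sketch: sample q ~ N(0, 1), i.e. U(q) = q^2 / 2, from noisy gradients.

    h          : time step (assumed value)
    nu         : "mass" of the extra friction variable xi (assumed value)
    sigma_a    : magnitude of the injected Langevin noise (assumed value)
    grad_noise : std of the artificial gradient noise that stands in for
                 mini-batching (an assumption of this toy example)
    """
    rng = np.random.default_rng(seed)
    q, p, xi = 0.0, 0.0, 0.5 * sigma_a**2
    qs = np.empty(n_steps)
    xis = np.empty(n_steps)
    for k in range(n_steps):
        # Noisy estimate of the exact gradient U'(q) = q.
        g = q + grad_noise * rng.standard_normal()
        # Momentum update: noisy force, adaptive friction, injected noise.
        p += -h * g - h * xi * p + np.sqrt(h) * sigma_a * rng.standard_normal()
        # Position update with unit mass.
        q += h * p
        # Friction feedback: xi increases while p^2 exceeds its target value 1
        # (unit temperature), which damps the extra noise from the gradient.
        xi += h * (p * p - 1.0) / nu
        qs[k] = q
        xis[k] = xi
    return qs, xis

qs, xis = adaptive_langevin(200_000)
burn = 20_000  # discard an initial transient
print("variance of q:", qs[burn:].var())
print("mean friction:", xis[burn:].mean())
```

In this toy setting the sample variance of `q` should stay close to the target value of 1 even though the gradient is noisy, because the friction variable drifts upward to compensate for the additional noise. With a fixed friction tuned only to `sigma_a`, the same gradient noise would inflate the sampled variance.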

__Tuesday October 4th 2022__

**9:00-9:15**

Gabriel Peyré (CNRS/ENS)

*Introduction to the AISSAI Center*

**9:15-9:45**

Marylou Gabrié (Ecole Polytechnique, Paris)

*Enhancing sampling of physical states with adaptive MCMC powered by normalizing flows*

**10:00-10:30**

Alain Durmus (ENS Paris Saclay & École Polytechnique)

*Boost your favorite MCMC: the Kick-Kac Teleportation algorithm*

In this work, we propose to target a given probability measure π by combining two Markov kernels with different invariant probability measures. In its basic form, the mechanism consists in picking up the current position and moving it according to a π-invariant Markov kernel as soon as the proposed move does not fall into a predefined region. If this is the case, then we resort to the last position in this region and move it according to another auxiliary Markov kernel before starting another excursion outside the region with the first kernel. These state-dependent interactions make it possible to smoothly combine different dynamics that can be tailored to each region, while the resulting process still targets the probability measure π thanks to an argument based on the Kac formula.

Under weak conditions, we obtain the Law of Large Numbers starting from any point of the state space, as a byproduct of the same property for the different implied kernels. Geometric ergodicity and a Central Limit Theorem are also established. Generalisations where the indicator function on the region target is replaced by an arbitrary acceptance probability are also given, and allow any Metropolis-Hastings algorithm to be considered as a particular case of this general framework. Numerical examples, including mixtures of Gaussian distributions, are also provided and discussed.

**10:45-11:30**

Coffee Break

**11:30-12:30**

Freddy Bouchet (Ecole Normale Supérieure, Lyon)

*Colloquium*: *Probabilistic forecast of extreme heat waves using convolutional neural networks and rare event simulations*

Understanding extreme events and their probability is key for the study of climate change impacts, risk assessment, adaptation, and the protection of living beings. Extreme heatwaves are, and likely will be in the future, among the deadliest weather events. Forecasting their occurrence probability a few days, weeks, or months in advance is a primary challenge for risk assessment and attribution, but also for fundamental studies about processes, dataset and model validation, and climate change studies.

We will demonstrate that deep neural networks can predict the probability of occurrence of long lasting 14-day heatwaves over France, up to 15 days ahead of time for fast dynamical drivers (500 hPa geopotential height fields), and at much longer lead times for slow physical drivers (soil moisture). This forecast is made seamlessly in time and space, for fast hemispheric and slow local drivers.

A key scientific message is that training deep neural networks for predicting extreme heatwaves occurs in a regime of drastic lack of data. We suggest that this is likely the case for most other applications of machine learning to large scale atmosphere and climate phenomena. We discuss perspectives for dealing with this lack of data issue, for instance using rare event simulations.

Rare event simulations are a very efficient tool to oversample drastically the statistics of rare events. We will discuss the coupling of machine learning approaches, for instance the analogue method, with rare event simulations, and discuss their efficiency and their future interest for climate simulations.

**12:30-14:00**

Lunch Break

**14:00-14:30**

Stéphane Mallat (Ecole Normale Supérieure, Paris)

*Multiscale Conditional Learning and Sampling for Physics and Images*

We show that probability distributions can be factorised into conditional probabilities of wavelet coefficients across scales, which are typically local and sometimes nearly Gaussian, as in the Wilson renormalisation group in physics. This defines low-dimensional models of complex physical fields such as turbulence, which can be sampled without suffering from the instabilities known as "critical slowing down". This multiscale factorisation accelerates score learning in diffusion models of images, by learning simpler and well-conditioned conditional probabilities as opposed to the image probability distribution.

**14:45-15:30**

Arnaud Doucet (University of Oxford)

*Denoising diffusion models: Generative Modeling, Inference and Monte Carlo sampling*

In this talk, we will first discuss denoising diffusion models, a novel class of generative models that provides state-of-the-art results in many domains including image and speech synthesis. We will then show how it is possible to provide a principled approach to speed up denoising diffusion models using an original approximation of the Schrödinger bridge problem. We will then show how these techniques can be easily generalized so as to perform approximate posterior simulation in high-dimensional scenarios where one only has access to samples from the prior and can simulate synthetic observations from the likelihood. Finally, we will conclude by demonstrating that such ideas can also be fruitfully exploited so as to improve Annealed Importance Sampling, a popular Monte Carlo technique used to approximate normalizing constants.

**15:30-16:00**

Coffee Break

**16:00 - 16:30**

Martin Weigt (Université Pierre-et-Marie-Curie)

*Generative modeling of protein and RNA sequence ensembles*

With the sequencing revolution in biology, huge amounts of at best sparsely annotated sequence data are accumulating: while we know about 230 million distinct protein sequences, only about half a million (0.24%) have experimental annotations (source: UniProt). Unsupervised learning techniques are therefore key for understanding how biological function is encoded at the sequence level. I will discuss our efforts to construct statistical sequence models for protein and RNA families (Boltzmann machines, autoregressive models), which are generative but computationally efficient, which are flexible enough to capture the complex organization of biological sequence space, but which remain biologically interpretable. I will also discuss how these models can be applied in several challenging biological applications, from the prediction of mutational effects to the design of artificial but functional biological sequences.

**16:45-17:30**

Ryan Abbott (Massachusetts Institute of Technology)

*Normalizing flows for sampling Lattice QCD*