Wuchen Li

Assistant Professor of Mathematics at the University of South Carolina.

Google Scholar

Colloquia and RTG data science seminars.

Optimal transport and Mean field games workshop series

Recent News

Draft "A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations" is online. We propose a scalable preconditioned primal-dual hybrid gradient algorithm for solving partial differential equations (PDEs). We multiply the PDE by a dual test function to obtain an inf-sup problem whose loss functional involves lower-order differential operators. The primal-dual hybrid gradient (PDHG) algorithm is then leveraged for this saddle point problem. By introducing suitable preconditioning operators to the proximal steps in the PDHG algorithm, we obtain an alternative natural gradient ascent-descent optimization scheme for updating the neural network parameters. We apply the Krylov subspace method (MINRES) to evaluate the natural gradients efficiently. This treatment readily handles the inversion of the preconditioning matrices via matrix-vector multiplications. A posteriori convergence analysis is established for the time-continuous version of the proposed method. The algorithm is tested on various types of PDEs with dimensions ranging from $1$ to $50$, including linear and nonlinear elliptic equations, reaction-diffusion equations, and Monge-Ampère equations stemming from the $L^2$ optimal transport problems. We compare the performance of the proposed method with several commonly used deep learning algorithms, such as physics-informed neural networks (PINNs), the DeepRitz method, and weak adversarial networks (WANs), for solving PDEs using the Adam and L-BFGS optimizers. The numerical results suggest that the proposed method performs efficiently and robustly and converges more stably. Nov 12, 2024.
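The PDHG iteration at the core of the method is easy to state on a toy problem. Below is a minimal sketch of the basic, unpreconditioned PDHG update on a bilinear saddle point whose solution solves a linear system; the matrix, step sizes, and problem are illustrative, not the neural-network training scheme of the paper.

```python
import numpy as np

# Toy saddle point: min_x max_y <Ax - b, y>, whose saddle point satisfies
# Ax = b.  This shows the basic PDHG update; the paper applies a
# preconditioned (natural-gradient) variant to neural network parameters.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

L = np.linalg.norm(A, 2)            # operator norm of A
tau = sigma = 0.9 / L               # step sizes with tau * sigma * L^2 < 1

x = np.zeros(2)
y = np.zeros(2)
x_bar = x.copy()
for _ in range(5000):
    y = y + sigma * (A @ x_bar - b)     # dual ascent step
    x_new = x - tau * (A.T @ y)         # primal descent step
    x_bar = 2 * x_new - x               # extrapolation
    x = x_new

print(np.linalg.norm(A @ x - b))    # residual tends to 0
```

The extrapolation step is what distinguishes PDHG from plain alternating ascent-descent; without it the bilinear iteration can fail to converge.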

Draft "Gradient-adjusted underdamped Langevin dynamics for sampling" is online. Sampling from a target distribution is a fundamental problem with wide-ranging applications in scientific computing and machine learning. Traditional Markov chain Monte Carlo (MCMC) algorithms, such as the unadjusted Langevin algorithm (ULA), derived from the overdamped Langevin dynamics, have been extensively studied. From an optimization perspective, the Kolmogorov forward equation of the overdamped Langevin dynamics can be treated as the gradient flow of the relative entropy in the space of probability densities endowed with the Wasserstein-2 metric. Several efforts have also been devoted to momentum-based methods, such as underdamped Langevin dynamics, for faster convergence of sampling algorithms. Recent advances in optimization have demonstrated the effectiveness of primal-dual damping and Hessian-driven damping dynamics for achieving faster convergence in solving optimization problems. Motivated by these developments, we introduce a class of stochastic differential equations (SDEs) called gradient-adjusted underdamped Langevin dynamics (GAUL), which add stochastic perturbations to the primal-dual damping and Hessian-driven damping dynamics from optimization. We prove that GAUL admits the correct stationary distribution, whose marginal is the target distribution. The proposed method outperforms overdamped and underdamped Langevin dynamics regarding convergence speed in the total variation distance for Gaussian target distributions. Moreover, using the Euler-Maruyama discretization, we show that the mixing time towards a biased target distribution depends only on the square root of the condition number of the target covariance matrix. Numerical experiments for non-Gaussian target distributions, such as Bayesian regression problems and Bayesian neural networks, further illustrate the advantages of our approach over classical methods based on overdamped or underdamped Langevin dynamics. Oct 14, 2024.
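For reference, the underdamped baseline that GAUL builds on can be simulated in a few lines. The sketch below is a plain Euler-Maruyama discretization of underdamped Langevin dynamics for a 1D standard Gaussian target; the friction coefficient, step size, and burn-in are illustrative choices, and this is not the gradient-adjusted dynamics of the paper.

```python
import numpy as np

# Euler-Maruyama simulation of underdamped Langevin dynamics
#   dX_t = V_t dt,   dV_t = (-gamma V_t - grad U(X_t)) dt + sqrt(2 gamma) dW_t
# for the 1D standard Gaussian target exp(-x^2/2).  This is the plain
# underdamped baseline, not the gradient-adjusted (GAUL) dynamics.
rng = np.random.default_rng(0)
gamma, dt, n_steps, burn_in = 2.0, 0.01, 300_000, 50_000

x, v = 0.0, 0.0
samples = []
for k in range(n_steps):
    grad_U = x                                   # U(x) = x^2 / 2
    x += v * dt
    v += (-gamma * v - grad_U) * dt + np.sqrt(2 * gamma * dt) * rng.normal()
    if k >= burn_in:
        samples.append(x)

samples = np.array(samples)
print(samples.mean(), samples.var())   # approximately 0 and 1
```

The x-marginal of the long-run trajectory approximates the target up to an O(dt) discretization bias, which is the bias that Metropolis-adjusted or higher-order schemes aim to remove.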

Draft "Score-based Neural Ordinary Differential Equations for Computing Mean Field Control Problems" is online. Classical neural ordinary differential equations (ODEs) are powerful tools for approximating the log-density functions in high-dimensional spaces along trajectories, where neural networks parameterize the velocity fields. This paper proposes a system of neural differential equations representing first- and second-order score functions along trajectories based on deep neural networks. We reformulate the mean field control (MFC) problem with individual noises into an unconstrained optimization problem framed by the proposed neural ODE system. Additionally, we introduce a novel regularization term to enforce characteristics of viscous Hamilton–Jacobi–Bellman (HJB) equations to be satisfied based on the evolution of the second-order score function. Examples include regularized Wasserstein proximal operators (RWPOs), probability flow matching of Fokker–Planck (FP) equations, and linear quadratic (LQ) MFC problems, which demonstrate the effectiveness and accuracy of the proposed method. Sep 26, 2024.

Draft "Convergence of Noise-Free Sampling Algorithms with Regularized Wasserstein Proximals" is online. We investigate the convergence properties of the backward regularized Wasserstein proximal (BRWP) method for sampling a target distribution. The BRWP approach can be viewed as a semi-implicit time discretization for a probability flow ODE with the score function whose density satisfies the Fokker-Planck equation of the overdamped Langevin dynamics. Specifically, the evolution of the score function is computed using a kernel formula derived from the regularized Wasserstein proximal operator. By applying the Laplace method to obtain the asymptotic expansion of this kernel formula, we establish guaranteed convergence in terms of the Kullback-Leibler divergence for the BRWP method towards a strongly log-concave target distribution. Our analysis also identifies the optimal and maximum step sizes for convergence. Furthermore, we demonstrate that the deterministic and semi-implicit BRWP scheme outperforms many classical Langevin Monte Carlo methods, such as the unadjusted Langevin algorithm (ULA), by offering faster convergence and reduced bias. Numerical experiments further validate the convergence analysis of the BRWP method. Sep 3, 2024.
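The noise-free idea can be illustrated with a schematic particle method: evolve particles along the probability flow dx/dt = -grad V(x) - grad log rho(x) for the target exp(-V), estimating the score from the ensemble. The sketch below uses a generic Gaussian kernel density estimate for the score, as a stand-in for the paper's regularized Wasserstein proximal kernel; all constants are illustrative.

```python
import numpy as np

# Schematic noise-free particle sampler: move particles along the probability
# flow  dx/dt = -grad V(x) - grad log rho(x)  for the target exp(-V) with
# V(x) = x^2/2.  The score grad log rho is estimated from the ensemble with a
# Gaussian kernel density estimate -- a generic stand-in for the regularized
# Wasserstein proximal kernel formula analyzed in the paper.
rng = np.random.default_rng(1)
n, h, eps2 = 300, 0.02, 0.25           # particles, step size, KDE bandwidth^2

x = rng.normal(3.0, 1.0, size=n)       # initialize far from the target N(0, 1)
for _ in range(1000):
    diff = x[:, None] - x[None, :]                 # pairwise differences
    w = np.exp(-diff**2 / (2 * eps2))
    w /= w.sum(axis=1, keepdims=True)
    score = -(w * diff).sum(axis=1) / eps2         # grad log rho_hat at x_i
    x = x + h * (-x - score)                       # grad V(x) = x

print(x.mean(), x.var())   # near 0, and near 1 up to a smoothing bias
```

Because the update is deterministic, the only randomness is in the initialization; the kernel bandwidth introduces the smoothing bias that the paper's asymptotic analysis quantifies for its own kernel.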

Draft "Fluctuations in Wasserstein dynamics on Graphs" is online. We propose a drift-diffusion process on the probability simplex to study stochastic fluctuations in probability spaces. We construct a counting process for linear detailed balanced chemical reactions with finite species such that its thermodynamic limit is a system of ordinary differential equations (ODEs) on the probability simplex. This ODE system can be formulated as a gradient flow with an Onsager response matrix that induces a Riemannian metric on the probability simplex. After incorporating the induced Riemannian structure, we propose a diffusion approximation of the rescaled counting process for molecular species in the chemical reactions, which leads to Langevin dynamics on the probability simplex with a degenerate Brownian motion constructed from the eigen-decomposition of Onsager's response matrix. The corresponding Fokker-Planck equation on the simplex can be regarded as the usual drift-diffusion equation with the extrinsic representation of the divergence and Laplace-Beltrami operator. The invariant measure is the Gibbs measure, which is the product of the original macroscopic free energy and a volume element. When the drift term vanishes, the Fokker-Planck equation reduces to the heat equation with the Laplace-Beltrami operator, which we refer to as canonical Wasserstein diffusions on graphs. In the case of a two-point probability simplex, the constructed diffusion process is converted to the one-dimensional Wright-Fisher diffusion process, which leads to a natural boundary condition ensuring that the process remains within the probability simplex. Aug 16, 2024.

Draft "Efficient Computation of Mean field Control based Barycenters from Reaction-Diffusion Systems" is online. We develop a class of barycenter problems based on mean field control problems in three dimensions with associated reaction-diffusion systems of unnormalized multi-species densities. This problem generalizes the Wasserstein barycenter problem for single probability density functions. The primary objective is to present a comprehensive framework for efficiently computing the proposed variational problem: generalized Benamou-Brenier formulas with multiple input density vectors as boundary conditions. Our approach involves the utilization of high-order finite element discretizations of the space-time domain to achieve improved accuracy. The discrete optimization problem is then solved using the primal-dual hybrid gradient (PDHG) algorithm, a first-order optimization method for effectively addressing a wide range of constrained optimization problems. The efficacy and robustness of our proposed framework are illustrated through several numerical examples in three dimensions, such as the computation of the barycenter of multi-density systems consisting of Gaussian distributions and reaction-diffusion multi-density systems involving 3D voxel densities. Additional examples highlighting computations on 2D embedded surfaces are also provided. April 2, 2024.

Draft "A primal-dual hybrid gradient method for solving optimal control problems and the corresponding Hamilton-Jacobi PDEs" is online. March 6, 2024.

Draft "Numerical analysis on neural network projected schemes for approximating one dimensional Wasserstein gradient flows" is online. We provide a numerical analysis and computation of neural network projected schemes for approximating one-dimensional Wasserstein gradient flows. We approximate the Lagrangian mapping functions of gradient flows by the class of two-layer neural network functions with ReLU (rectified linear unit) activation functions. The numerical scheme is based on a projected gradient method, namely the Wasserstein natural gradient, where the projection is constructed from the L2 mapping space onto the neural network parameterized mapping space. We establish theoretical guarantees for the performance of the neural projected dynamics. We derive a closed-form update for the scheme with well-posedness and explicit consistency guarantees for a particular choice of network structure. General truncation error analysis is also established on the basis of the projective nature of the dynamics. Numerical examples, including gradient drift Fokker-Planck equations, porous medium equations, and Keller-Segel models, verify the accuracy and effectiveness of the proposed neural projected algorithm. Feb 27, 2024.
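The function class in question is simple to write down. The sketch below builds the two-layer ReLU parameterization f(x) = sum_k a_k relu(w_k x + b_k) and fits only the output weights to a hypothetical target mapping (tanh, chosen for illustration) by least squares; it illustrates the approximation class, not the Wasserstein natural-gradient projection itself.

```python
import numpy as np

# Two-layer ReLU network  f(x) = sum_k a_k * relu(w_k * x + b_k),
# the class used to parameterize 1D Lagrangian mapping functions.  Here only
# the output weights a_k are fitted, by least squares, to a hypothetical
# target map -- an illustration of the function class, not the projected
# Wasserstein natural-gradient dynamics.
rng = np.random.default_rng(0)
m = 64
w, b = rng.normal(size=m), rng.normal(size=m)       # random inner weights

xs = np.linspace(-2.0, 2.0, 200)
features = np.maximum(np.outer(xs, w) + b, 0.0)     # ReLU features, (200, m)
target = np.tanh(xs)                                # hypothetical target map

a, *_ = np.linalg.lstsq(features, target, rcond=None)
print(np.abs(features @ a - target).max())          # small uniform error
```

Each ReLU feature is piecewise linear with a single kink at -b_k/w_k, so the span of the class is piecewise linear with kinks at the (random) breakpoints; that is the structure the paper's closed-form update and consistency analysis exploit.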

Draft "Mean field control of droplet dynamics with high order finite element computations" is online. Liquid droplet dynamics are widely used in biological and engineering applications, which contain complex interfacial instabilities and pattern formation such as droplet merging, splitting, and transport. This paper studies a class of mean field control formulations for these droplet dynamics, which are used to control and maintain the manipulation of droplets in applications. We first formulate the droplet dynamics as gradient flows of free energies in modified optimal transport metrics with nonlinear mobilities. We then design an optimal control problem for these gradient flows. We lastly apply the primal-dual hybrid gradient algorithm with high-order finite element methods to simulate the proposed mean field control problems. Numerical examples, including droplet formation, bead-up/spreading, transport, and merging/splitting on a two-dimensional spatial domain, demonstrate the effectiveness of the proposed mean field control mechanism. Feb 10, 2024.

Draft "Wasserstein proximal operators describe score-based generative models and resolve memorization" is online. We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form of a pair of coupled partial differential equations: a forward-controlled Fokker-Planck (FP) equation and a backward Hamilton-Jacobi-Bellman (HJB) equation. Via a Cole-Hopf transformation, and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the HJB equation is an uncontrolled FP equation. Second, with the mathematical structure at hand, we present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs in terms of training samples and training time. In addition, the WPO-informed kernel model is explicitly constructed to avoid the recently studied memorization effects of score-based generative models. The mathematical form of the new kernel-based models, in combination with the use of the terminal condition of the MFG, reveals new explanations for the manifold learning and generalization properties of SGMs, and provides a resolution to their memorization effects. Finally, our mathematically informed, interpretable kernel-based model suggests new scalable bespoke neural network architectures for high-dimensional applications. Feb 9, 2024.

Draft "Fisher information dissipation for time inhomogeneous stochastic differential equations" is online. We provide a Lyapunov convergence analysis for time-inhomogeneous, variable-coefficient stochastic differential equations. Three typical examples include overdamped, irreversible drift, and underdamped Langevin dynamics. We first reformulate the probability transition equation of Langevin dynamics as a modified gradient flow of the Kullback–Leibler divergence in the probability space with respect to time-dependent optimal transport metrics. This formulation contains both gradient and non-gradient directions depending on a class of time-dependent target distributions. We then select a time-dependent relative Fisher information functional as a Lyapunov functional. We develop a time-dependent Hessian matrix condition, which guarantees the convergence of the probability density function of Langevin dynamics. We verify the proposed conditions for several time-inhomogeneous Langevin dynamics. For overdamped Langevin dynamics, we prove the $O(t^{-\frac{1}{2}})$ convergence of $L^1$ distance for the simulated annealing dynamics with a strongly convex potential function. For the irreversible drift Langevin dynamics, we prove an improved convergence towards the target distribution in an asymptotic regime. We also verify the convergence condition for the underdamped dynamics. Numerical examples demonstrate the convergence results of time-dependent Langevin dynamics. Feb 5, 2024.
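The simulated-annealing dynamics in the overdamped case can be simulated directly. Below is a minimal sketch with a strongly convex potential and a slowly growing inverse temperature; the schedule and constants are illustrative, not the ones used in the paper's analysis.

```python
import numpy as np

# Overdamped simulated-annealing Langevin dynamics
#   dX_t = -grad V(X_t) dt + sqrt(2 / beta(t)) dW_t
# for the strongly convex potential V(x) = (x - 2)^2 / 2, with a slowly
# increasing inverse temperature beta(t).  The logarithmic-type schedule and
# constants below are illustrative choices.
rng = np.random.default_rng(2)
dt, n_steps = 0.01, 200_000

x, tail = -5.0, []
for k in range(n_steps):
    beta = np.log(np.e + k * dt)              # slowly growing beta(t)
    x += -(x - 2.0) * dt + np.sqrt(2 * dt / beta) * rng.normal()
    if k >= n_steps - 50_000:
        tail.append(x)

print(np.mean(tail))    # concentrates near the minimizer x* = 2
```

As beta(t) grows, the instantaneous stationary measure sharpens around the minimizer of V, so the trajectory concentrates there; the slow cooling is what the time-dependent Lyapunov analysis has to control.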

Draft "Numerical analysis of a first-order computational algorithm for reaction-diffusion equations via the primal-dual hybrid gradient method" is online. A first-order optimization algorithm has been introduced to solve time-implicit schemes of reaction-diffusion equations. In this research, we conduct theoretical studies on this first-order algorithm equipped with a quadratic regularization term. We provide sufficient conditions under which the proposed algorithm and its time-continuous limit converge exponentially fast to a desired time-implicit numerical solution. We show both theoretically and numerically that the convergence rate is independent of the grid size, which makes our method suitable for large-scale problems. The efficiency of our algorithm has been verified via a series of numerical examples conducted on various types of reaction-diffusion equations. The choice of optimal hyperparameters, as well as comparisons with some classical root-finding algorithms, is also discussed in the numerical section. Jan 30, 2024.

Draft "Tensor train based sampling algorithms for approximating regularized Wasserstein proximal operators" is online. We present a tensor train (TT) based algorithm designed for sampling from a target distribution and employ TT approximation to capture the high-dimensional probability density evolution of overdamped Langevin dynamics. This involves utilizing the regularized Wasserstein proximal operator, which exhibits a simple kernel integration formulation, i.e., the softmax formula of the traditional proximal operator. The integration poses a challenge in practical scenarios, making the algorithm practically implementable only with the aid of TT approximation. In the specific context of Gaussian distributions, we rigorously establish the unbiasedness and linear convergence of our sampling algorithm towards the target distribution. To assess the effectiveness of our proposed methods, we apply them to various scenarios, including Gaussian families, Gaussian mixtures, bimodal distributions, and Bayesian inverse problems in numerical examples. The sampling algorithm exhibits superior accuracy and faster convergence when compared to classical Langevin dynamics-type sampling algorithms. Jan 25, 2024.

Draft "A deep learning algorithm for computing mean field control problems via forward-backward score dynamics" is online. We propose a deep learning approach to compute mean field control problems with individual noises. The problem consists of the Fokker-Planck (FP) equation and the Hamilton-Jacobi-Bellman (HJB) equation. Using the differential of the entropy, namely the score function, we first formulate the deterministic forward-backward characteristics for the mean field control system, which is different from the classical forward-backward stochastic differential equations (FBSDEs). We further apply the neural network approximation to fit the proposed deterministic characteristic lines. Numerical examples, including the control problem with entropy potential energy, the linear quadratic regulator, and the systemic risks, demonstrate the effectiveness of the proposed method. Jan 19, 2024.

Previous results

Research Interests

Applied mathematics: transport information geometry in PDEs, data science, graphs and neural networks, complex dynamical systems, computation, statistics, control, and games.


Recent Publications

Our paper "A first-order computational algorithm for reaction-diffusion type equations via primal-dual hybrid gradient method" was accepted in January 2024.