Christoph Ortner

With Gabor Csanyi and James Kermode I have been developing bespoke algorithms for molecular geometry optimisation. The three main thrusts of this research are (1) preconditioners [75, 74, 59]; (2) effective line-search heuristics [75, 55]; and (3) saddle point search [75, 64, 55].

Consider a smooth potential energy landscape $E : \mathbb{R}^N \to \mathbb{R}$. Computing (local) minima is an extremely well-studied and well-understood problem. For example, the steepest descent method

$$ x_{n+1} = x_n - \alpha_n \nabla E(x_n) $$

with suitable step-size selection is guaranteed to converge at least to a critical point, which will "likely" be a local minimum. More sophisticated methods that use curvature information can, in theory, even prevent convergence to critical points that are not minima. The key point that makes this possible is that a reduction in the energy $E$ in each optimisation step is already sufficient to guarantee convergence.
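This energy-decrease mechanism is easy to realise in code. The following sketch implements steepest descent with Armijo backtracking on a simple quartic toy potential; the potential and all parameter values are illustrative choices, not taken from the cited papers.

```python
import numpy as np

# Illustrative 2D toy potential (an anharmonic well with minimum at the origin).
def E(x):
    return 0.5 * x[0]**2 + 2.0 * x[1]**2 + 0.1 * x[0]**4

def grad_E(x):
    return np.array([x[0] + 0.4 * x[0]**3, 4.0 * x[1]])

def steepest_descent(x0, alpha0=1.0, c1=1e-4, tol=1e-8, maxit=10_000):
    """Steepest descent with Armijo backtracking step-size selection."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        g = grad_E(x)
        if np.linalg.norm(g) < tol:    # converged to a critical point
            break
        alpha = alpha0
        # halve the step until the sufficient-decrease (Armijo) condition holds
        while E(x - alpha * g) > E(x) - c1 * alpha * (g @ g):
            alpha *= 0.5
        x = x - alpha * g
    return x

x_min = steepest_descent([1.0, -1.0])
```

Each accepted step reduces $E$ by at least a fixed fraction of $\alpha_n \|\nabla E(x_n)\|^2$, which is precisely the decrease property that the convergence guarantee rests on.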

A simple class of saddle point search methods related to steepest descent is the dimer/GAD family. The idea is to augment the state $x_n$ with a direction variable $v_n$ and derive a dynamical system whose stable equilibria are precisely the index-1 saddle points:

$$
\begin{aligned}
\dot{x} &= - (I - 2 v \otimes v) \nabla E(x) \\
\dot{v} &= - (I - v \otimes v) \nabla^2 E(x) v
\end{aligned}
$$

The idea is that $v$ follows a gradient flow for the Rayleigh quotient and hence converges to an eigenvector associated with the minimal eigenvalue, while $x$ follows a gradient flow in all directions orthogonal to $v$ but moves uphill in the $v$-direction. To implement this idea we discretise in time,

$$
\begin{aligned}
x_{n+1} &= x_n - \alpha_n (I - 2 v_n \otimes v_n) \nabla E(x_n) \\
\tilde{v}_{n+1} &= v_n - (I - v_n \otimes v_n) \nabla^2 E(x_n) v_n \\
v_{n+1} &= \tilde{v}_{n+1} / \| \tilde{v}_{n+1} \|
\end{aligned}
$$

It is easy to show that if $\alpha_n$ is sufficiently small and $(x_0, v_0)$ starts near a stable index-1 saddle, then the iterates $(x_n, v_n)$ converge to a saddle point.
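The discretised iteration is short enough to sketch directly. The toy double-well PES below, with minima at $(\pm 1, 0)$ and an index-1 saddle at the origin, is a hypothetical example; I also insert a small step size `beta` into the $v$-update, an extra stabilisation choice that is not part of the scheme as displayed.

```python
import numpy as np

# Toy PES E(x, y) = (x^2 - 1)^2 + y^2: minima at (+-1, 0), index-1 saddle at (0, 0).
def grad_E(x):
    return np.array([4 * x[0]**3 - 4 * x[0], 2 * x[1]])

def hess_E(x):
    return np.array([[12 * x[0]**2 - 4, 0.0], [0.0, 2.0]])

def gad(x0, v0, alpha=0.1, beta=0.1, tol=1e-10, maxit=5000):
    """Discretised dimer/GAD iteration for index-1 saddle search."""
    x = np.asarray(x0, dtype=float)
    v = np.asarray(v0, dtype=float)
    v /= np.linalg.norm(v)
    for _ in range(maxit):
        g = grad_E(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * (g - 2 * (v @ g) * v)       # descend, but ascend along v
        Hv = hess_E(x) @ v
        v_tilde = v - beta * (Hv - (v @ Hv) * v)    # Rayleigh-quotient gradient step
        v = v_tilde / np.linalg.norm(v_tilde)       # renormalise
    return x, v

x_saddle, v_saddle = gad([0.3, 0.4], [1.0, 0.5])
```

Started near the saddle, the iterates converge to $(0,0)$ and $v$ aligns with the unstable direction, consistent with the local convergence statement.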

Open Problem: Global convergence, on the other hand, is entirely open: the challenge stems from the fact that there is no obvious "merit function" (e.g., a Lyapunov functional) whose minima are the index-1 saddle points. In fact, Antoine Levitt and I showed in [64] that there exists no merit function for which the dimer/GAD direction is a descent direction.

Most of my work on this topic has been on practical aspects of saddle point search, i.e., developing heuristics that perform well in practice.

But the most interesting and most challenging question is whether it is possible to develop algorithms that are guaranteed to compute a saddle point. The most promising approach is via path-based methods, i.e., computing a path that connects two energy minima, such as the String and NEB methods. But these methods (a) come at a significantly increased computational cost and (b) pose other significant analytical challenges. For "walker" methods such as dimer/GAD it is unclear even on a heuristic level whether globally convergent variants exist.

Preconditioners for Molecular Simulation

My work on preconditioners focuses on molecular potential energy landscapes and is primarily application-driven. The main idea can be explained as follows: The asymptotic convergence rate of the steepest descent method,

$$ x_{n+1} = x_n - \alpha_n \nabla E(x_n) $$

is roughly $(1 - 2/\kappa)^n$, where $\kappa = \|H\|\,\|H^{-1}\|$ is the condition number of $H := \nabla^2 E(x_*)$ and $x_* = \lim_{n \to \infty} x_n$. Ill-conditioning ($\kappa \gg 1$) enters molecular PESs in a variety of ways, for example in large materials systems, where the ratio of the highest and lowest wave-numbers becomes large, or through large ratios between bond stiffnesses in bio-molecules.

In the preconditioned steepest descent method, the iterates are

$$ x_{n+1} = x_n - \alpha_n P_n^{-1} \nabla E(x_n) $$

and the convergence rate is now determined by the preconditioned hessian $P^{-1/2} H P^{-1/2}$ instead of $H$. Of course, one can (and should) apply preconditioning ideas to most optimisation algorithms, not only steepest descent; in particular, we typically use preconditioned LBFGS or CG variants in practice.

To construct a preconditioner, we begin with a surrogate PES $\tilde{E}$, possibly $E = \tilde{E}$, though in important cases, e.g. when $E$ is given by an electronic structure model, this is no longer possible. We then modify the hessian expression $\nabla^2 \tilde{E}(x)$ in a way that yields a new matrix $P(x) \approx \nabla^2 \tilde{E}(x)$ which remains positive definite throughout the entire PES. The idea is best illustrated with the example of a pair potential:


$$
\begin{aligned}
& R_{ij} = x_i - x_j, \quad r_{ij} = |R_{ij}|, \quad \hat{R}_{ij} = R_{ij} / r_{ij}, \\
& \text{and} \quad \tilde{E}(x) = \sum_{i \neq j} \phi(r_{ij}).
\end{aligned}
$$

Writing $u_{ij} = u_i - u_j$, the hessian satisfies

$$
\begin{aligned}
\langle H(x) u, u \rangle &= \sum_{i \neq j} u_{ij}^T h_{ij} u_{ij}, \quad \text{where} \\
h_{ij} &= \hat{R}_{ij} \otimes \hat{R}_{ij}\, \phi''(r_{ij}) + (I - \hat{R}_{ij} \otimes \hat{R}_{ij})\, \frac{\phi'(r_{ij})}{r_{ij}}.
\end{aligned}
$$

We then specify the preconditioner by making each "local" hessian $h_{ij}$ positive definite, rather than the global hessian $H$; that is, we define

$$
\begin{aligned}
\langle P(x) u, u \rangle &= \sum_{i \neq j} u_{ij}^T p_{ij} u_{ij}, \quad \text{where} \\
p_{ij} &= \hat{R}_{ij} \otimes \hat{R}_{ij}\, |\phi''(r_{ij})| + (I - \hat{R}_{ij} \otimes \hat{R}_{ij}) \left| \frac{\phi'(r_{ij})}{r_{ij}} \right|.
\end{aligned}
$$

This guarantees that $P(x)$ is positive (semi-)definite throughout the PES, hence we can set $P_n = P(x_n)$ in the preconditioned steepest descent method.
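The assembly of $P(x)$ from the pairwise blocks $p_{ij}$ can be sketched as follows; the Morse pair potential and its stiffness parameter are arbitrary illustrative choices. Note that this $P(x)$ is only positive *semi*-definite, since rigid translations lie in its kernel; in practice one would stabilise it, e.g. by adding a small multiple of the identity.

```python
import numpy as np

A = 3.0  # Morse stiffness parameter; an arbitrary illustrative choice

def dphi(r):   # phi'(r) for phi(r) = exp(-2A(r-1)) - 2 exp(-A(r-1))
    return -2 * A * np.exp(-2 * A * (r - 1)) + 2 * A * np.exp(-A * (r - 1))

def ddphi(r):  # phi''(r)
    return 4 * A**2 * np.exp(-2 * A * (r - 1)) - 2 * A**2 * np.exp(-A * (r - 1))

def precond(x):
    """Assemble P(x) from p_ij = |phi''| Rh (x) Rh + |phi'/r| (I - Rh (x) Rh)."""
    M, d = x.shape
    P = np.zeros((M * d, M * d))
    for i in range(M):
        for j in range(M):
            if i == j:
                continue
            R = x[i] - x[j]
            r = np.linalg.norm(R)
            Rh = R / r
            proj = np.outer(Rh, Rh)
            p = abs(ddphi(r)) * proj + abs(dphi(r) / r) * (np.eye(d) - proj)
            # u_ij = u_i - u_j, so p_ij contributes +p on the (i,i) and (j,j)
            # diagonal blocks and -p on the (i,j) and (j,i) off-diagonal blocks
            P[i*d:(i+1)*d, i*d:(i+1)*d] += p
            P[j*d:(j+1)*d, j*d:(j+1)*d] += p
            P[i*d:(i+1)*d, j*d:(j+1)*d] -= p
            P[j*d:(j+1)*d, i*d:(i+1)*d] -= p
    return P

x = np.random.RandomState(0).rand(5, 3)   # 5 atoms at random positions in 3D
P = precond(x)
eigs = np.linalg.eigvalsh(P)              # all eigenvalues are >= 0
```

Since each $p_{ij}$ is positive semi-definite by construction, so is their sum, whatever the configuration $x$; this is the property that fails for the exact hessian away from a minimum.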

This construction can be readily generalised to a wide range of bond types and optimisation schemes; see [75, 74, 59] for the details and prototype applications.