Christoph Ortner

With Gabor Csanyi and James Kermode I have been developing bespoke algorithms for molecular geometry optimisation. The three main thrusts of this research are (1) preconditioners [75, 74, 59]; (2) effective line-search heuristics [75, 55]; and (3) saddle point search [75, 64, 55].

Consider a smooth potential energy landscape $E : \mathbb{R}^N \to \mathbb{R}$. Computing (local) minima is an extremely well-studied and well-understood problem. For example, the steepest descent method

$x_{n+1} = x_n - \alpha_n \nabla E(x_n)$

with suitable step-size selection is guaranteed to converge at least to a critical point, which will "likely" be a local minimum. More sophisticated methods that use curvature information can, in theory, even prevent convergence to critical points that are not minima. The key point that makes this possible is that a reduction of the energy $E$ in each optimisation step is already sufficient to guarantee convergence.
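
As a concrete illustration, the following is a minimal Python sketch of steepest descent with Armijo backtracking; the function names and the constants `alpha0`, `c1`, `tol` are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def steepest_descent(E, gradE, x0, alpha0=1.0, c1=1e-4, tol=1e-8, maxit=10_000):
    """Minimise E by steepest descent with Armijo backtracking."""
    x = x0.copy()
    for _ in range(maxit):
        g = gradE(x)
        if np.linalg.norm(g) < tol:      # converged to a critical point
            break
        Ex, alpha = E(x), alpha0
        # Backtrack until the step gives a sufficient energy reduction;
        # this guaranteed decrease is what drives the convergence theory.
        while E(x - alpha * g) > Ex - c1 * alpha * np.dot(g, g):
            alpha *= 0.5
        x = x - alpha * g
    return x
```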

A simple saddle point search method related to steepest descent is the dimer/GAD method. The idea is to augment the state $x_n$ with a direction variable $v_n$ and derive a dynamical system whose stable equilibria are precisely the index-1 saddle points:

\begin{aligned} \dot{x} & = - (I - 2 v \otimes v) \nabla E(x) \\ \dot{v} & = - (I - v \otimes v) \nabla^2 E(x) v \end{aligned}

The idea is that $v$ follows a gradient flow for the Rayleigh quotient, and hence converges to an eigenvector associated with the minimal eigenvalue of $\nabla^2 E(x)$, while $x$ follows a gradient flow in all directions orthogonal to $v$ but moves uphill in the $v$-direction. To implement this idea we discretise in time,

\begin{aligned} x_{n+1} & = x_n - \alpha_n (I - 2 v_n \otimes v_n) \nabla E(x_n) \\ \tilde{v}_{n+1} & = v_n - (I - v_n \otimes v_n) \nabla^2 E(x_n) v_n \\ v_{n+1} & = \tilde{v}_{n+1} / \| \tilde{v}_{n+1} \| \end{aligned}

It is easy to show that if $\alpha_n$ is sufficiently small and $(x_0, v_0)$ starts near a stable index-1 saddle, then the iterates $(x_n, v_n)$ converge to that saddle point.
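
The following Python sketch implements one step of this discretised iteration. Two assumptions to flag: the Hessian-vector product $\nabla^2 E(x_n) v_n$ is approximated by a finite difference of gradients, in the spirit of the dimer method, and the step sizes `alpha`, `beta` (the scheme above corresponds to `beta = 1`) and increment `h` are illustrative.

```python
import numpy as np

def gad_step(gradE, x, v, alpha, beta=1.0, h=1e-5):
    """One step of the discretised dimer/GAD iteration."""
    g = gradE(x)
    # x-update: gradient descent with the component along v reversed,
    # i.e. x moves uphill in the v-direction only.
    x_new = x - alpha * (g - 2.0 * np.dot(v, g) * v)
    # Dimer-style finite-difference approximation of the product H v.
    Hv = (gradE(x + h * v) - g) / h
    # v-update: projected gradient step for the Rayleigh quotient,
    # followed by renormalisation.
    v_tilde = v - beta * (Hv - np.dot(v, Hv) * v)
    return x_new, v_tilde / np.linalg.norm(v_tilde)
```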

Open Problem: Global convergence, on the other hand, is entirely open: the challenge stems from the fact that there is no obvious "merit function" (e.g., a Lyapunov functional) whose minima are the index-1 saddle points. In fact, Antoine Levitt and I showed in [64] that there exists no merit function for which the dimer/GAD direction is a descent direction.

Most of my work on this topic has been on practical aspects of saddle point search, i.e., developing heuristics that perform well in practice, e.g.,

• Preconditioners [75, 74, 59]

• Line-search heuristics [75, 55]

• Modulated Newton methods (in preparation)

But the most interesting and most challenging question is whether it is possible to develop algorithms that are guaranteed to compute a saddle point. The most promising approach is via path-based methods, i.e., computing a path that connects two energy minima, such as the string and nudged elastic band (NEB) methods. But these methods (a) come at a significantly increased computational cost and (b) pose other significant analytical challenges. For "walker" methods such as dimer/GAD it is unclear even on a heuristic level whether globally convergent variants exist.

### Preconditioners for Molecular Simulation

My work on preconditioners focuses on molecular potential energy landscapes and is primarily application-driven. The main idea can be explained as follows: The asymptotic convergence rate of the steepest descent method,

$x_{n+1} = x_n - \alpha_n \nabla E(x_n)$

is roughly $(1 - 2/\kappa)^n$, where $\kappa = \|H\|\,\|H^{-1}\|$ is the condition number of $H := \nabla^2 E(x_*)$ and $x_* = \lim_n x_n$. Ill-conditioning ($\kappa \gg 1$) enters molecular PESs in a variety of ways, for example in large materials systems, where the ratio of the highest to the lowest wave-number becomes large, or through large ratios between bond stiffnesses in bio-molecules.
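
To quantify the effect of ill-conditioning: with $\kappa = 10^3$ the contraction factor is $1 - 2/\kappa = 0.998$, so reducing the error by a single decimal digit already costs roughly $\ln 10 / (2/\kappa) \approx 1150$ iterations.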

In the preconditioned steepest descent method, the iterates are

$x_{n+1} = x_n - \alpha_n P_n^{-1} \nabla E(x_n)$

and the convergence rate is now determined by the preconditioned hessian $P^{-1/2} H P^{-1/2}$ instead of $H$. Of course, one can (and should) apply preconditioning ideas to most optimisation algorithms, not only steepest descent; in particular, we typically use LBFGS or CG variants in practice.
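
Schematically, a single preconditioned step replaces the gradient by the solution of a linear system with $P$. A minimal sketch, assuming a dense symmetric positive definite matrix `P` is supplied (large systems would instead use a sparse factorisation):

```python
import numpy as np

def preconditioned_sd_step(gradE, P, x, alpha):
    """One preconditioned step: x -> x - alpha * P^{-1} grad E(x)."""
    d = np.linalg.solve(P, gradE(x))   # d = P^{-1} grad E(x)
    return x - alpha * d
```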

To construct a preconditioner, we begin with a surrogate PES $\tilde{E}$, possibly $\tilde{E} = E$ itself, although in important cases, e.g. when $E$ is given by an electronic structure model, this is not feasible. We then modify the hessian expression $\nabla^2 \tilde{E}(x)$ in a way that yields a new matrix $P(x) \approx \nabla^2 \tilde{E}(x)$ which remains positive definite throughout the entire PES. The idea is best illustrated with the example of a pair potential:

Let

\begin{aligned} & R_{ij} = x_i - x_j, \quad r_{ij} = |R_{ij}|, \quad \hat{R}_{ij} = R_{ij} / r_{ij} \\ & \text{and} \quad \tilde{E}(x) = \sum_{i \neq j} \phi(r_{ij}), \end{aligned}

then

\begin{aligned} \langle H(x) u, u \rangle &= \sum_{i \neq j} u_{ij}^T h_{ij} u_{ij}, \quad \text{where } u_{ij} = u_i - u_j \text{ and} \\ h_{ij} &= \hat{R}_{ij} \otimes \hat{R}_{ij} \, \phi''(r_{ij}) + (I - \hat{R}_{ij} \otimes \hat{R}_{ij}) \, \frac{\phi'(r_{ij})}{r_{ij}}. \end{aligned}

We then specify the preconditioner by making each "local" hessian $h_{ij}$ positive, rather than the global hessian $H$; that is, we define

\begin{aligned} \langle P(x) u, u \rangle &= \sum_{i \neq j} u_{ij}^T p_{ij} u_{ij}, \quad \text{where} \\ p_{ij} &= \hat{R}_{ij} \otimes \hat{R}_{ij} \, |\phi''(r_{ij})| + (I - \hat{R}_{ij} \otimes \hat{R}_{ij}) \left| \frac{\phi'(r_{ij})}{r_{ij}} \right|. \end{aligned}

This guarantees that $P(x)$ is positive semi-definite (uniform translations lie in its kernel); adding a small stabilising multiple of the identity then yields a positive definite $P_n = P(x_n)$ for the preconditioned steepest descent method.
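
A sketch of this construction in Python, assembling $P(x)$ block-by-block from the $p_{ij}$. The derivatives `dphi` ($\phi'$) and `ddphi` ($\phi''$) are assumed given, a cutoff/neighbour list is omitted for brevity, and `mu` is the (assumed) stabilising shift mentioned above:

```python
import numpy as np

def pair_preconditioner(x, dphi, ddphi, mu=0.0):
    """Assemble P(x) for a pair potential; x has shape (N, 3)."""
    N = len(x)
    P = mu * np.eye(3 * N)            # optional stabilising shift
    I3 = np.eye(3)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            R = x[i] - x[j]
            r = np.linalg.norm(R)
            Rh = R / r
            p = (abs(ddphi(r)) * np.outer(Rh, Rh)
                 + abs(dphi(r) / r) * (I3 - np.outer(Rh, Rh)))
            # Since u_ij = u_i - u_j, each pair term contributes +p to the
            # (i,i) and (j,j) blocks and -p to the off-diagonal blocks.
            P[3*i:3*i+3, 3*i:3*i+3] += p
            P[3*j:3*j+3, 3*j:3*j+3] += p
            P[3*i:3*i+3, 3*j:3*j+3] -= p
            P[3*j:3*j+3, 3*i:3*i+3] -= p
    return P
```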

This construction can be readily generalised to a wide range of bond types and optimisation schemes; see [75, 74, 59] for the details and prototype applications.