With Gabor Csanyi and James Kermode I have been developing bespoke algorithms for molecular geometry optimisation. The three main thrusts of this research are (1) preconditioners [75, 74, 59]; (2) effective line-search heuristics [75, 55]; and (3) saddle point search [75, 64, 55].

Consider a smooth potential energy landscape $E : \mathbb{R}^N \to \mathbb{R}$. Computing (local) minima is an extremely well-studied and well-understood problem. For example, the steepest descent method

$x_{n+1} = x_n - \alpha_n \nabla E(x_n)$

with suitable step-size selection is guaranteed to converge at least to a critical point, which will "likely" be a local minimum. More sophisticated methods that use curvature information can, in theory, even rule out convergence to critical points that are not minima. The key point that makes this possible is that a reduction of the energy $E$ in each optimisation step is already sufficient to guarantee convergence.
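As an illustration, here is a minimal steepest descent loop with an Armijo backtracking line search (one standard step-size selection rule); the double-well example potential and all parameters are illustrative choices, not taken from the cited works:

```python
import numpy as np

def E(p):
    x, y = p
    return (x**2 - 1)**2 + y**2          # double-well: minima at (±1, 0)

def grad(p):
    x, y = p
    return np.array([4*x*(x**2 - 1), 2*y])

x = np.array([0.5, 0.8])                 # start in the right-hand basin
for _ in range(10_000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:         # stop at an (approximate) critical point
        break
    # Armijo backtracking: halve the trial step until the energy
    # decreases by at least a fraction of the predicted decrease
    a = 1.0
    while E(x - a*g) > E(x) - 1e-4 * a * (g @ g):
        a *= 0.5
    x = x - a*g
```

Starting in the right-hand basin, the iterates converge to the minimum $(1, 0)$; a descent method of this kind can only be expected to find a nearby critical point, not any particular one.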

A simple class of saddle point search methods related to steepest descent is given by the dimer/GAD methods. The idea is to augment the state $x_n$ with a direction variable $v_n$ and derive a dynamical system whose stable equilibria are precisely the index-1 saddle points:

$\begin{aligned} \dot{x} & = - (I - 2 v \otimes v) \nabla E(x), \\ \dot{v} & = - (I - v \otimes v) \nabla^2 E(x) v. \end{aligned}$

The idea is that $v$ follows a gradient flow for the Rayleigh quotient and hence converges to an eigenvector for the minimal eigenvalue of $\nabla^2 E(x)$, while $x$ follows a gradient flow in all directions *orthogonal* to $v$ but moves *uphill* in the $v$-direction. To implement this idea we discretise in time, e.g., with an explicit Euler scheme followed by renormalisation of $v$,

$\begin{aligned} x_{n+1} & = x_n - \alpha_n (I - 2 v_n \otimes v_n) \nabla E(x_n), \\ v_{n+1} & = \frac{\tilde{v}_{n+1}}{|\tilde{v}_{n+1}|}, \qquad \tilde{v}_{n+1} = v_n - \alpha_n (I - v_n \otimes v_n) \nabla^2 E(x_n) v_n. \end{aligned}$

It is easy to show that if $\alpha_n$ is sufficiently small and $(x_0, v_0)$ start *near* a stable index-1 saddle, then the iterates $(x_n, v_n)$ converge to a saddle point.
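The local convergence behaviour is easy to observe numerically. The following is a minimal sketch of a discretised GAD iteration on a toy double-well; the potential, starting point, and step size are my own illustrative choices, and the sketch uses the exact hessian, whereas the dimer method proper approximates $\nabla^2 E(x) v$ by finite differences of gradients:

```python
import numpy as np

# Toy landscape E(x, y) = (x² − 1)² + y²:
# minima at (±1, 0), index-1 saddle at (0, 0).
def grad(p):
    x, y = p
    return np.array([4*x*(x**2 - 1), 2*y])

def hess(p):
    x, _ = p
    return np.array([[12*x**2 - 4, 0.0], [0.0, 2.0]])

alpha = 0.05
x = np.array([0.3, 0.5])                    # start *near* the saddle
v = np.array([0.8, 0.6])                    # initial unit direction

for _ in range(2000):
    g, Hv = grad(x), hess(x) @ v
    x = x - alpha * (g - 2 * (v @ g) * v)   # downhill ⊥ v, uphill along v
    v = v - alpha * (Hv - (v @ Hv) * v)     # projected Rayleigh-quotient gradient step
    v = v / np.linalg.norm(v)               # keep |v| = 1
```

The iterates converge to the saddle $(0,0)$, with $v$ aligning with the eigenvector of the negative eigenvalue, consistent with the local convergence statement above.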

**Open Problem:** Global convergence, on the other hand, is entirely open: the challenge stems from the fact that there is no obvious "merit function" (e.g., a Lyapunov functional) whose minima are the index-1 saddle points. In fact, Antoine Levitt and I showed in [64] that there exists no merit function for which the dimer/GAD direction is a descent direction.

Most of my work on this topic has been on practical aspects of saddle point search, i.e., developing heuristics that perform well in practice, e.g.,

- Modulated Newton methods (in preparation)

But the most interesting and most challenging question is whether it is possible to develop algorithms that are guaranteed to compute a saddle point. The most promising approach to achieve this is via path-based methods, i.e., computing a path that connects two energy minima, such as the String and NEB methods. But these methods (a) come at a significantly increased computational cost; and (b) pose other significant analytical challenges. For "walker" methods such as dimer/GAD it is unclear even on a heuristic level whether globally convergent variants exist.
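To give a flavour of the path-based approach, here is a simplified zero-temperature string method on a toy double-well: each image takes a steepest descent step, and the path is then reparametrised to equal arc length. All details (potential, number of images, step size) are illustrative choices of mine, not those of the cited works:

```python
import numpy as np

def grad(p):                          # ∇E for E(x, y) = (x² − 1)² + y²
    x, y = p
    return np.array([4*x*(x**2 - 1), 2*y])

n_img, alpha = 21, 0.02
s = np.linspace(0.0, 1.0, n_img)
# initial path between the minima (±1, 0), bowed away from the y = 0 axis
path = np.stack([2*s - 1, 0.5*np.sin(np.pi*s)], axis=1)

for _ in range(2000):
    # (1) relax every interior image by one steepest descent step
    path[1:-1] -= alpha * np.array([grad(p) for p in path[1:-1]])
    # (2) redistribute images to equal arc length, which stops them
    #     from sliding down into the two minima
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(seg)]); t /= t[-1]
    path = np.stack([np.interp(s, t, path[:, 0]),
                     np.interp(s, t, path[:, 1])], axis=1)

E = lambda p: (p[0]**2 - 1)**2 + p[1]**2
saddle = path[max(range(n_img), key=lambda i: E(path[i]))]
```

The highest-energy image along the converged path approximates the saddle at the origin; the price is that every iteration requires a gradient evaluation for *every* image, which is the increased computational cost referred to above.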

My work on preconditioners focuses on molecular potential energy landscapes and is primarily application-driven. The main idea can be explained as follows: The asymptotic convergence rate of the steepest descent method,

$x_{n+1} = x_n - \alpha_n \nabla E(x_n)$

is roughly $(1 - 2/\kappa)^n$, where $\kappa = \|H\|\,\|H^{-1}\|$ is the condition number of $H := \nabla^2 E(x_*)$ and $x_* = \lim_{n \to \infty} x_n$. Ill-conditioning ($\kappa \gg 1$) enters molecular PESs in a variety of ways, for example through large materials systems, where the ratio of the highest and lowest wave-numbers becomes large, or through large ratios between bond stiffnesses in bio-molecules.
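The $\kappa$-dependence of the rate is easy to demonstrate on a quadratic model problem (a construction of my own for illustration):

```python
import numpy as np

def sd_iters(kappa, tol=1e-6):
    """Count fixed-step steepest descent iterations on E(x) = ½ xᵀ diag(1, κ) x."""
    H = np.diag([1.0, kappa])
    x = np.array([1.0, 1.0])
    alpha = 1.0 / kappa               # stable step size, 1/λ_max
    n = 0
    while np.linalg.norm(H @ x) > tol:
        x = x - alpha * (H @ x)       # gradient of the quadratic is Hx
        n += 1
    return n
```

Here the iteration count grows essentially linearly in $\kappa$, matching the $(1 - 2/\kappa)^n$ rate: raising $\kappa$ from 10 to 100 increases the cost roughly tenfold.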

In the preconditioned steepest descent method, the iterates are

$x_{n+1} = x_n - \alpha_n P_n^{-1} \nabla E(x_n)$

and the convergence rate is now determined by the preconditioned hessian $P^{-1/2} H P^{-1/2}$ instead of $H$. Of course, one can (and should) apply preconditioning ideas to most optimisation algorithms, not only steepest descent; in particular, we typically use preconditioned LBFGS or CG variants in practice.
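A hypothetical illustration on a $2 \times 2$ quadratic model, using a simple Jacobi (diagonal) preconditioner rather than the molecular preconditioner discussed below:

```python
import numpy as np

H = np.array([[100.0, 1.0], [1.0, 1.0]])   # model hessian, κ(H) ≈ 100
P = np.diag(np.diag(H))                    # Jacobi (diagonal) preconditioner

def iters(M, alpha, tol=1e-8):
    """Iterations of x ← x − α M⁻¹ H x until the gradient Hx is small."""
    x = np.array([1.0, 1.0])
    n = 0
    while np.linalg.norm(H @ x) > tol:
        x = x - alpha * np.linalg.solve(M, H @ x)
        n += 1
    return n

n_plain = iters(np.eye(2), alpha=1.0/101)  # plain descent needs α ≲ 1/λ_max(H)
n_prec  = iters(P, alpha=0.9)              # P⁻¹H has eigenvalues near 1
```

Since $P^{-1/2} H P^{-1/2}$ has condition number close to 1, the preconditioned iteration converges in a small, essentially $\kappa$-independent number of steps.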

To construct a preconditioner, we begin with a surrogate PES $\tilde{E}$, possibly $\tilde{E} = E$, but in important cases, e.g. when $E$ is given by an electronic structure model, this is no longer possible. We then modify the hessian expression $\nabla^2 \tilde{E}(x)$ in a way that yields a new matrix $P(x) \approx \nabla^2 \tilde{E}(x)$ which remains positive definite throughout the entire PES. The idea is best illustrated with the example of a pair potential:

Let

$\begin{aligned} & R_{ij} = x_i - x_j, \quad r_{ij} = |R_{ij}|, \quad \hat{R}_{ij} = R_{ij} / r_{ij} \\ & \text{and} \quad \tilde{E}(x) = \sum_{i \neq j} \phi(r_{ij}); \end{aligned}$

then

$\begin{aligned} \langle H(x) u, u \rangle &= \sum_{i \neq j} u_{ij}^T h_{ij} u_{ij}, \qquad u_{ij} := u_i - u_j, \quad \text{where} \\ h_{ij} &= \hat{R}_{ij} \otimes \hat{R}_{ij}\, \phi''(r_{ij}) + (I - \hat{R}_{ij} \otimes \hat{R}_{ij}) \frac{\phi'(r_{ij})}{r_{ij}}. \end{aligned}$

We then specify the preconditioner by making positive definite each "local" hessian $h_{ij}$, rather than the global hessian $H$; that is, we define

$\begin{aligned} \langle P(x) u, u \rangle &= \sum_{i \neq j} u_{ij}^T p_{ij} u_{ij} \quad \text{where} \\ p_{ij} &= \hat{R}_{ij} \otimes \hat{R}_{ij}\, |\phi''(r_{ij})| + (I - \hat{R}_{ij} \otimes \hat{R}_{ij}) \left|\frac{\phi'(r_{ij})}{r_{ij}}\right|. \end{aligned}$

This guarantees that $P(x)$ is positive semi-definite (in practice a small multiple of the identity is added to remove the rigid-body nullspace and make it positive definite), hence we can set $P_n = P(x_n)$ in the preconditioned steepest descent method.
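A sketch of this assembly for a Lennard-Jones-like pair potential; the potential, cutoff, geometry, and the small identity shift removing the translation nullspace are all my own illustrative choices:

```python
import numpy as np

def dphi(r):  return -12*r**-13 + 12*r**-7    # φ'(r) for φ(r) = r⁻¹² − 2 r⁻⁶
def ddphi(r): return 156*r**-14 - 84*r**-8    # φ''(r)

def assemble(X, block, rcut=2.0):
    """Assemble Σ_{i<j} u_ijᵀ b_ij u_ij, u_ij = u_i − u_j, as a matrix.

    (Summing over i < j gives half the i ≠ j sum in the text,
    which is immaterial for definiteness.)"""
    n, d = X.shape
    M = np.zeros((n*d, n*d))
    for i in range(n):
        for j in range(i + 1, n):
            R = X[i] - X[j]
            r = np.linalg.norm(R)
            if r > rcut:
                continue
            RR = np.outer(R / r, R / r)          # R̂_ij ⊗ R̂_ij
            b = block(r, RR)
            I, J = slice(i*d, (i+1)*d), slice(j*d, (j+1)*d)
            M[I, I] += b; M[J, J] += b
            M[I, J] -= b; M[J, I] -= b
    return M

eye = np.eye(2)
h = lambda r, RR: ddphi(r)*RR + (dphi(r)/r)*(eye - RR)          # local hessian h_ij
p = lambda r, RR: abs(ddphi(r))*RR + abs(dphi(r)/r)*(eye - RR)  # its positive version p_ij

# stretched 2D chain: the bonds sit in the concave region of φ, so H is indefinite
X = np.array([[1.3*i, 0.0] for i in range(5)])
H = assemble(X, h)
P = assemble(X, p) + 0.01*np.eye(10)   # small shift removes the translation nullspace
```

For this stretched configuration $H$ has negative eigenvalues, while the assembled $P$ is positive definite and can be used directly as $P_n = P(x_n)$ above.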

This construction can be readily generalised to a wide range of bond types and optimisation schemes; see [75, 74, 59] for the details and prototype applications.