DEO
Implements DEO (Dimer-Enhanced Optimization), a first-order optimizer designed to escape saddle points by removing the gradient component along the locally lowest-curvature direction.
DEO adapts the Dimer method from molecular simulation to estimate the direction of lowest local curvature, \(\hat{N}_t\), using only gradient evaluations. It evaluates the gradient at a second point \(\theta_2 = \theta_t + \Delta R\,\hat{N}_t\). The difference between the gradients at \(\theta_2\) and \(\theta_t\) approximates \(\Delta R\) times the Hessian-vector product along \(\hat{N}_t\), so a rotational force built from it turns \(\hat{N}_t\) toward the eigenvector of the Hessian's smallest eigenvalue without ever forming the Hessian explicitly. The raw gradient's component along \(\hat{N}_t\), scaled by a correction coefficient, is then subtracted, biasing the step away from flat or negatively curved directions, and the corrected gradient is fed into a standard Adam update.
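One plausible way to write a single DEO step is shown below. It assumes the standard dimer rotation (one rotation step per iteration) followed by plain bias-corrected Adam. The loss \(L\), the rotational force \(F_{\mathrm{rot}}\), the corrected gradient \(\tilde{g}_t\), and the moments \(m_t, v_t\) are notation introduced here. The exact expressions in the reference may differ, for example in how \(\Delta R\) scales the force or whether the projection uses the rotated direction.

\[
\begin{aligned}
\theta_2 &= \theta_t + \Delta R\,\hat{N}_t, \qquad g_t = \nabla L(\theta_t), \qquad g_2 = \nabla L(\theta_2),\\
F_{\mathrm{rot}} &= -\frac{1}{\Delta R}\Big[(g_2 - g_t) - \big((g_2 - g_t)^{\top}\hat{N}_t\big)\,\hat{N}_t\Big],\\
\hat{N}_t &\leftarrow \frac{\hat{N}_t + \eta_{\mathrm{rot}}\,F_{\mathrm{rot}}}{\big\lVert \hat{N}_t + \eta_{\mathrm{rot}}\,F_{\mathrm{rot}} \big\rVert},\\
\tilde{g}_t &= g_t - \alpha\,\big(g_t^{\top}\hat{N}_t\big)\,\hat{N}_t,\\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,\tilde{g}_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,\tilde{g}_t^{\,2},\\
\theta_{t+1} &= \theta_t - \eta\,\frac{m_t/(1-\beta_1^{t})}{\sqrt{v_t/(1-\beta_2^{t})} + \epsilon}.
\end{aligned}
\]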
In the DEO update, \(g_t\) is the gradient at \(\theta_t\), \(g_2\) is the gradient at the dimer point \(\theta_2 = \theta_t + \Delta R\,\hat{N}_t\), \(\hat{N}_t\) is the unit dimer direction that estimates the lowest-curvature eigenvector, \(\Delta R\) is the dimer separation, \(\eta_{\mathrm{rot}}\) is the rotation step size, \(\alpha\) is the correction coefficient that scales how much of the projected component is removed, \(\eta\) is the learning rate, \(\beta_1,\beta_2\) are the decay rates of Adam's first- and second-moment estimates, and \(\epsilon\) is a small constant for numerical stability.
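A minimal pure-Python sketch of one DEO step follows. It is written in terms of the quantities defined above, under stated assumptions: plain lists stand in for tensors, `DEOSketch`, `grad_fn`, `delta_r`, `rot_lr`, and `alpha` are hypothetical names (not this library's API), the default values and the all-ones initial dimer direction are arbitrary, the rotational force uses the standard dimer form, and the projection uses the freshly rotated direction. It is an illustration, not the reference implementation.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(a: Vector) -> Vector:
    norm = math.sqrt(dot(a, a))
    return [x / norm for x in a]


class DEOSketch:
    """Hypothetical DEO step on one flat parameter vector: dimer rotation + projection + Adam."""

    def __init__(self, dim: int, lr: float = 1e-2, delta_r: float = 1e-3,
                 rot_lr: float = 0.1, alpha: float = 1.0,
                 betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-8) -> None:
        self.lr, self.delta_r, self.rot_lr, self.alpha = lr, delta_r, rot_lr, alpha
        self.beta1, self.beta2 = betas
        self.eps = eps
        self.n_hat = normalize([1.0] * dim)  # assumed initial dimer direction
        self.m = [0.0] * dim  # Adam first-moment estimate
        self.v = [0.0] * dim  # Adam second-moment estimate
        self.t = 0

    def step(self, theta: Vector, grad_fn: Callable[[Vector], Vector]) -> Vector:
        g_t = grad_fn(theta)
        # Second dimer point theta_2 = theta_t + delta_r * n_hat and its gradient g_2.
        theta_2 = [p + self.delta_r * n for p, n in zip(theta, self.n_hat)]
        g_2 = grad_fn(theta_2)
        # (g_2 - g_t) / delta_r approximates the Hessian-vector product H @ n_hat.
        hn = [(a - b) / self.delta_r for a, b in zip(g_2, g_t)]
        curvature = dot(hn, self.n_hat)  # Rayleigh-quotient estimate along n_hat
        # Rotational force: minus the part of H @ n_hat perpendicular to n_hat.
        f_rot = [-(h - curvature * n) for h, n in zip(hn, self.n_hat)]
        self.n_hat = normalize([n + self.rot_lr * f for n, f in zip(self.n_hat, f_rot)])
        # Subtract alpha times the gradient component along the rotated direction.
        along = dot(g_t, self.n_hat)
        g_tilde = [g - self.alpha * along * n for g, n in zip(g_t, self.n_hat)]
        # Standard bias-corrected Adam update on the corrected gradient.
        self.t += 1
        self.m = [self.beta1 * m + (1 - self.beta1) * g for m, g in zip(self.m, g_tilde)]
        self.v = [self.beta2 * v + (1 - self.beta2) * g * g for v, g in zip(self.v, g_tilde)]
        bc1 = 1 - self.beta1 ** self.t
        bc2 = 1 - self.beta2 ** self.t
        return [p - self.lr * (m / bc1) / (math.sqrt(v / bc2) + self.eps)
                for p, m, v in zip(theta, self.m, self.v)]


# Toy saddle L(x, y) = 1.5 x^2 - 0.5 y^2, whose Hessian is diag(3, -1).
def grad(theta: Vector) -> Vector:
    return [3.0 * theta[0], -1.0 * theta[1]]


opt = DEOSketch(dim=2)
theta = [0.5, 1e-3]
for _ in range(200):
    theta = opt.step(theta, grad)
print(opt.n_hat)  # should be close to [0, 1], the eigenvector for eigenvalue -1
```

On this quadratic the finite difference is exact up to rounding, so the dimer direction converges to the negative-curvature axis within a few dozen rotations. Because the rotation only needs two gradient calls per step, the sketch never builds the Hessian, matching the gradient-only property described above.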
Reference: Yue Hu, Zanxia Cao, Yingchao Liu, "Dimer-Enhanced Optimization: A First-Order Approach to Escaping Saddle Points in Neural Network Training", arXiv 2025. https://arxiv.org/abs/2507.19968