SR1 Cubic Quasi-Newton
Implements SR1 Cubic Quasi-Newton, a limited-memory symmetric rank-one method with adaptive cubic regularization.
A curvature estimate \(B_k\) is built from the SR1 (symmetric rank-one) update, which—unlike BFGS—admits indefinite approximations and so can capture negative curvature. Each step solves a cubic-regularized model of the loss in which the regularization replaces a trust region: the cubic term penalizes large steps and guarantees a well-defined minimizer even when \(B_k\) is indefinite. The compact eigenbasis of the low-rank \(B_k\) yields a closed-form step, and the regularization weight \(\mu_k\) is adapted from the ratio of actual to predicted decrease.
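The description above can be written out as follows. This is a sketch reconstructed from the prose and from the symbols defined in the next paragraph, not a verbatim copy of the paper's equations. In particular, the exact constants and the roles of \(\gamma_1\) and \(\gamma_2\) in the \(\mu_k\) update are assumptions modeled on standard adaptive cubic regularization.

\[
\begin{aligned}
m_k(s) &= g_k^\top s + \tfrac{1}{2}\, s^\top B_k s + \tfrac{\mu_k}{3}\, \Phi_k(s)^3, \qquad s_k = \arg\min_s m_k(s), \\
\rho_k &= \frac{f(\theta_k) - f(\theta_k + s_k)}{-\,m_k(s_k)}, \qquad
\theta_{k+1} = \begin{cases} \theta_k + s_k, & \rho_k \ge \eta_1, \\ \theta_k, & \text{otherwise,} \end{cases} \\
\mu_{k+1} &= \begin{cases} \mu_k / \gamma_1, & \rho_k \ge \eta_2, \\ \mu_k, & \eta_1 \le \rho_k < \eta_2, \\ \gamma_2\, \mu_k, & \rho_k < \eta_1. \end{cases}
\end{aligned}
\]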
Here \(g_k = \nabla f(\theta_k)\) is the gradient, \(B_k\) is the limited-memory SR1 Hessian approximation, and \(\mu_k > 0\) is the cubic regularization weight. \(\Phi_k(s) = \lVert U_k^\top s \rVert_3\) is a shape-changing norm in the eigenbasis \(U_k\) of \(B_k\), and \(\rho_k\) is the ratio of actual to predicted reduction. The thresholds \(\eta_1 \le \eta_2\) control step acceptance, and the factors \(\gamma_1, \gamma_2 > 1\) control how the regularization weight is adjusted.
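To make the closed-form step concrete, here is a minimal NumPy sketch of one iteration. It is not the library's implementation. It assumes a small dense \(B_k\) with a full eigendecomposition rather than the limited-memory compact form, and the helper names (`sr1_update`, `cubic_step`, `update_mu`, `iterate`), the default constants, and the exact \(\mu_k\) rule are all hypothetical. In the eigenbasis the cubic model separates by coordinate, so each coordinate reduces to a scalar quadratic equation.

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Dense SR1 update; skipped when the denominator is too small to be safe."""
    r = y - B @ s
    denom = r @ s
    if abs(denom) <= tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

def cubic_step(g, B, mu):
    """Minimize g@s + 0.5*s@B@s + (mu/3)*sum(|U.T@s|**3) over s.

    Returns the step and the predicted decrease -m(s).
    """
    lam, U = np.linalg.eigh(B)      # B = U diag(lam) U^T, B may be indefinite
    g_t = U.T @ g                   # gradient expressed in the eigenbasis
    # Per coordinate, the step length t >= 0 is the non-negative root of
    # mu*t**2 + lam*t - |g_t| = 0.
    t = (-lam + np.sqrt(lam**2 + 4.0 * mu * np.abs(g_t))) / (2.0 * mu)
    v = np.where(g_t > 0, -t, t)    # move against the gradient (+t if g_t == 0)
    pred = -(g_t @ v + 0.5 * lam @ v**2 + (mu / 3.0) * np.sum(t**3))
    return U @ v, pred

def update_mu(mu, rho, eta1=0.1, eta2=0.75, gamma1=2.0, gamma2=2.0):
    """Assumed rule: relax mu after a very good step, tighten after a poor one."""
    if rho >= eta2:
        return mu / gamma1
    if rho < eta1:
        return mu * gamma2
    return mu

def iterate(f, grad, theta, B, mu, eta1=0.1):
    """One iteration: cubic step, ratio test, SR1 update, mu adaptation."""
    g = grad(theta)
    s, pred = cubic_step(g, B, mu)
    if pred <= 0.0:                 # model cannot be decreased; nothing to do
        return theta, B, mu
    rho = (f(theta) - f(theta + s)) / pred
    B = sr1_update(B, s, grad(theta + s) - g)   # learn curvature from the trial point
    if rho >= eta1:                 # accept the step only if the decrease is adequate
        theta = theta + s
    return theta, B, update_mu(mu, rho, eta1=eta1)
```

A loop such as `for _ in range(20): theta, B, mu = iterate(f, grad, theta, B, mu)` drives the sketch. Because the square root term is at least \(\lvert\lambda_i\rvert\), the per-coordinate step length stays non-negative and well defined even for negative eigenvalues, which is the property the cubic term is meant to provide.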
Reference: Aditya Ranganath, Mukesh Singhal, Roummel Marcia, "Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization", arXiv 2025. https://arxiv.org/abs/2502.12298