Hessian-aware Scaling
Implements Hessian-aware scaling, a rescaling of gradient descent that sets the step magnitude from the curvature along the gradient direction.
Rather than computing a full Newton step, the method scales the negative gradient by a scalar \(s_t\) derived from the Hessian-vector product \(H_t g_t\), which captures the curvature along the current gradient direction. When the curvature \(\langle g_t, H_t g_t\rangle\) is sufficiently positive relative to \(\lVert g_t\rVert^2\), the scaling matches a one-dimensional second-order step. An Armijo backtracking line search then sets the step size \(\alpha_t\), and the unit step \(\alpha_t = 1\) is provably accepted near a minimizer. The CG, MR, and GM variants below differ only in how \(s_t\) is formed from the same curvature quantities.
Each iteration applies the update
\[\theta_{t+1} = \theta_t - \alpha_t s_t g_t,\]
where \(g_t = \nabla f(\theta_t)\) and \(H_t\) is the Hessian at \(\theta_t\). The Hessian-aware scaling \(s_t > 0\) is chosen from the curvature along \(g_t\) when \(\langle g_t, H_t g_t\rangle > \sigma\lVert g_t\rVert^2\) for a tolerance \(\sigma \ll 1\), and from prescribed ranges under limited or negative curvature. The step size \(\alpha_t > 0\) comes from Armijo backtracking initialized at \(\alpha = 1\), which accepts the first trial \(\alpha\) satisfying \(f(\theta_t - \alpha s_t g_t) \le f(\theta_t) - \rho\,\alpha\, s_t \lVert g_t\rVert^2\) with \(\rho \in (0, \tfrac{1}{2})\).
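As a rough illustration of the iteration described above, the following pure-Python sketch performs one step. When the curvature test passes, it uses the one-dimensional second-order scaling \(s_t = \lVert g_t\rVert^2 / \langle g_t, H_t g_t\rangle\), the exact minimizer of the quadratic model along \(-g_t\). The function name `hessian_aware_step`, the callables `f`, `grad`, and `hvp`, the default values of `sigma` and `rho`, the halving backtracking factor, and the constant `fallback_scale` used under limited or negative curvature are all assumptions made for illustration. In particular, the prescribed ranges for that last case are not reproduced here.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hessian_aware_step(f, grad, hvp, theta, sigma=1e-6, rho=1e-4,
                       fallback_scale=1.0, max_backtracks=50):
    """Return (new_theta, s, alpha) for one step theta - alpha * s * g.

    f(theta) -> float, grad(theta) -> list, hvp(theta, v) -> list (H @ v).
    fallback_scale is a hypothetical placeholder for the scaling used under
    limited or negative curvature; it is NOT taken from the reference.
    """
    g = grad(theta)
    gg = dot(g, g)
    if gg == 0.0:
        return theta, 0.0, 0.0  # already stationary, nothing to do

    curvature = dot(g, hvp(theta, g))  # <g, H g>, one Hessian-vector product
    if curvature > sigma * gg:
        s = gg / curvature  # minimizer of the quadratic model along -g
    else:
        s = fallback_scale  # assumption: fixed scale for weak/negative curvature

    # Armijo backtracking, starting from the unit step.
    f0 = f(theta)
    alpha = 1.0
    for _ in range(max_backtracks):
        trial = [t - alpha * s * gi for t, gi in zip(theta, g)]
        if f(trial) <= f0 - rho * alpha * s * gg:
            return trial, s, alpha
        alpha *= 0.5  # assumption: halve the step on rejection
    return theta, s, 0.0  # no acceptable step found


# Toy problem: f(theta) = 0.5 * theta^T A theta with A = diag(1, 10).
A = [1.0, 10.0]
f = lambda th: 0.5 * sum(a * t * t for a, t in zip(A, th))
grad = lambda th: [a * t for a, t in zip(A, th)]
hvp = lambda th, v: [a * vi for a, vi in zip(A, v)]

theta, s, alpha = hessian_aware_step(f, grad, hvp, [1.0, 1.0])
print(s, alpha)
```

On a positive definite quadratic such as this one, the scaled step is the exact line minimizer along \(-g_t\), so the unit step passes the Armijo test for any \(\rho < \tfrac{1}{2}\) and no backtracking occurs.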
Reference: Oscar Smee, Fred Roosta, Stephen J. Wright, "First-ish Order Methods: Hessian-aware Scalings of Gradient Descent", arXiv 2025. https://arxiv.org/abs/2502.03701