Adam-SHANG
Implements Adam-SHANG, a convergent Adam-type method derived from a symplectic Hamiltonian accelerated gradient flow.
Adam-SHANG couples the parameter iterate with an auxiliary momentum iterate and a diagonal preconditioner that accumulates squared gradients, mirroring Adam's second-moment scaling. The step size is set adaptively from the trace of the preconditioner rather than fixed, which yields provable convergence for stochastic smooth convex objectives while retaining Adam-like per-coordinate adaptivity.
In the update rule, \(\theta_k\) is the parameter iterate, \(y_k\) the auxiliary momentum iterate, \(P_k=\mathrm{diag}(p_1,\dots,p_d)\succ 0\) the diagonal preconditioner, \(g_k\) the (stochastic) gradient, \(g_{k+1}^{\odot 2}\) the elementwise square of \(g_{k+1}\), \(\alpha_k\) the adaptive step size, \(\lambda,\beta,\gamma\in(0,1]\) tuning constants, and \(\epsilon>0\) a stability term.
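The sketch below is not the Adam-SHANG update, whose exact equations are not reproduced here. It only illustrates, in plain Python, the two ingredients named above: a diagonal preconditioner that accumulates elementwise squared gradients in the style of Adam's second moment, and the trace of that preconditioner, the quantity from which the step size is derived. The function names, the `decay` parameter, and the square-root scaling in `precondition` are assumptions made for illustration; in particular, `decay` is not claimed to correspond to \(\lambda\), \(\beta\), or \(\gamma\).

```python
import math


def accumulate_squared_grad(p, grad, decay=0.999):
    """Adam-style exponential moving average of elementwise squared gradients.

    `p` holds the diagonal entries p_1..p_d; `decay` is a hypothetical name.
    """
    return [decay * p_i + (1.0 - decay) * g_i * g_i for p_i, g_i in zip(p, grad)]


def trace(p):
    """Trace of diag(p_1, ..., p_d): the sum of the diagonal entries."""
    return sum(p)


def precondition(grad, p, eps=1e-8):
    """Per-coordinate scaling as in Adam: g_i / (sqrt(p_i) + eps).

    This scaling is an assumption borrowed from Adam, not the paper's rule.
    """
    return [g_i / (math.sqrt(p_i) + eps) for g_i, p_i in zip(grad, p)]


# One accumulation step from a zero diagonal with a toy gradient.
p = accumulate_squared_grad([0.0, 0.0], [3.0, 4.0], decay=0.5)
tr = trace(p)
scaled = precondition([2.0, 8.0], [4.0, 16.0], eps=0.0)
```

Because the diagonal is stored as a flat list of \(d\) entries, the trace costs one pass over the coordinates, so a trace-based step size adds no matrix operations on top of the per-coordinate scaling.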
Reference: Yaxin Yu, Long Chen, Minfu Feng, "Adam-SHANG: A Convergent Adam-Type Method for Stochastic Smooth Convex Optimization", arXiv 2025. https://arxiv.org/abs/2605.12878