S-Adam
Implements S-Adam, an Adam variant that brakes the step size near non-smooth singularities.
S-Adam augments Adam with a randomized geometric probe of the local landscape. At each step it samples \(k\) unit directions and estimates a Local Geometric Instability (LGI) score \(\rho_t\) from the variance of directional finite differences, an empirical proxy for the diameter of the Clarke subdifferential. The standard Adam direction is then scaled by a multiplicative brake \(\exp(-\lambda\rho_t)\), which shrinks the effective learning rate in unstable, high-curvature-variance regions while leaving smooth basins essentially untouched.
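One step of this update can be written as a hedged sketch. The moment estimates and bias correction are standard Adam, and the brake \(e^{-\lambda\rho_t}\) follows the description above. The formula for \(\rho_t\) is an assumption and may differ from the paper's exact LGI estimator: it takes \(\rho_t\) to be the spread of gradients sampled at the probe points. Here \(f\) (the loss) and \(\bar g_t\) (the mean probe gradient) are notation introduced for this sketch, and squares, square roots, and divisions act elementwise.

\[
\begin{aligned}
g_t &= \nabla f(\theta_{t-1}), \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{2}, \\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{t}}, \\
\bar g_t &= \frac{1}{k}\sum_{i=1}^{k} \nabla f(\theta_{t-1} + \delta u_i), \qquad u_i \sim \mathrm{Unif}(\mathbb{S}^{d-1}), \\
\rho_t &= \frac{1}{k}\sum_{i=1}^{k} \bigl\| \nabla f(\theta_{t-1} + \delta u_i) - \bar g_t \bigr\|^{2}, \\
\theta_t &= \theta_{t-1} - \eta\, e^{-\lambda \rho_t}\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}.
\end{aligned}
\]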
The S-Adam update uses the following notation: \(\theta\) are the parameters, \(\eta\) the base learning rate, \(g_t\) the gradient, \(m_t\) and \(v_t\) the first and second moment estimates with decay rates \(\beta_1,\beta_2\), \(k\) the number of probes, \(u_i\) the random unit probe directions drawn from the sphere \(\mathbb{S}^{d-1}\) in the \(d\)-dimensional parameter space, \(\delta\) the probe scale, \(\rho_t\) the LGI instability score, \(\lambda\) the damping coefficient, and \(\epsilon\) the numerical stability constant.
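The probe-and-brake mechanism can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the reference implementation. `SAdamSketch`, `lgi_score`, and `random_unit_vector` are hypothetical names. The parameters are a flat list of floats, and `grad_fn` is a caller-supplied gradient oracle. \(\rho_t\) is assumed to be the mean squared deviation of the gradients sampled at \(\theta + \delta u_i\) from their average. This stand-in measures how strongly nearby gradients disagree, and that disagreement is what the width of the Clarke subdifferential captures.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]
GradFn = Callable[[Vector], Vector]  # hypothetical gradient oracle: theta -> grad f(theta)


def random_unit_vector(d: int, rng: random.Random) -> Vector:
    """Sample u uniformly from the unit sphere S^{d-1} by normalising a Gaussian draw."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in z))
        if norm > 1e-12:
            return [x / norm for x in z]


def lgi_score(grad_fn: GradFn, theta: Vector, k: int, delta: float,
              rng: random.Random) -> float:
    """ASSUMED LGI estimator: mean squared spread of gradients probed at theta + delta*u_i.

    Small in smooth regions (probe gradients nearly agree), large near kinks
    (probes land on different sides and their gradients disagree).
    """
    if k < 2:
        raise ValueError("need at least two probes for a nonzero spread")
    d = len(theta)
    probes = []
    for _ in range(k):
        u = random_unit_vector(d, rng)
        probes.append(grad_fn([t + delta * ui for t, ui in zip(theta, u)]))
    mean = [sum(p[j] for p in probes) / k for j in range(d)]
    return sum(sum((p[j] - mean[j]) ** 2 for j in range(d)) for p in probes) / k


class SAdamSketch:
    """Hypothetical minimal S-Adam: bias-corrected Adam scaled by exp(-lam * rho_t)."""

    def __init__(self, theta: Vector, lr: float = 1e-3,
                 betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-8,
                 k: int = 4, delta: float = 1e-2, lam: float = 1.0, seed: int = 0):
        self.theta = list(theta)
        self.lr, self.eps = lr, eps
        self.beta1, self.beta2 = betas
        self.k, self.delta, self.lam = k, delta, lam
        self.m = [0.0] * len(theta)  # first moment estimate m_t
        self.v = [0.0] * len(theta)  # second moment estimate v_t
        self.t = 0
        self.rng = random.Random(seed)
        self.last_brake = 1.0

    def step(self, grad_fn: GradFn) -> Vector:
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        g = grad_fn(self.theta)
        # Standard Adam moment updates (elementwise).
        self.m = [b1 * m + (1 - b1) * gi for m, gi in zip(self.m, g)]
        self.v = [b2 * v + (1 - b2) * gi * gi for v, gi in zip(self.v, g)]
        # Probe around the current point and turn rho_t into a multiplicative brake.
        rho = lgi_score(grad_fn, self.theta, self.k, self.delta, self.rng)
        brake = math.exp(-self.lam * rho)
        self.last_brake = brake
        c1, c2 = 1 - b1 ** self.t, 1 - b2 ** self.t  # Adam bias corrections
        self.theta = [
            th - self.lr * brake * (m / c1) / (math.sqrt(v / c2) + self.eps)
            for th, m, v in zip(self.theta, self.m, self.v)
        ]
        return self.theta
```

On \(f(\theta)=\tfrac12\|\theta\|^2\) the probe gradients differ only by \(\delta u_i\), so this \(\rho_t\) is at most \(\delta^2\) and the brake stays within \(\lambda\delta^2\) of 1. On \(f(\theta)=\sum_j|\theta_j|\) probed at the kink \(\theta=0\), the gradient signs flip between probes and \(\rho_t\) rises to roughly 1 per coordinate, so the step is damped sharply. In this sketch each step costs \(k\) extra gradient evaluations.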
Reference: Ruoran Xu, Borong She, Xiaobo Jin, Qiufeng Wang, "Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization", ICML 2026. https://arxiv.org/abs/2605.29547