OptEMA
Implements OptEMA, an EMA-based optimizer with a closed-loop, Lipschitz-free adaptive stepsize.
OptEMA keeps the Adam-style first- and second-moment EMA backbone but replaces the preset coefficients and stepsize with trajectory-dependent quantities driven by a Corrected AdaGrad-Norm statistic \(\rho_t\) and the running maximum gradient norm \(\hat g_t\). Two symmetric variants are studied: OptEMA-M makes the first-moment coefficient adaptive (\(\alpha_t = \rho_t\)) and keeps the second-moment decay fixed, while OptEMA-V makes the second-moment coefficient adaptive (\(\beta_t = \rho_t\)) and keeps the first-moment decay fixed. The numerator of the Corrected AdaGrad-Norm statistic averages historical squared gradient norms, which tempers AdaGrad-Norm's premature stepsize decay and yields a noise-adaptive convergence rate that reduces to the optimal deterministic rate when the noise vanishes.
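The difference between the two variants can be illustrated with a minimal sketch. It is not the reference implementation: the function names `select_coefficients` and `update_running_max` and the default decay values are hypothetical, \(\rho_t\) is passed in as a precomputed number because its formula is not reproduced here, and the running maximum is assumed to be taken over Euclidean gradient norms.

```python
import math


def select_coefficients(variant, rho_t, alpha=0.9, beta=0.999):
    """Return (alpha_t, beta_t) for one step.

    variant: "M" (adaptive first-moment coefficient) or
             "V" (adaptive second-moment coefficient).
    rho_t:   the Corrected AdaGrad-Norm statistic, assumed precomputed.
    alpha, beta: fixed decays in (0, 1); the defaults are illustrative only.
    """
    if variant == "M":
        # OptEMA-M: alpha_t = rho_t, second-moment decay stays fixed.
        return rho_t, beta
    if variant == "V":
        # OptEMA-V: beta_t = rho_t, first-moment decay stays fixed.
        return alpha, rho_t
    raise ValueError(f"unknown variant: {variant!r}")


def update_running_max(g_hat_prev, grad):
    """Track the largest gradient norm seen so far (assumed Euclidean)."""
    grad_norm = math.sqrt(sum(x * x for x in grad))
    return max(g_hat_prev, grad_norm)
```

Only one of the two coefficients follows \(\rho_t\) in each variant; the other stays at its fixed decay, which is what makes the two variants symmetric.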
In the update rule, \(\theta\) denotes the parameters, \(\eta\) the base learning rate (default \(1\)), \(g_t\) the stochastic gradient, \(m_t\) and \(v_t\) the first- and second-moment EMAs, \(\alpha_t\) and \(\beta_t\) the EMA coefficients, \(\gamma_t\) the closed-loop effective stepsize, \(\rho_t\) the Corrected AdaGrad-Norm statistic with \(\tau\in[0,1]\) (default \(\tau=1\)), \(\hat g_t\) the running maximum gradient norm, \(\alpha,\beta\in(0,1)\) the fixed decays, and \(\mu,\varepsilon\) small stabilizers.
Reference: Ganzhao Yuan, "OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality", arXiv 2026. https://arxiv.org/abs/2603.09923