ZO-AdaMU¶
Implements ZO-AdaMU, a zeroth-order optimizer that adapts the perturbation by simulated momentum and uncertainty.
ZO-AdaMU estimates gradients without backpropagation via simultaneous perturbation stochastic approximation (SPSA), evaluating the loss at \(\theta \pm \epsilon m_t\). Unlike vanilla zeroth-order methods that perturb with a fresh zero-centered Gaussian, the perturbation \(m_t\) is a momentum-perturbed Gaussian: a convex mix of a zero-centered draw and a draw centered at the previous momentum, with the mixing weights \(\alpha_t\), \(\beta_{1,t}\), \(\beta_{2,t}\) annealed over a three-phase schedule so the perturbation interpolates from exploratory to momentum-aligned. A second moment \(v_t\) formed from the squared perturbation components rescales the step adaptively.
where \(\theta\) are the parameters, \(\eta\) the learning rate, \(\hat{g}_t\) the SPSA gradient estimate, \(\epsilon\) the perturbation scale, \(\dot{z}\) and \(\ddot{z}\) the zero-centered and momentum-centered Gaussian components, \(m_t\) the simulated-momentum perturbation, \(v_t\) the second moment, \(\alpha_t \in [0,1]\) the annealed uncertainty, \(\beta_1,\beta_2 \in [0,1]\) the annealed smoothing weights, and \(\sigma = 10^{-8}\) a stability constant.
Reference: Shuoran Jiang, Qingcai Chen, Youcheng Pan, Yang Xiang, Yukang Lin, Xiangping Wu, Chuanyi Liu, Xiaobao Song, "ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization", AAAI 2024. https://arxiv.org/abs/2312.15184