R-AdaZO
Implements R-AdaZO (Refined Adaptive Zeroth-Order Optimization), an adaptive zeroth-order method that refines Adam-style moment estimates from random-perturbation gradients.
R-AdaZO estimates the gradient with a finite-difference, random-direction scheme that queries only function values, then feeds that estimate into an Adam-like update. Its refinement is to drive the second-moment accumulator with the squared first moment \(m_t^2\) rather than the squared raw estimate \(g_t^2\). Because the momentum buffer \(m_t\) has lower variance than the noisy single-step estimate, this yields a more reliable adaptive preconditioner and sharper coordinate-wise scaling.
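Written out, one iteration consistent with this description might read as follows. This is a sketch assembled from the notation used on this page, with \(f\) standing for the objective; the forward-difference form of the estimator, the \(d/K\) scaling, and the placement of \(\zeta\) inside the square root are assumptions to check against the reference.

\[
\begin{aligned}
g_t &= \frac{d}{K}\sum_{k=1}^{K}\frac{f(\theta_{t-1}+\mu u_k;\,\xi_t)-f(\theta_{t-1};\,\xi_t)}{\mu}\,u_k,\\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t,\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,m_t^2,\\
\theta_t &= \theta_{t-1} - \eta\,\frac{m_t}{\sqrt{v_t+\zeta}}.
\end{aligned}
\]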
Here, \(\theta\) denotes the parameters, \(\eta\) is the learning rate, \(g_t\) is the zeroth-order gradient estimate built from \(K\) directions \(u_k\) drawn uniformly from the unit sphere with smoothing radius \(\mu>0\), \(d\) is the parameter dimension, \(\xi_t\) is the sampled mini-batch, \(m_t\) and \(v_t\) are the first and refined second moments, \(\beta_1,\beta_2\) are the decay rates, and \(\zeta\) is a small constant for numerical stability.
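To make the roles of these symbols concrete, here is a minimal NumPy sketch of one possible R-AdaZO step. It is not this library's implementation: the function names `sample_unit_sphere`, `zo_gradient`, and `r_adazo_step`, the default hyperparameters, the forward-difference estimator with \(d/K\) scaling, and the placement of \(\zeta\) inside the square root are all assumptions made for illustration. The callable `f` stands in for the objective evaluated on a fixed mini-batch \(\xi_t\).

```python
import numpy as np


def sample_unit_sphere(rng, k, d):
    """Draw k directions uniformly from the unit sphere in R^d."""
    u = rng.standard_normal((k, d))
    return u / np.linalg.norm(u, axis=1, keepdims=True)


def zo_gradient(f, theta, mu, k, rng):
    """Forward-difference, random-direction gradient estimate (assumed form).

    Uses only function values: one evaluation at theta and one per direction.
    """
    d = theta.size
    u = sample_unit_sphere(rng, k, d)          # shape (k, d)
    f0 = f(theta)
    diffs = np.array([f(theta + mu * u_k) - f0 for u_k in u])  # shape (k,)
    return (d / k) * (diffs / mu) @ u          # shape (d,)


def r_adazo_step(f, theta, m, v, rng, lr=1e-3, beta1=0.9, beta2=0.99,
                 zeta=1e-8, mu=1e-3, k=10):
    """One illustrative R-AdaZO update; the defaults are hypothetical."""
    g = zo_gradient(f, theta, mu, k, rng)
    m = beta1 * m + (1.0 - beta1) * g          # first moment from the raw estimate
    v = beta2 * v + (1.0 - beta2) * m ** 2     # refinement: squared first moment, not g**2
    theta = theta - lr * m / np.sqrt(v + zeta)
    return theta, m, v
```

A typical loop would initialise `m` and `v` to zeros with the same shape as `theta`, create a generator with `np.random.default_rng(seed)`, and call `r_adazo_step` repeatedly, passing the returned state back in. The only line that differs from an Adam-style zeroth-order update is the one that accumulates `m ** 2` in place of `g ** 2`.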
Reference: Yao Shu, Qixin Zhang, Kun He, Zhongxiang Dai, "Refining Adaptive Zeroth-Order Optimization at Ease", ICML 2025. https://arxiv.org/abs/2502.01014