ZO-AdaMM
Implements ZO-AdaMM, a zeroth-order (gradient-free) variant of AMSGrad for black-box optimization.
ZO-AdaMM replaces the true gradient with a random-direction, forward-difference estimate computed from function-value queries alone, then feeds it into an AMSGrad-style adaptive-momentum update. The parameter step is followed by a projection onto the feasible set \(\mathcal{X}\) under the Mahalanobis distance induced by the adaptive matrix \(\sqrt{\hat{V}_t}\). Matching the projection metric to the adaptive scaling is what the convergence analysis relies on; according to the reference below, the same update with a plain Euclidean projection can fail to converge on constrained problems.
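The gradient estimate described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not this library's implementation: the names `sample_unit_sphere` and `zo_gradient_estimate` are hypothetical, the forward-difference form follows the referenced paper, and the direction is drawn by normalizing a Gaussian vector, which is one common way to sample uniformly from the unit sphere.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d.

    Normalizing a standard Gaussian vector gives a uniform direction.
    """
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in z))
        if norm > 0.0:  # resample in the (practically impossible) zero case
            return [x / norm for x in z]


def zo_gradient_estimate(f, theta, mu, rng):
    """Forward-difference estimate (d / mu) * (f(theta + mu*u) - f(theta)) * u.

    Uses two function evaluations and no derivative information.
    """
    d = len(theta)
    u = sample_unit_sphere(d, rng)
    shifted = [t + mu * ui for t, ui in zip(theta, u)]
    scale = (d / mu) * (f(shifted) - f(theta))
    return [scale * ui for ui in u]


# Hypothetical black-box objective used only for illustration.
def quadratic(x):
    return sum(xi * xi for xi in x)


rng = random.Random(0)
theta = [1.0, -2.0, 0.5]
g_hat = zo_gradient_estimate(quadratic, theta, mu=1e-3, rng=rng)
```

A single estimate is noisy because it only probes one random direction. Averaged over many directions, it approaches the gradient of a smoothed version of the objective, which for this quadratic is \(2\theta\).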
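Following the referenced paper, the update at iteration \(t\) is

\[
\begin{aligned}
\hat{g}_t &= \frac{d}{\mu}\,\bigl[f(\theta_t + \mu u_t) - f(\theta_t)\bigr]\,u_t, \\
m_t &= \beta_{1,t}\, m_{t-1} + (1 - \beta_{1,t})\, \hat{g}_t, \\
v_t &= \beta_2\, v_{t-1} + (1 - \beta_2)\, \hat{g}_t^{2}, \\
\hat{v}_t &= \max(\hat{v}_{t-1}, v_t), \qquad \hat{V}_t = \operatorname{diag}(\hat{v}_t), \\
\theta_{t+1} &= \Pi_{\mathcal{X},\sqrt{\hat{V}_t}}\bigl(\theta_t - \alpha_t \hat{V}_t^{-1/2} m_t\bigr),
\end{aligned}
\]

where \(f\) is the black-box objective, \(\hat{g}_t\) is the zeroth-order gradient estimate, \(u_t\) a random vector drawn uniformly from the unit sphere, \(d\) the problem dimension, \(\mu\) the smoothing parameter, \(\alpha_t\) the step size, \(\beta_{1,t},\beta_2 \in (0,1]\) the momentum decay rates, \(m_t\) and \(v_t\) the first- and second-moment estimates, \(\hat{v}_t\) the running elementwise maximum of the second moment, and \(\Pi_{\mathcal{X},H}(a) = \arg\min_{\theta \in \mathcal{X}} \|\sqrt{H}(\theta - a)\|_2^2\) the Mahalanobis projection onto \(\mathcal{X}\). The square and the maximum are taken elementwise.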
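As a rough illustration of the moment update and projection, the sketch below performs one step for the special case where \(\mathcal{X}\) is a box \([\text{lower}, \text{upper}]^d\). It rests on assumptions that are not taken from this library: the function name `zo_adamm_step` and its signature are hypothetical, and `eps` is a small numerical guard added here rather than part of the stated update. For a box and a diagonal weighting matrix with positive entries, the Mahalanobis projection separates per coordinate and reduces to elementwise clipping, which is what the code uses. A general feasible set would need a proper weighted projection.

```python
import math


def zo_adamm_step(theta, g_hat, m, v, v_hat, alpha, beta1, beta2,
                  lower, upper, eps=1e-12):
    """One AMSGrad-style step on a gradient estimate, then box projection.

    All vector arguments are lists of floats of equal length.
    Returns the updated (theta, m, v, v_hat).
    """
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for th, g, mi, vi, vhi in zip(theta, g_hat, m, v, v_hat):
        mi = beta1 * mi + (1.0 - beta1) * g        # first moment
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second moment
        vhi = max(vhi, vi)                         # running elementwise maximum
        # Scaled step; eps is an assumed guard against division by zero.
        step = th - alpha * mi / (math.sqrt(vhi) + eps)
        # For a box and diagonal weights the projection is a clip.
        new_theta.append(min(max(step, lower), upper))
        new_m.append(mi)
        new_v.append(vi)
        new_v_hat.append(vhi)
    return new_theta, new_m, new_v, new_v_hat


# Illustrative call with a made-up gradient estimate.
theta, m, v, v_hat = zo_adamm_step(
    theta=[1.0, -1.0], g_hat=[4.0, -4.0],
    m=[0.0, 0.0], v=[0.0, 0.0], v_hat=[0.0, 0.0],
    alpha=0.1, beta1=0.9, beta2=0.5, lower=-0.9, upper=0.9,
)
```

In this call the unconstrained step would land just inside \(\pm 1\), so the clip moves both coordinates onto the box boundary at \(\pm 0.9\). Because \(\hat{v}_t\) never decreases, the effective per-coordinate step size cannot grow between iterations.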
Reference: Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox, "ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization", NeurIPS 2019. https://arxiv.org/abs/1910.06513