DP-MacAdam
Implements DP-MacAdam, a differentially private optimizer that reuses one set of mean and variance estimates for both adaptive per-example clipping and adaptive momentum.
Standard DP-SGD clips each per-example gradient to a fixed norm before adding Gaussian noise, which discards scale information and ties the clipping bias to a hand-tuned threshold. DP-MacAdam instead centers each per-example gradient by the bias-corrected first moment \(\hat{m}_{t-1}\), rescales it by a per-coordinate bound \(b_{t-1}\), normalizes the result to unit norm, averages over the batch, and adds noise. The same Adam-style moment estimates that drive the momentum update also refresh the clipping bound, so clipping adapts automatically to the running gradient statistics.
In this notation, \(\theta\) denotes the parameters, \(\eta\) the learning rate, \(g_t^{(i)}\) the per-example gradient, \(B\) the batch size, \(\sigma\) the noise multiplier, \(\beta_1,\beta_2\) the moment decay rates, \(\gamma\) a stability constant, \(b_t\) the per-coordinate adaptive clipping bound, \(s_t\) and \(\hat{s}_t\) the (debiased, noise-corrected) variance estimates, \(\kappa_t\) the bias-correction factor for the variance EMA, and \(h_1,h_2\) lower and upper clamps. All squarings, divisions, and roots are coordinate-wise.
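The per-example step described above (center, rescale, normalize, average, add noise) can be illustrated with a minimal sketch in plain Python. This is not the reference implementation. The function name `privatize_batch`, its argument names, the placement of \(\gamma\) in the normalization denominator, and the choice to add noise with standard deviation \(\sigma\) to the sum before dividing by \(B\) are all assumptions made for illustration. The sketch also stops at the noised average and does not show the moment or bound updates.

```python
import math
import random


def privatize_batch(per_example_grads, m_hat, bound, sigma, gamma=1e-8, rng=None):
    """Hypothetical helper: center, rescale, normalize, average, and noise.

    per_example_grads: list of B gradients, each a list of floats.
    m_hat: bias-corrected first moment from the previous step.
    bound: per-coordinate clipping bound from the previous step.
    """
    rng = rng or random.Random()
    batch_size = len(per_example_grads)
    dim = len(m_hat)
    total = [0.0] * dim
    for g in per_example_grads:
        # Center by the first moment and rescale coordinate-wise by the bound.
        z = [(g_j - m_j) / b_j for g_j, m_j, b_j in zip(g, m_hat, bound)]
        norm = math.sqrt(sum(z_j * z_j for z_j in z))
        # Normalize. Adding gamma to the denominator (an assumption) avoids
        # division by zero and keeps each contribution's norm at most 1.
        scale = 1.0 / (norm + gamma)
        for j in range(dim):
            total[j] += z[j] * scale
    # Each example contributes a vector of norm at most 1, so the sum is
    # noised with standard deviation sigma (assumed calibration).
    noisy_sum = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / batch_size for v in noisy_sum]
```

Because every example is normalized rather than clipped to a fixed threshold, no single gradient can contribute more than unit norm to the sum, which is what lets the noise scale depend only on \(\sigma\) in this sketch. Calling `privatize_batch([[3.0, 4.0]], m_hat=[0.0, 0.0], bound=[1.0, 1.0], sigma=0.0)` returns a vector close to `[0.6, 0.8]`, the unit-norm direction of the single gradient.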
Reference: Naima Tasnim, Lalitha Sankar, Oliver Kosut, "DP-MacAdam: Differentially Private Mechanism with Adaptive Clipping and Adaptive Momentum", arXiv 2026. https://arxiv.org/abs/2606.05435