DP-AdamBC
Implements DP-AdamBC, differentially private Adam with bias correction of the noised second moment.
Under differential privacy, the gradient is privatized by per-example clipping followed by Gaussian noise. That noise inflates the second-moment estimate \(v_t\): squaring the noisy gradient adds a variance term that does not vanish, so the denominator \(\sqrt{v_t}\) is dominated by noise and the update collapses toward plain DP-SGD. The size of this term is known. Noise with standard deviation \(\sigma C\) is added to the sum of clipped gradients, and dividing by the batch size \(B\) leaves a per-coordinate variance of \(\Phi = (\sigma C / B)^2\). DP-AdamBC subtracts \(\Phi\) from \(\hat v_t\) before taking the square root, which restores Adam's adaptive behavior, and applies a floor \(\gamma'\) for numerical stability.
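The update can be written out as follows. This is a reconstruction from the symbols defined in this section and the standard Adam recursion, not a verbatim copy of the paper's algorithm box. The names \(\tilde g_t\), \(m_t\), \(\hat m_t\), and \(\operatorname{clip}\) are introduced here for illustration, with \(\operatorname{clip}(g, C) = g \cdot \min(1, C / \lVert g \rVert_2)\) assumed as the clipping rule.

\[
\begin{aligned}
\tilde g_t &= \frac{1}{B}\Big(\sum_{i=1}^{B} \operatorname{clip}(g_{t,i}, C) + \mathcal{N}(0, \sigma^2 C^2 I)\Big), \\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,\tilde g_t, \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,\tilde g_t^{2}, \\
\hat m_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{t}}, \\
\theta_t &= \theta_{t-1} - \eta\,\frac{\hat m_t}{\sqrt{\max(\hat v_t - \Phi,\ \gamma')}}.
\end{aligned}
\]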
In this notation, \(\theta\) are the parameters, \(\eta\) is the learning rate, \(g_{t,i}\) are the per-example gradients, \(C\) is the clipping norm, \(\sigma\) is the noise multiplier, \(B\) is the batch size, \(\beta_1, \beta_2\) are the moment decay rates, \(\Phi = (\sigma C / B)^2\) is the subtracted DP-noise variance, and \(\gamma'\) is a small stability floor inside the max.
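As a rough illustration of how these quantities fit together, here is a minimal sketch of a single step in plain Python. It is not this library's API: the function names `clip` and `dp_adambc_step`, the dictionary-based state, and the `rng` argument are all hypothetical, and the clipping rule (rescale to L2 norm at most `C`) is an assumption. It performs no privacy accounting.

```python
import math
import random


def clip(g, C):
    """Rescale g so that its L2 norm is at most C (assumed clipping rule)."""
    norm = math.sqrt(sum(x * x for x in g))
    scale = min(1.0, C / norm) if norm > 0 else 1.0
    return [x * scale for x in g]


def dp_adambc_step(theta, per_example_grads, state, *, lr, C, sigma, gamma,
                   beta1=0.9, beta2=0.999, rng=random):
    """One illustrative DP-AdamBC step on a flat parameter list.

    state is a dict with keys "t" (step count), "m" and "v" (moment lists).
    rng is any object with a gauss(mu, sigma) method (hypothetical hook).
    """
    B = len(per_example_grads)
    d = len(theta)

    # Privatize: clip each example, sum, add N(0, (sigma*C)^2) per coordinate,
    # then average over the batch.
    clipped = [clip(g, C) for g in per_example_grads]
    g = [(sum(c[j] for c in clipped) + rng.gauss(0.0, sigma * C)) / B
         for j in range(d)]

    # Standard Adam moment updates and bias correction for initialization.
    t = state["t"] + 1
    m = [beta1 * mj + (1 - beta1) * gj for mj, gj in zip(state["m"], g)]
    v = [beta2 * vj + (1 - beta2) * gj * gj for vj, gj in zip(state["v"], g)]
    m_hat = [mj / (1 - beta1 ** t) for mj in m]
    v_hat = [vj / (1 - beta2 ** t) for vj in v]

    # DP bias correction: subtract the noise variance Phi, floor at gamma.
    phi = (sigma * C / B) ** 2
    new_theta = [th - lr * mh / math.sqrt(max(vh - phi, gamma))
                 for th, mh, vh in zip(theta, m_hat, v_hat)]
    return new_theta, {"t": t, "m": m, "v": v}


# Usage: two parameters, a batch of three per-example gradients.
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
grads = [[0.3, -0.4], [0.1, 0.2], [-0.5, 0.0]]
theta, state = dp_adambc_step([1.0, -2.0], grads, state,
                              lr=1e-3, C=1.0, sigma=1.0, gamma=1e-4)
```

With `sigma=0` the correction `phi` is zero and the step reduces to Adam on the averaged clipped gradients. When `v_hat - phi` falls below `gamma`, the denominator becomes `sqrt(gamma)`, so the choice of floor directly sets the largest effective step size.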
Reference: Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer, "DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)", AAAI 2024. https://arxiv.org/abs/2312.14334