DP-SGD-RC
Implements DP-SGD-RC, differentially private SGD with randomized per-sample clipping.
Standard DP-SGD clips every per-sample gradient to a fixed norm \(C\) and adds Gaussian noise, but computing exact per-sample norms is the memory bottleneck when fine-tuning large language models on long contexts. DP-SGD-RC ("Randomized Clipping") replaces the exact per-sample squared gradient norm with a cheap stochastic estimate \(\hat{n}_i\) obtained from trace estimation (Hutchinson / Hutch++), so the clipping factor becomes random. The rest of the DP-SGD pipeline is unchanged: noise is calibrated to the clip threshold \(C\), and a standard optimizer step follows. Because \(\hat{n}_i\) estimates the squared norm, the clip factor uses \(C/\sqrt{\hat{n}_i}\).
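Written in the standard DP-SGD form with the randomized clip factor substituted, one lot update reads
\[
\bar{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\sqrt{\hat{n}_i}}\right), \qquad
\tilde{g} = \frac{1}{L}\left(\sum_{i} \bar{g}_i + \sigma C\,\mathcal{N}(0, I)\right), \qquad
\theta \leftarrow \theta - \eta\,\tilde{g},
\]
where \(g_i\) is the per-sample gradient, \(\hat{n}_i\) is the randomized estimate of \(\|g_i\|^2\), \(C\) is the clipping threshold, \(L\) is the lot size (the sum runs over the samples in the lot), \(\sigma\) is the noise multiplier, \(\mathcal{N}(0,I)\) is Gaussian noise, \(\eta\) is the learning rate, and \(\theta\) are the parameters (the descent step may be replaced by Adam).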
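The update described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation. The function names `hutchinson_sq_norm` and `dp_sgd_rc_step` are hypothetical. The sketch assumes a single linear layer whose per-sample gradient factors as \(G = A^\top B\) (layer inputs times output gradients), uses plain Hutchinson probes rather than Hutch++, and assumes the usual \(\min(1, \cdot)\) form of the clip factor. The toy demo materializes exact per-sample gradients only to keep the example short, and the sketch makes no claim about privacy accounting.

```python
import numpy as np


def hutchinson_sq_norm(acts, grads_out, num_probes, rng):
    """Estimate ||G||_F^2 for G = acts.T @ grads_out without forming G.

    acts:      (T, d_in)  layer inputs for one sample (T = sequence length)
    grads_out: (T, d_out) gradients w.r.t. the layer outputs
    """
    d_out = grads_out.shape[1]
    # Rademacher probes z satisfy E[z z^T] = I, so E||G z||^2 = tr(G^T G).
    z = rng.choice([-1.0, 1.0], size=(d_out, num_probes))
    # G @ z computed as A^T (B z): O(T * (d_in + d_out)) work per probe.
    gz = acts.T @ (grads_out @ z)
    return float(np.mean(np.sum(gz**2, axis=0)))


def dp_sgd_rc_step(theta, per_sample_grads, sq_norm_estimates, C, sigma, lr, rng):
    """One plain-SGD update with a randomized clip factor (assumed form)."""
    L = per_sample_grads.shape[0]  # lot size
    # Clip factor min(1, C / sqrt(n_hat_i)); the floor avoids division by zero.
    factors = np.minimum(1.0, C / np.sqrt(np.maximum(sq_norm_estimates, 1e-12)))
    clipped_sum = (factors[:, None] * per_sample_grads).sum(axis=0)
    # Gaussian noise calibrated to the clip threshold C.
    noise = sigma * C * rng.standard_normal(theta.shape)
    return theta - lr * (clipped_sum + noise) / L


# Toy usage: three samples, one linear layer with an 8 x 4 weight matrix.
rng = np.random.default_rng(0)
T, d_in, d_out, L = 64, 8, 4, 3
samples = [
    (rng.standard_normal((T, d_in)), rng.standard_normal((T, d_out)))
    for _ in range(L)
]
# Exact per-sample gradients, flattened; formed here only for the demo.
grads = np.stack([(a.T @ b).ravel() for a, b in samples])
n_hat = np.array(
    [hutchinson_sq_norm(a, b, num_probes=16, rng=rng) for a, b in samples]
)
theta = np.zeros(d_in * d_out)
theta = dp_sgd_rc_step(theta, grads, n_hat, C=1.0, sigma=1.0, lr=0.1, rng=rng)
```

Because the estimate only needs products of the form \(A^\top (B z)\), the full per-sample gradient matrix never has to be stored to compute the clip factor. In a real setting the clipped sum would likewise be accumulated without materializing each \(g_i\).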
Reference: Enayat Ullah, Sai Aparna Aketi, Devansh Gupta, Huanyu Zhang, Meisam Razaviyayn, "Efficient DP-SGD for LLMs with Randomized Clipping", arXiv 2025. https://arxiv.org/abs/2605.24879