DP-SGD-JL
Implements DP-SGD-JL, differentially private SGD that approximates per-example gradient norms with Johnson-Lindenstrauss projections.
Standard DP-SGD must clip each example's gradient to bound its sensitivity, which requires materializing per-example gradients and is slow and memory-hungry. DP-SGD-JL avoids this by estimating each per-example gradient norm from a handful of random Jacobian-vector products. Drawing \(r\) standard Gaussian probe vectors \(v_1,\dots,v_r\), the projections \(P_{ij}=\langle\nabla_\theta\mathcal{L}(\theta;X_i),v_j\rangle\) give the norm estimate \(M_i=\sqrt{\tfrac{1}{r}\sum_{j=1}^{r}P_{ij}^2}\), whose square is an unbiased estimate of \(\|\nabla_\theta\mathcal{L}(\theta;X_i)\|^2\). The clip factor \(\min\{1,C/M_i\}\) can then be applied as a scalar weight on each example's loss before a single batch backward pass. Finally, Gaussian noise is added to the clipped, averaged gradient.
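As a sketch, one step of the procedure described above can be written as follows. This is a reconstruction from the surrounding description, not a formula quoted from the reference; in particular, scaling the noise by \(\sigma C\) before dividing by the batch size is the usual DP-SGD convention and is assumed here.

\[
\theta_{t+1} = \theta_t - \frac{\eta_t}{B}\left(\sum_{i=1}^{B}\min\left\{1,\frac{C}{M_i}\right\}\nabla_\theta\mathcal{L}(\theta_t;X_i) + \sigma C\,\mathcal{N}(0,I_d)\right)
\]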
Throughout, \(\theta\) denotes the parameters, \(\eta_t\) the learning rate, \(\mathcal{L}(\theta;X_i)\) the loss on example \(X_i\), \(r\) the number of JL probe vectors, \(M_i\) the estimated per-example gradient norm, \(C\) the clipping threshold, \(B\) the batch size, \(\sigma\) the noise multiplier, and \(d\) the parameter dimension.
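To make the roles of these symbols concrete, here is a minimal NumPy sketch of a single step. It is an illustration under stated assumptions, not the library's implementation. The model is assumed to be linear least squares, so the directional derivative \(P_{ij}\) has a closed form. The function name `dp_sgd_jl_step` and its arguments are hypothetical. The noise is assumed to have standard deviation \(\sigma C\) on the summed gradient. For a general model, \(P_{ij}\) would instead come from forward-mode Jacobian-vector products.

```python
import numpy as np


def dp_sgd_jl_step(theta, X, y, *, lr, clip, sigma, r, rng):
    """One illustrative DP-SGD-JL step for the loss 0.5 * (x @ theta - y)**2.

    Hypothetical helper: the linear model and the sigma * clip noise
    scaling are assumptions made for this sketch.
    """
    B, d = X.shape
    resid = X @ theta - y                       # per-example residuals, shape (B,)

    # r standard Gaussian probe vectors v_1..v_r, one per row.
    V = rng.standard_normal((r, d))

    # P[i, j] = <grad L_i, v_j>. For this loss the directional derivative is
    # resid_i * (x_i . v_j), so no per-example gradient is materialized.
    P = resid[:, None] * (X @ V.T)              # shape (B, r)

    # JL estimate of each per-example gradient norm.
    M = np.sqrt(np.mean(P**2, axis=1))          # shape (B,)

    # Clip factors min{1, C / M_i}; the floor guards against division by zero.
    w = np.minimum(1.0, clip / np.maximum(M, 1e-12))

    # Gradient of sum_i w_i * L_i with w held constant: one batch pass.
    clipped_sum = X.T @ (w * resid)

    # Gaussian noise scaled by the noise multiplier and clipping threshold.
    noise = sigma * clip * rng.standard_normal(d)

    new_theta = theta - lr * (clipped_sum + noise) / B
    return new_theta, M, w


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))                 # B = 8 examples, d = 5 parameters
y = rng.standard_normal(8)
theta = np.zeros(5)
theta, M, w = dp_sgd_jl_step(
    theta, X, y, lr=0.1, clip=1.0, sigma=1.0, r=64, rng=rng
)
```

The only quantities of size \(B\times d\) or larger are the inputs themselves; the probes cost an extra \(r\times d\) and the projections \(B\times r\). Because \(M_i\) is only an estimate, a larger \(r\) gives tighter norm estimates at the cost of more Jacobian-vector products.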
Reference: Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Judy Hanwen Shen, Uthaipon Tantipongpipat, "Fast and Memory Efficient Differentially Private-SGD via JL Projections", NeurIPS 2021. https://arxiv.org/abs/2102.03013