SPARTA
Implements SPARTA, an end-to-end differentially private sparse fine-tuning framework that privately selects a trainable subnetwork and then runs sparse DP-SGD on it.
SPARTA (Sparse & PrivAte Row-gradient Thresholding Algorithm) splits a private fine-tuning budget into two phases. In a mask-selection phase it estimates which weights matter using clipped, noised per-example gradient magnitudes accumulated over an epoch, groups them by row, and keeps the top-\(k\) rows. In a fine-tuning phase it runs standard DP-SGD but applies updates only to the selected coordinates through the binary mask \(\hat{m}\), so the dense gradient noise is confined to the chosen subnetwork.
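The two phases can be illustrated with a minimal NumPy sketch for a single weight matrix. It is not the reference implementation. The function names (`clip_by_norm`, `select_row_mask`, `masked_dp_sgd_step`), the per-example L2 clipping to a norm `C`, the noise scale `sigma * C`, and the division by the realized batch size are all assumptions made for illustration. Privacy accounting is omitted entirely.

```python
import numpy as np

def clip_by_norm(g, C):
    """Rescale each per-example gradient so its L2 norm is at most C.

    g has shape (batch, rows, cols); clipping is per example (assumed form).
    """
    norms = np.linalg.norm(g.reshape(g.shape[0], -1), axis=1)
    scale = np.minimum(1.0, C / np.maximum(norms, 1e-12))
    return g * scale.reshape(-1, 1, 1)

def select_row_mask(grad_batches, k, C, sigma, rng):
    """Phase 1 (sketch): noisy row scores averaged over batches, then top-k rows."""
    row_totals, n_batches, shape = None, 0, None
    for g in grad_batches:
        shape = g.shape[1:]
        # Per-coordinate score: sum over the lot of clipped absolute gradients.
        scores = np.abs(clip_by_norm(g, C)).sum(axis=0)
        # Gaussian noise added once per lot, with standard deviation sigma * C.
        scores = scores + rng.normal(0.0, sigma * C, size=scores.shape)
        v = scores.sum(axis=1)  # pool noisy scores into per-row totals
        row_totals = v if row_totals is None else row_totals + v
        n_batches += 1
    if n_batches == 0:
        raise ValueError("grad_batches must contain at least one batch")
    avg = row_totals / n_batches          # average over the selection batches
    top_rows = np.argsort(avg)[-k:]       # indices of the k largest rows
    mask = np.zeros(shape)
    mask[top_rows, :] = 1.0
    return mask

def masked_dp_sgd_step(theta, g, mask, lr, C, sigma, rng):
    """Phase 2 (sketch): DP-SGD step whose noisy update is restricted by the mask."""
    noisy_sum = clip_by_norm(g, C).sum(axis=0)
    noisy_sum = noisy_sum + rng.normal(0.0, sigma * C, size=theta.shape)
    # Dividing by the realized batch size is an assumption of this sketch.
    return theta - lr * mask * noisy_sum / g.shape[0]
```

Because the mask multiplies the noisy sum, rows outside the selected set receive neither signal nor noise, which matches the description of confining the noise to the chosen subnetwork.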
For each example \(i\) in lot \(B_t\), mask selection scores every coordinate by its clipped absolute gradient. These scores are summed over the lot with added Gaussian noise, pooled into per-row group totals \(\tilde{v}_j\), and averaged over the epoch, after which the top-\(k\) rows are selected. Fine-tuning then performs masked DP-SGD with per-example clipping and added noise.
Here \(\theta\) are the parameters, \(\eta_t\) the learning rate, \(g^{t,i}\) the per-example gradient, \(C\) the clipping norm, \(\sigma\) the noise multiplier, \(\mathcal{G}_j\) the index set of group (row) \(j\), \(\hat{m}\in\{0,1\}^d\) the selected mask, \(q\) the lot sampling count and \(n\) the dataset size for the DP-SGD average, \(T_b\) the number of selection batches, \(\odot\) the Hadamard product, and \(\mathcal{N}(0,\sigma^2 C^2 I)\) the Gaussian privacy noise.
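One way to write the steps described above with these symbols is sketched below. It is a reading consistent with the prose, not a verbatim statement from the paper. The auxiliary symbols \(\tilde{s}^{t}\), \(\tilde{v}_j^{t}\), \(\bar{v}_j\), the operator \(\mathrm{clip}_C(x) = x \cdot \min(1, C/\lVert x\rVert_2)\), and the normalization \(1/(qn)\) (which treats \(qn\) as the expected lot size) are assumptions.

\[
\begin{aligned}
\tilde{s}^{t} &= \sum_{i \in B_t} \mathrm{clip}_C\!\left(\lvert g^{t,i} \rvert\right) + \mathcal{N}(0, \sigma^2 C^2 I), \\
\tilde{v}_j^{t} &= \sum_{l \in \mathcal{G}_j} \tilde{s}^{t}_l, \qquad
\bar{v}_j = \frac{1}{T_b} \sum_{t=1}^{T_b} \tilde{v}_j^{t}, \\
\hat{m}_l &= 1 \text{ if } l \in \mathcal{G}_j \text{ for some } j \text{ among the top-}k \text{ rows by } \bar{v}_j, \text{ and } 0 \text{ otherwise}, \\
\theta_{t+1} &= \theta_t - \frac{\eta_t}{qn}\, \hat{m} \odot \left( \sum_{i \in B_t} \mathrm{clip}_C\!\left(g^{t,i}\right) + \mathcal{N}(0, \sigma^2 C^2 I) \right).
\end{aligned}
\]

The first three lines correspond to the mask-selection phase and the last line to the masked DP-SGD update.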
Reference: Mehdi Makni, Kayhan Behdin, Gabriel Afriat, Zheng Xu, Sergei Vassilvitskii, Natalia Ponomareva, Hussein Hazimeh, Rahul Mazumder, "SPARTA: An Optimization Framework for Differentially Private Sparse Fine-Tuning", arXiv 2025. https://arxiv.org/abs/2503.12822