A(DP)²SGD¶
Implements A(DP)²SGD, asynchronous decentralized parallel SGD with differential privacy.
A(DP)²SGD combines the AD-PSGD communication scheme with the Gaussian mechanism. Each of the \(K\) workers keeps a local model and computes a mini-batch gradient, then perturbs it with Gaussian noise to provide differential privacy. Instead of a central server, models are mixed across neighbors through a doubly stochastic matrix (gossip averaging), after which each worker takes a local descent step with its noised gradient. The noise variance is set from the privacy budget, the Lipschitz bound, and the iteration count rather than from per-example clipping, so the published update has no explicit clip operation.
where \(w_k^t\) is worker \(k\)'s model, \(\hat{w}_k^t\) the (possibly stale) model read for the gradient, \(B\) the batch size, \(\eta\) the learning rate, \(A_t\) a doubly stochastic mixing matrix over the \(K\) workers, and \(\sigma^2\) the Gaussian noise variance chosen from the privacy budget \((\epsilon, \delta)\).
Reference: Jie Xu, Wei Zhang, Fei Wang, "A(DP)²SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent with Differential Privacy", arXiv 2020. https://arxiv.org/abs/2008.09246