SPARQ-SGD
Implements SPARQ-SGD, a decentralized SGD algorithm with event-triggered, compressed (sparsified and quantized) communication.
Each node runs plain local SGD between communication rounds. At designated synchronization indices it checks an event trigger: it communicates with its neighbors only when its model has drifted from the estimate it last communicated by more than a threshold. When triggered, the node sends the compressed difference \(\mathcal{C}(\cdot)\) between its model and the estimate of that model held by its neighbors. The stored estimates are then updated incrementally by the received differences, and a gossip consensus step mixes the iterates using the doubly stochastic mixing matrix of the network. The algorithm thus combines local steps, lazy communication, and lossy compression while, according to the reference below, matching the convergence rate of vanilla decentralized SGD.
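The steps described above can be written out as update rules. The following is a reconstruction based on the algorithm in the cited paper, not necessarily the exact form used by this implementation. The superscript \((t)\) for the iteration index, \(g_i^{(t)}\) for a stochastic gradient of node \(i\)'s local objective, and \(q_i^{(t)}\) for the transmitted message are symbols introduced here for illustration.

\[
\begin{aligned}
x_i^{(t+\frac{1}{2})} &= x_i^{(t)} - \eta_t\, g_i^{(t)}, \\
q_i^{(t)} &=
\begin{cases}
\mathcal{C}\big(x_i^{(t+\frac{1}{2})} - \hat{x}_i^{(t)}\big) & \text{if } (t+1) \in \mathcal{I}_t \text{ and } \big\|x_i^{(t+\frac{1}{2})} - \hat{x}_i^{(t)}\big\|_2^2 > c_t\, \eta_t^2, \\
0 & \text{otherwise,}
\end{cases} \\
\hat{x}_j^{(t+1)} &= \hat{x}_j^{(t)} + q_j^{(t)}, \qquad j \in \mathcal{N}_i \cup \{i\}, \\
x_i^{(t+1)} &=
\begin{cases}
x_i^{(t+\frac{1}{2})} + \gamma \sum_{j \in \mathcal{N}_i} w_{ij}\big(\hat{x}_j^{(t+1)} - \hat{x}_i^{(t+1)}\big) & \text{if } (t+1) \in \mathcal{I}_t, \\
x_i^{(t+\frac{1}{2})} & \text{otherwise.}
\end{cases}
\end{aligned}
\]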
Notation: \(x_i\) is node \(i\)'s model, \(\hat{x}_j\) the locally stored estimate of node \(j\)'s model, \(\eta_t\) the learning rate, and \(\gamma\) the consensus step size. \(\mathcal{C}\) is a compression operator satisfying \(\mathbb{E}\|x - \mathcal{C}(x)\|_2^2 \le (1-\omega)\|x\|_2^2\) for some \(\omega \in (0, 1]\). \(\mathcal{N}_i\) denotes the neighbors of node \(i\), \(W=(w_{ij})\) the symmetric doubly stochastic mixing matrix, \(\mathcal{I}_t\) the set of indices at which the trigger is checked, and \(c_t \le c_0 t^{1-\varepsilon}\), with \(\varepsilon \in (0, 1)\), the increasing threshold sequence.
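To make the roles of these quantities concrete, here is a minimal pure-Python sketch of one iteration across all nodes. It is an illustration under stated assumptions, not this library's implementation. The names `top_k`, `sq_norm`, and `sparq_round` are hypothetical. The compressor is plain top-k sparsification without quantization, which satisfies the bound above with \(\omega = k/d\). The trigger compares the squared drift against \(c_t \eta_t^2\), following the cited paper. Gradients in the demo are deterministic, and every node is assumed to hold identical copies of each estimate \(\hat{x}_j\), so a single shared list stands in for them.

```python
def top_k(v, k):
    """Sparsifier: keep the k largest-magnitude coordinates, zero the rest."""
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(a * a for a in v)


def sparq_round(x, x_hat, grads, W, lr, gamma, c_t, k, sync=True):
    """One iteration for all nodes (hypothetical helper).

    x, x_hat, grads: lists with one vector (list of floats) per node.
    W: mixing matrix as nested lists, with W[i][j] == 0 for non-neighbors.
    Returns (new_x, new_x_hat, number_of_nodes_that_transmitted).
    """
    n, d = len(x), len(x[0])
    # Local SGD half-step on every node.
    half = [[x[i][m] - lr * grads[i][m] for m in range(d)] for i in range(n)]
    if not sync:
        # Not a synchronization index: no communication, estimates unchanged.
        return half, [h[:] for h in x_hat], 0

    new_hat, sent = [], 0
    for i in range(n):
        diff = [half[i][m] - x_hat[i][m] for m in range(d)]
        if sq_norm(diff) > c_t * lr ** 2:
            q = top_k(diff, k)  # event triggered: send compressed difference
            sent += 1
        else:
            q = [0.0] * d       # below threshold: send nothing
        new_hat.append([x_hat[i][m] + q[m] for m in range(d)])

    # Gossip consensus step on the refreshed estimates.
    new_x = []
    for i in range(n):
        row = []
        for m in range(d):
            mix = sum(W[i][j] * (new_hat[j][m] - new_hat[i][m]) for j in range(n))
            row.append(half[i][m] + gamma * mix)
        new_x.append(row)
    return new_x, new_hat, sent


# Demo: 3 fully connected nodes, node i minimizes 0.5 * ||x - b[i]||^2.
b = [[1.0, 0.0, 2.0, -1.0], [0.0, 3.0, -2.0, 1.0], [2.0, 0.0, 0.0, 3.0]]
W = [[1 / 3] * 3 for _ in range(3)]
x = [[0.0] * 4 for _ in range(3)]
x_hat = [[0.0] * 4 for _ in range(3)]
total_sent = 0
for t in range(1, 201):
    lr = 1.0 / (t + 10)
    grads = [[x[i][m] - b[i][m] for m in range(4)] for i in range(3)]
    x, x_hat, sent = sparq_round(
        x, x_hat, grads, W, lr,
        gamma=0.3,
        c_t=0.5 * t ** 0.5,   # c_0 = 0.5, epsilon = 0.5
        k=2,
        sync=(t % 2 == 0),    # trigger is checked every second iteration
    )
    total_sent += sent
print("transmissions:", total_sent, "out of", 3 * 100, "opportunities")
print("node 0 model:", [round(v, 3) for v in x[0]])
```

Because \(W\) is symmetric and doubly stochastic, the gossip term sums to zero over all nodes, so the consensus step leaves the network-wide average of the models unchanged. Only the local gradient steps move that average.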
Reference: Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi, "SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization", arXiv preprint 2019. https://arxiv.org/abs/1910.14280