LOSCAR-SGD¶
Implements LOSCAR-SGD, local SGD that overlaps communication with computation and corrects the staleness through sparse model averaging.
Each round, every worker runs local SGD from its current iterate. It takes \(N_i\) steps to produce a pre-communication model \(y_i^r\), sparsifies it to \(K\) coordinates, and ships it to the server while continuing to compute \(Q_i\) overlap steps to reach \(z_i^r\). The server averages the sparse messages into \(\bar m^r\). Because each worker kept iterating past the snapshot it sent, naive overwriting would discard that overlap progress; instead the merge keeps the local iterate \(z_i^r\) and adds, on the synchronized coordinates only, the correction \(\bar m_j^r - y_{i,j}^r\), replacing each worker's stale contribution with the global average without throwing away its overlap work.
where \(\theta\) is the model split across \(n\) workers as iterates \(w_{i,t}^r\), \(\eta\) is the stepsize, \(g_i\) the stochastic gradient with sample \(\xi_{i,t}^r\), \(H_i = N_i + Q_i\) the total local steps with \(Q_i\) overlap steps, \(S_r \subseteq [d]\) a uniformly random mask of \(K\) coordinates shared by all workers in round \(r\), \(\mathrm{Proj}_{S_r}\) the projection that keeps those coordinates and zeros the rest, and \(\bar m^r\) the server's sparse average over the \(n\) workers.
Reference: Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik, "LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging", arXiv 2025. https://arxiv.org/abs/2605.20866