| signSGD | ICML 2018 | signSGD: Compressed Optimisation for Non-Convex Problems | official | — |
| LD-SGD | arXiv 2019 | Communication-Efficient Local Decentralized SGD Methods | — | — |
| Local SGD | ICLR 2019 | Local SGD Converges Fast and Communicates Little | community | — |
| PowerSGD | NeurIPS 2019 | PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization | — | — |
| Qsparse-local-SGD | NeurIPS 2019 | Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations | — | — |
| signProx | ICASSP 2019 | signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization | — | — |
| APMSqueeze | arXiv 2020 | APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm | — | — |
| DEED-GD | arXiv 2020 | DEED: A General Quantization Scheme for Communication Efficiency in Bits | — | — |
| FedAC | NeurIPS 2020 | Federated Accelerated Stochastic Gradient Descent | — | — |
| LAGS-SGD | ECAI 2020 | Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees | — | — |
| rTop-k | JSAIT 2020 | rTop-k: A Statistical Estimation Approach to Distributed SGD | — | — |
| SCAFFOLD | ICML 2020 | SCAFFOLD: Stochastic Controlled Averaging for Federated Learning | — | — |
| SlowMo | ICLR 2020 | SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum | — | — |
| ZeRO | SC 2020 | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | official | — |
| 1-bit Adam | ICML 2021 | 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed | official | — |
| BVR-L-SGD | ICML 2021 | Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning | — | — |
| SQuARM-SGD | JSAIT 2021 | SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization | — | — |
| SketchedAMSGrad | ICDM 2022 | Communication-Efficient Adam-Type Algorithms for Distributed Data Mining | — | — |
| 0/1 Adam | ICLR 2023 | Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam | official | — |
| AdaCGD | TMLR 2023 | Adaptive Compression for Communication-Efficient Distributed Training | — | — |
| DiLoCo | arXiv 2023 | DiLoCo: Distributed Low-Communication Training of Language Models | community | — |
| Distributed Shampoo | arXiv 2023 | A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale | official | — |
| SPARQ-SGD | TAC 2023 | SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization | — | — |
| AdaFedAdam | TMLCN 2024 | Accelerating Fair Federated Learning: Adaptive Federated Adam | official | — |
| DeMo | arXiv 2024 | DeMo: Decoupled Momentum Optimization | official | — |
| FADAS | ICML 2024 | FADAS: Towards Federated Adaptive Asynchronous Optimization | official | — |
| FAGH | arXiv 2024 | FAGH: Accelerating Federated Learning with Approximated Global Hessian | — | — |
| Fed-Sophia | ICC 2024 | Fed-Sophia: A Communication-Efficient Second-Order Federated Learning Algorithm | — | — |
| FedLion | ICASSP 2024 | FedLion: Faster Adaptive Federated Optimization with Fewer Communication | official | — |
| FedRepOpt | ACCV 2024 | FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning | official | — |
| FedSTaS | arXiv 2024 | FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning | official | — |
| FESS-GDA | AISTATS 2024 | Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization | — | — |
| FLeNS | BigData 2024 | FLeNS: Federated Learning with Enhanced Nesterov-Newton Sketch | official | — |
| MM-PSGD / MC-PSGD | MMAsia-W 2024 | Distributed Optimization over Block-Cyclic Data | — | — |
| OpenDiLoCo | arXiv 2024 | OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | official | — |
| ADEF | arXiv 2025 | Accelerated Distributed Optimization with Compression and Error Feedback | — | — |
| DAT-SGD | ICML 2025 | Enhancing Parallelism in Decentralized Stochastic Convex Optimization | — | — |
| DeCo-SGD | arXiv 2025 | Taming Latency and Bandwidth: A Theoretical Framework and Adaptive Algorithm for Communication-Constrained Training | — | — |
| DES-LOC | arXiv 2025 | DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models | — | — |
| Dion | arXiv 2025 | Dion: Distributed Orthonormalized Updates | official | — |
| DLAS-R-FTC | CDC 2025 | Distributed Optimization and Learning for Automated Stepsize Selection with Finite Time Coordination | — | — |
| FAdamGC | arXiv 2025 | Gradient Correction in Federated Learning with Adaptive Optimization | — | — |
| FedCET | arXiv 2025 | Communication Efficient Federated Learning with Linear Convergence on Heterogeneous Data | — | — |
| FedIvon | TMLR 2025 | Federated Learning with Uncertainty and Personalization via Efficient Second-order Optimization | — | — |
| FedMuon | arXiv 2025 | FedMuon: Accelerating Federated Learning with Matrix Orthogonalization | official | — |
| FedOne | ICML 2025 | FedOne: Query-Efficient Federated Learning for Black-box Discrete Prompt Learning | — | — |
| HybridSGD | arXiv 2025 | Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization | — | — |
| Kuramoto-FedAvg | arXiv 2025 | Kuramoto-FedAvg: Using Synchronization Dynamics to Improve Federated Learning Optimization under Statistical Heterogeneity | official | — |
| LQ-SGD | arXiv 2025 | Trustworthy Efficient Communication for Distributed Learning using LQ-SGD Algorithm | — | — |
| Muon | arXiv 2025 | Muon is Scalable for LLM Training | official | Muon |
| pFedSOP | arXiv 2025 | pFedSOP: Accelerating Training Of Personalized Federated Learning Using Second-Order Optimization | — | — |
| LT-ADMM | TAC 2026 | Communication-Efficient Stochastic Distributed Learning | — | — |
| Ringleader ASGD | ICLR 2026 | Ringleader ASGD: The First Asynchronous SGD with Optimal Time Complexity under Data Heterogeneity | — | — |
| DECA | arXiv 2026 | DECA: Decentralizing Block-Wise Adam for Efficient LLM Full-Parameter Fine-Tuning on Non-IID Data | — | — |
| Ringmaster LMO | arXiv 2026 | Ringmaster LMO: Asynchronous Linear Minimization Oracle Momentum Method | — | — |
| SignMuon | arXiv 2026 | SignMuon: Communication-Efficient Distributed Muon Optimization | — | — |
| Orth-Dion | arXiv 2026 | Orth-Dion: Eliminating Geometric Mismatch in Distributed Low-Rank Spectral Optimization | — | — |
| EF21-Muon | arXiv 2025 | Error Feedback for Muon and Friends | — | — |
| MuonBP | ICLR 2026 | MuonBP: Faster Muon via Block-Periodic Orthogonalization | — | — |
| CurvaDion | arXiv 2025 | CurvaDion: Curvature-Adaptive Distributed Orthonormalization | — | — |
| Quasi-Newton FL with Error Feedback | OPT 2025: Optimization for Machine Learning (NeurIPS 2025 Workshop) | Quasi-Newton Methods for Federated Learning with Error Feedback | — | — |
| DeMuon | arXiv 2025 | DeMuon: A Decentralized Muon for Matrix Optimization over Graphs | — | — |
| HeLoCo | arXiv 2026 | HeLoCo: Efficient asynchronous low-communication training under data and device heterogeneity | — | — |
| Decoupled DiLoCo | arXiv 2026 | Decoupled DiLoCo for Resilient Distributed Pre-training | — | — |
| Partial Parameter Updates | arXiv 2025 | Partial Parameter Updates for Efficient Distributed Training | — | — |
| SparseLoCo | arXiv 2025 | Communication Efficient LLM Pre-training with SparseLoCo | official | — |
| GASLoC | arXiv 2026 | Unifying Local Communications and Local Updates for LLM Pretraining | — | — |
| MG-ADSGD | arXiv 2026 | Accelerated Decentralized Stochastic Gradient Descent for Strongly Convex Optimization | — | — |
| Local MixVR | arXiv 2026 | Local MixVR: Breaking the Communication-Sample Dependence in Distributed Learning | — | — |
| LOSCAR-SGD | arXiv 2026 | LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging | — | — |
| HEW-Local SGD | arXiv (math.OC) 2026 | Heterogeneous-Horizon Exact-Weight Local SGD | — | — |
| CAPTAIN (C-ALADIN) | arXiv 2026 | A Global Convergence Analysis of Consensus ALADIN for Convex Optimization | — | — |
| FedPAC | arXiv 2026 | Taming Preconditioner Drift: Unlocking the Potential of Second-Order Optimizers for Federated Learning on Non-IID Data | official | — |
| FedAdamW | AAAI 2026 | FedAdamW: A Communication-Efficient Optimizer with Convergence and Generalization Guarantees for Federated Large Models | official | — |
| LoRDO | arXiv 2026 | LoRDO: Distributed Low-Rank Optimization with Infrequent Communication | — | — |