Learning Rate Schedulers

zij.core.lr_scheduler vendors the PyTorch core learning rate schedulers under their original class names. The first table lists every vendored class, including the LRScheduler base class, with the published work it derives from where one exists. The second table covers notable schedules from the literature that zij does not yet implement as schedulers; where zij covers a method by other means, the table's zij column names the classes.

In zij

| Scheduler | Origin |
| --- | --- |
| ChainedScheduler | |
| ConstantLR | |
| CosineAnnealingLR | Loshchilov & Hutter, ICLR 2017 (SGDR) |
| CosineAnnealingWarmRestarts | Loshchilov & Hutter, ICLR 2017 (SGDR) |
| CyclicLR | Smith, WACV 2017 (cyclical learning rates) |
| ExponentialLR | |
| LambdaLR | |
| LinearLR | |
| LRScheduler | |
| MultiplicativeLR | |
| MultiStepLR | |
| OneCycleLR | Smith & Topin, 2019 (super-convergence) |
| PolynomialLR | |
| ReduceLROnPlateau | |
| SequentialLR | |
| StepLR | |
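
To make the SGDR entries in the Origin column concrete, here is a minimal sketch of the closed-form cosine schedule in plain Python. It is not zij's or PyTorch's code; the function names and arguments (`cosine_annealing_lr`, `cosine_warm_restarts_lr`, `t_max`, `t_0`, `eta_min`) are illustrative assumptions, and the restart variant assumes a fixed period.

```python
import math


def cosine_annealing_lr(step, base_lr, t_max, eta_min=0.0):
    """Closed-form SGDR cosine curve without restarts (hypothetical helper)."""
    cos_term = 1.0 + math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * cos_term


def cosine_warm_restarts_lr(step, base_lr, t_0, eta_min=0.0):
    """Same curve, restarted every t_0 steps (assumes a fixed period)."""
    return cosine_annealing_lr(step % t_0, base_lr, t_0, eta_min)


# Learning rate at a few points of a 100-step run starting from 0.1.
for s in (0, 25, 50, 75, 100):
    print(s, cosine_annealing_lr(s, base_lr=0.1, t_max=100))
```

The learning rate starts at `base_lr`, reaches the midpoint between `base_lr` and `eta_min` halfway through, and ends at `eta_min`; with restarts it jumps back to `base_lr` at every multiple of `t_0`.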

Notable schedules elsewhere

| Scheduler | Venue | Paper | Code | zij |
| --- | --- | --- | --- | --- |
| Inverse square root | NeurIPS 2017 | Attention Is All You Need | official | |
| AdaS | arXiv 2020 | AdaS: Adaptive Scheduling of Stochastic Gradients | official | |
| Untuned Warmup | AAAI 2021 | On the adequacy of untuned warmup for adaptive optimization | | |
| AutoDrop | UAI 2024 | AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop | | |
| Schedule-Free | NeurIPS 2024 | The Road Less Scheduled | official | SGDScheduleFree, AdamWScheduleFree, RAdamScheduleFree, ScheduleFreeWrapper |
| WSD (Warmup-Stable-Decay) | COLM 2024 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | official | |
| GreedyLR | arXiv 2025 | Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Convergence | | |
| Refined SF-AdamW | NeurIPS 2025 | Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training | | |
| SF-NorMuon | arXiv 2026 | Anytime Training with Schedule-Free Spectral Optimization | | |
| WSM | ICLR 2026 | WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training | | |
| Power Decay / Warmup-Stable-Decay (WSD) | arXiv 2026 | Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay | | |
| Anytime (Horizon-Free WA schedule) | arXiv 2026 | Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging | | |
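
Some of these schedules are simple functions of the step count and can be prototyped before a dedicated class exists. The sketch below is an illustration, not zij code: `inverse_sqrt_lr` follows the formula from Attention Is All You Need, with that paper's base-model values as defaults, while `wsd_multiplier` and its linear decay shape are assumptions made for this example (published WSD variants use other decay curves).

```python
def inverse_sqrt_lr(step, d_model=512, warmup_steps=4000):
    """Transformer schedule: linear warmup, then decay proportional to step**-0.5."""
    step = max(step, 1)  # the formula is undefined at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def wsd_multiplier(step, warmup_steps, stable_steps, decay_steps, min_ratio=0.0):
    """Warmup-Stable-Decay as a multiplier on the peak learning rate.

    Hypothetical helper: the linear decay phase is an assumption of this sketch.
    """
    if step < warmup_steps:
        return (step + 1) / warmup_steps      # linear warmup up to 1.0
    if step < warmup_steps + stable_steps:
        return 1.0                            # constant plateau
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return 1.0 - (1.0 - min_ratio) * progress  # linear decay to min_ratio
```

A function such as `wsd_multiplier` returns a factor rather than an absolute rate, so with its phase lengths bound (for example via `functools.partial`) it should fit the `lr_lambda` argument of LambdaLR, which scales the optimizer's initial learning rate by the returned value.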

Schedule-Free is not a schedule on top of an optimizer but a replacement for scheduling, achieved through online iterate averaging inside the optimizer; see the learning-rate-free optimizers.
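
The idea of averaging inside the optimizer can be illustrated on a scalar problem. The following is a rough sketch of the Schedule-Free SGD recursion as described in The Road Less Scheduled, written in plain Python; it is not the zij implementation, and the function name, arguments, and toy objective are assumptions chosen for the example.

```python
def schedule_free_sgd(grad_fn, w0, lr=0.5, beta=0.9, steps=1000):
    """Schedule-Free SGD on a scalar parameter (illustrative sketch).

    z is the base SGD iterate, x is the running average of the z iterates
    (the point used for evaluation), and y is the interpolation between
    them at which the gradient is taken.
    """
    z = x = float(w0)
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # gradient evaluation point
        z = z - lr * grad_fn(y)           # plain SGD step with a constant lr
        c = 1.0 / (t + 1)                 # uniform averaging weight
        x = (1.0 - c) * x + c * z         # online average of the z iterates
    return x


# Toy objective f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
w_star = schedule_free_sgd(lambda w: w - 3.0, w0=0.0)
print(w_star)
```

The learning rate never changes and no training horizon is passed in; the averaged iterate `x` plays the role that a decaying schedule would otherwise play.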

Weight averaging is available separately in zij.core.swa_utils, which provides stochastic weight averaging (SWA) and exponential moving average (EMA) utilities: AveragedModel, SWALR, update_bn, and the SWA/EMA averaging functions. The module follows Averaging Weights Leads to Wider Optima and Better Generalization (Izmailov et al., UAI 2018).
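
The two averaging rules differ only in how much weight each new snapshot receives. A minimal sketch on scalar "weights" follows; `swa_update` and `ema_update` are hypothetical helpers written for this example, not the functions exported by zij.core.swa_utils.

```python
def swa_update(avg, new, num_averaged):
    """Equal-weight running mean over all snapshots seen so far."""
    return avg + (new - avg) / (num_averaged + 1)


def ema_update(avg, new, decay=0.999):
    """Exponential moving average that favours recent snapshots."""
    return decay * avg + (1.0 - decay) * new


snapshots = [1.0, 2.0, 3.0, 4.0]  # stand-ins for successive parameter values
swa = ema = snapshots[0]
for n, w in enumerate(snapshots[1:], start=1):
    swa = swa_update(swa, w, n)          # n snapshots already averaged
    ema = ema_update(ema, w, decay=0.5)  # small decay only to make the effect visible

print(swa, ema)
```

The SWA value is the plain mean of all four snapshots, while the EMA value sits closer to the most recent one.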