Learning Rate Schedulers

`zij.core.lr_scheduler` vendors the core PyTorch learning-rate schedulers under their original class names. The first table lists every vendored class, including the `LRScheduler` base class, together with the published work it derives from, where one exists. The second table covers notable schedules from the literature that `zij` does not implement as schedulers; where a related `zij` class exists, as for Schedule-Free, the last column names it.

In zij

| Scheduler | Origin |
| --- | --- |
| `ChainedScheduler` | — |
| `ConstantLR` | — |
| `CosineAnnealingLR` | Loshchilov & Hutter, ICLR 2017 (SGDR) |
| `CosineAnnealingWarmRestarts` | Loshchilov & Hutter, ICLR 2017 (SGDR) |
| `CyclicLR` | Smith, WACV 2017 (cyclical learning rates) |
| `ExponentialLR` | — |
| `LambdaLR` | — |
| `LinearLR` | — |
| `LRScheduler` | — |
| `MultiplicativeLR` | — |
| `MultiStepLR` | — |
| `OneCycleLR` | Smith & Topin, 2019 (super-convergence) |
| `PolynomialLR` | — |
| `ReduceLROnPlateau` | — |
| `SequentialLR` | — |
| `StepLR` | — |
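
Two of the rows above trace back to SGDR. The sketch below writes out the closed-form cosine rule behind `CosineAnnealingLR` and the restart bookkeeping of `CosineAnnealingWarmRestarts` as plain functions of an integer epoch. It is a minimal sketch, not `zij`'s code: the helper names `cosine_annealing` and `sgdr_lr` are hypothetical, and the parameters `t_max`, `t_0`, `t_mult`, and `eta_min` mirror the PyTorch constructor arguments on the assumption that `zij` keeps them unchanged. It also assumes integer epochs and an integer `t_mult >= 1`.

```python
import math


def cosine_annealing(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing without restarts (hypothetical helper).

    eta_t = eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / t_max)) / 2
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


def sgdr_lr(epoch: int, base_lr: float, t_0: int, t_mult: int = 1, eta_min: float = 0.0) -> float:
    """Cosine annealing with warm restarts at an integer epoch (hypothetical helper).

    The first cycle lasts t_0 epochs; each later cycle is t_mult times longer.
    """
    t_i, t_cur = t_0, epoch
    # Skip over completed cycles until `epoch` falls inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / t_i)) / 2


if __name__ == "__main__":
    # With t_0=10 and t_mult=2, cycles span epochs [0, 10), [10, 30), [30, 70), ...
    for epoch in (0, 5, 9, 10, 20, 29, 30):
        print(epoch, round(sgdr_lr(epoch, base_lr=0.1, t_0=10, t_mult=2), 5))
```

At each restart the rate jumps back to `base_lr`, which is the behavior that distinguishes the warm-restart variant from a single cosine decay.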

Notable schedules elsewhere

| Scheduler | Venue | Paper | Code | `zij` |
| --- | --- | --- | --- | --- |
| Inverse square root | NeurIPS 2017 | Attention Is All You Need | official | — |
| AdaS | arXiv 2020 | AdaS: Adaptive Scheduling of Stochastic Gradients | official | — |
| Untuned Warmup | AAAI 2021 | On the adequacy of untuned warmup for adaptive optimization | — | — |
| AutoDrop | UAI 2024 | AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop | — | — |
| Schedule-Free | NeurIPS 2024 | The Road Less Scheduled | official | `SGDScheduleFree`, `AdamWScheduleFree`, `RAdamScheduleFree`, `ScheduleFreeWrapper` |
| WSD (Warmup-Stable-Decay) | COLM 2024 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | official | — |
| GreedyLR | arXiv 2025 | Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Convergence | — | — |
| Refined SF-AdamW | NeurIPS 2025 | Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training | — | — |
| SF-NorMuon | arXiv 2026 | Anytime Training with Schedule-Free Spectral Optimization | — | — |
| WSM | ICLR 2026 | WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training | — | — |
| Power Decay / Warmup-Stable-Decay (WSD) | arXiv 2026 | Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay | — | — |
| Anytime (Horizon-Free WA schedule) | arXiv 2026 | Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging | — | — |
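
Two schedules in this table have simple closed forms: the Transformer's inverse-square-root rule and the warmup-stable-decay shape. The sketch below writes each as a plain function of the step index. It is illustrative only and not part of `zij`: the function names are hypothetical, the inverse-square-root formula follows the one printed in Attention Is All You Need, and the linear cooldown in `wsd_lr` is an assumption chosen for simplicity (the MiniCPM paper's decay function may differ). With `peak_lr=1.0`, `wsd_lr` returns a multiplicative factor of the kind `LambdaLR` expects, assuming `zij`'s vendored `LambdaLR` behaves like PyTorch's.

```python
def inverse_sqrt_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Transformer schedule: linear warmup, then decay proportional to 1/sqrt(step).

    `step` is 1-based; the two branches of min() meet at step == warmup_steps.
    """
    step = max(step, 1)  # guard against 0 ** -0.5
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def wsd_lr(step: int, peak_lr: float, warmup: int, stable: int, decay: int,
           min_lr: float = 0.0) -> float:
    """Warmup-Stable-Decay with a linear warmup and an assumed linear cooldown."""
    if step < warmup:
        # Warmup phase: ramp linearly from 0 towards peak_lr.
        return peak_lr * step / warmup
    if step < warmup + stable:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: interpolate linearly down to min_lr, then stay there.
    progress = min((step - warmup - stable) / decay, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress


if __name__ == "__main__":
    for s in (1, 2000, 4000, 16000):
        print(s, inverse_sqrt_lr(s))
    for s in (0, 50, 100, 899, 950, 1000):
        print(s, wsd_lr(s, peak_lr=1.0, warmup=100, stable=800, decay=100))
```

The practical appeal of WSD is that the stable phase has no fixed end: training can be extended from a stable-phase checkpoint and cooled down later.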

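Schedule-Free is not a schedule applied on top of an optimizer but a replacement for scheduling: the optimizer keeps an online average of its iterates and evaluates gradients at an interpolation between that average and its base iterate, so no decay schedule or training horizon has to be fixed in advance. See the learning-rate-free optimizers documentation.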

Weight averaging is available separately in `zij.core.swa_utils`, which provides utilities for stochastic weight averaging (SWA) and exponential moving averages (EMA): `AveragedModel`, `SWALR`, `update_bn`, and the SWA/EMA averaging functions. SWA follows Averaging Weights Leads to Wider Optima and Better Generalization (Izmailov et al., UAI 2018).
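
The two averaging rules behind these utilities differ only in the weight each new set of parameters receives. The sketch below applies them to plain Python lists standing in for parameter tensors: an equal-weight running mean, as in SWA, and an exponential moving average with a fixed `decay`. The function names, the list representation, and the default `decay=0.999` are hypothetical simplifications for illustration; they are not the signatures of `AveragedModel` or of the averaging functions in `zij.core.swa_utils`.

```python
from typing import List, Sequence


def swa_average(snapshots: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight running mean of parameter snapshots (SWA-style)."""
    avg = list(snapshots[0])
    for n, params in enumerate(snapshots[1:], start=1):
        # After folding in this snapshot, avg is the mean of the first n + 1 snapshots.
        avg = [a + (p - a) / (n + 1) for a, p in zip(avg, params)]
    return avg


def ema_average(snapshots: Sequence[Sequence[float]], decay: float = 0.999) -> List[float]:
    """Exponential moving average; the first snapshot initialises the average."""
    avg = list(snapshots[0])
    for params in snapshots[1:]:
        avg = [decay * a + (1 - decay) * p for a, p in zip(avg, params)]
    return avg


if __name__ == "__main__":
    history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(swa_average(history))             # [3.0, 4.0]
    print(ema_average(history, decay=0.5))  # [3.5, 4.5]
```

SWA treats every snapshot equally, so it suits averaging a fixed window late in training; EMA favors recent snapshots, so it can run throughout training with a single `decay` hyperparameter.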