Learning-Rate-Free Optimizers

Learning-rate-free (also called parameter-free or tuning-free) optimizers choose their step size automatically during training instead of requiring a manually tuned learning rate. Most methods in this family estimate a problem-dependent quantity, such as the distance from the initial point to the solution, and derive the effective step size from that estimate and the observed gradients. Others wrap an existing base optimizer and tune its global scale factor online. The goal is to match the performance of a well-tuned baseline without a learning-rate search.
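To make the distance-based idea concrete, here is a minimal pure-Python sketch in the style of DoG's "distance over gradients" rule: the step size is the largest distance travelled from the initial point so far, divided by the square root of the accumulated squared gradient norms. The function name `dog_sgd`, the parameters `reps_rel` and `eps`, their default values, and the toy quadratic are assumptions made for illustration. The sketch also omits the iterate averaging used in the published method, so treat it as an approximation of the idea rather than a reference implementation.

```python
import math


def dog_sgd(grad_fn, x0, steps=1000, reps_rel=1e-6, eps=1e-8):
    """Gradient descent with a DoG-style step size (illustrative sketch only)."""
    x = list(x0)
    # Seed the distance estimate with a tiny fraction of (1 + ||x0||),
    # so the very first step is small but nonzero.
    rbar = reps_rel * (1.0 + math.sqrt(sum(v * v for v in x0)))
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad_fn(x)
        # Running sum of squared gradient norms.
        grad_sq_sum += sum(gi * gi for gi in g)
        # Largest distance from the initial point observed so far.
        dist = math.sqrt(sum((xi - x0i) ** 2 for xi, x0i in zip(x, x0)))
        rbar = max(rbar, dist)
        # Step size = distance estimate / sqrt(accumulated gradient energy).
        eta = rbar / math.sqrt(grad_sq_sum + eps)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Toy problem (hypothetical): f(x) = 0.5 * ||x - TARGET||^2, so grad f(x) = x - TARGET.
TARGET = [3.0, -2.0]


def quad_grad(x):
    return [xi - ti for xi, ti in zip(x, TARGET)]


x_final = dog_sgd(quad_grad, [0.0, 0.0])
print(x_final)
```

No learning rate is passed in: the distance estimate starts tiny, grows as the iterate moves away from the starting point, and the step size settles once the iterate approaches the minimizer.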

Optimizer	Venue	Paper	Code	`zij`
AdGD	ICML 2020	Adaptive Gradient Descent without Descent	official	—
ALI-G	ICML 2020	Training Neural Networks for and by Interpolation	official	—
AdaBFE	arXiv 2022	BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization	—	—
D-Adaptation	ICML 2023	Learning-Rate-Free Learning by D-Adaptation	official	`DAdaptSGD`, `DAdaptAdam`
DoG	ICML 2023	DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule	official	`DoG`, `LDoG`
Mechanic	NeurIPS 2023	Mechanic: A Learning Rate Tuner	official	`mechanize`
Adam++	arXiv 2024	Towards Simple and Provable Parameter-Free Adaptive Gradient Methods	—	—
MoMo	ICML 2024	MoMo: Momentum Models for Adaptive Learning Rates	official	`Momo`, `MomoAdam`
Prodigy	ICML 2024	Prodigy: An Expeditiously Adaptive Parameter-Free Learner	official	`Prodigy`
AdamG	arXiv 2024	Towards Stability of Parameter-free Optimization	community	`AdamG`
TRAC	NeurIPS 2024	Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning	official	`TRAC`
Accelerated GRAAL	arXiv 2025	Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization	—	—
AutoSGD	arXiv 2025	AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent	—	—
EAGLE	arXiv 2025	EAGLE: Early Approximated Gradient Based Learning Rate Estimator	—	—
ScheduleFree+	arXiv 2026	ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models	official	—
AMUSE	arXiv 2026	AMUSE: Anytime Muon with Stable Gradient Evaluation	—	—
Adaptive Polyak Steps (SF-SGD / SF-Adam)	arXiv 2025	Taking the Road Less Scheduled with Adaptive Polyak Steps	—	—
GGD (Geodesic Gradient Descent)	arXiv 2026	Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds	—	—
Accelerated Distance-adaptive Method (DoG-lineage)	NeurIPS 2025	Accelerated Distance-adaptive Methods for Hölder Smooth and Convex Optimization	—	—
GeN	ICLR 2025	Gradient descent with generalized Newton's method	official	—
DoWG	NeurIPS 2023	DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method	official	—
U-DoG	COLT 2024	Accelerated Parameter-Free Stochastic Optimization	—	—
Sign-SGD via Parameter-Free Optimization	ICLR 2026	Sign-SGD via Parameter-Free Optimization	—	—
OptEMA	arXiv 2026	OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality	—	—