Learning-Rate-Free Optimizers

Learning-rate-free (also called parameter-free or tuning-free) optimizers choose their step size automatically during training instead of relying on a manually tuned learning rate. Many methods in this family, such as DoG and D-Adaptation, estimate a problem-dependent quantity like the distance from the initial point to the solution and combine it with the magnitudes of observed gradients to set the effective step size. Others, such as Mechanic, wrap an existing base optimizer and tune its global scale factor online. The goal is to match the performance of a well-tuned baseline without a learning-rate search.
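As a rough illustration of the distance-based idea, the sketch below follows the DoG ("distance over gradients") step-size rule: at iteration t, the step size is the largest distance travelled from the starting point so far divided by the square root of the running sum of squared gradient norms. It is a minimal, deterministic, pure-Python sketch assuming exact gradients; the function name `dog_minimize`, the seed parameter `r_eps`, and its default value are assumptions for illustration, not the API of the official DoG code.

```python
import math
from typing import Callable, List

Vector = List[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def dog_minimize(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    steps: int,
    r_eps: float = 1e-4,  # assumed small positive seed for the distance estimate
) -> Vector:
    """Minimal sketch of the DoG step-size rule (hypothetical helper, not the official API)."""
    x = list(x0)
    max_dist = r_eps  # a zero seed would make the very first step zero
    grad_sq_sum = 0.0
    for _ in range(steps):
        g = grad(x)
        grad_sq_sum += sum(gi * gi for gi in g)
        if grad_sq_sum == 0.0:
            break  # started at a stationary point
        # Largest distance from the starting point observed so far, including x_t.
        max_dist = max(max_dist, _norm([xi - x0i for xi, x0i in zip(x, x0)]))
        # Step size = max distance / sqrt(sum of squared gradient norms); no learning rate.
        eta = max_dist / math.sqrt(grad_sq_sum)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x


# Usage: minimize f(x) = 0.5 * ||x - c||^2 with c = (3, -2).
def quadratic_grad(x: Vector) -> Vector:
    return [x[0] - 3.0, x[1] + 2.0]


print(dog_minimize(quadratic_grad, [0.0, 0.0], steps=500))
```

Because the step depends only on observed distances and gradient norms, no learning rate is passed; `r_eps` only controls how small the first step is, after which the distance estimate grows until it reflects how far the iterates actually need to travel.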

| Optimizer | Venue | Paper | Code | Implementations |
|---|---|---|---|---|
| AdGD | ICML 2020 | Adaptive Gradient Descent without Descent | official | |
| ALI-G | ICML 2020 | Training Neural Networks for and by Interpolation | official | |
| AdaBFE | arXiv 2022 | BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization | | |
| D-Adaptation | ICML 2023 | Learning-Rate-Free Learning by D-Adaptation | official | DAdaptSGD, DAdaptAdam |
| DoG | ICML 2023 | DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule | official | DoG, LDoG |
| Mechanic | NeurIPS 2023 | Mechanic: A Learning Rate Tuner | official | mechanize |
| DoWG | NeurIPS 2023 | DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method | official | |
| Adam++ | arXiv 2024 | Towards Simple and Provable Parameter-Free Adaptive Gradient Methods | | |
| MoMo | ICML 2024 | MoMo: Momentum Models for Adaptive Learning Rates | official | Momo, MomoAdam |
| Prodigy | ICML 2024 | Prodigy: An Expeditiously Adaptive Parameter-Free Learner | official | Prodigy |
| AdamG | arXiv 2024 | Towards Stability of Parameter-free Optimization | community | AdamG |
| TRAC | NeurIPS 2024 | Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement Learning | official | TRAC |
| U-DoG | COLT 2024 | Accelerated Parameter-Free Stochastic Optimization | | |
| Accelerated GRAAL | arXiv 2025 | Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization | | |
| AutoSGD | arXiv 2025 | AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent | | |
| EAGLE | arXiv 2025 | eagle: early approximated gradient based learning rate estimator | | |
| Adaptive Polyak Steps (SF-SGD / SF-Adam) | arXiv 2025 | Taking the Road Less Scheduled with Adaptive Polyak Steps | | |
| Accelerated Distance-adaptive Method (DoG-lineage) | NeurIPS 2025 | Accelerated Distance-adaptive Methods for Hölder Smooth and Convex Optimization | | |
| GeN | ICLR 2025 | Gradient descent with generalized Newton's method | official | |
| ScheduleFree+ | arXiv 2026 | ScheduleFree+: Scaling Learning-Rate-Free & Schedule-Free Learning to Large Language Models | official | |
| AMUSE | arXiv 2026 | AMUSE: Anytime Muon with Stable Gradient Evaluation | | |
| GGD (Geodesic Gradient Descent) | arXiv 2026 | Geodesic Gradient Descent: A Generic and Learning-rate-free Optimizer on Objective Function-induced Manifolds | | |
| Sign-SGD via Parameter-Free Optimization | ICLR 2026 | Sign-SGD via Parameter-Free Optimization | | |
| OptEMA | arXiv 2026 | OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality | | |
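AdGD, the first method in the table, estimates a different quantity: instead of a distance to the solution, it approximates the local inverse smoothness from consecutive iterates and gradients. The sketch below follows the AdGD step-size rule as we understand it from the paper, λ_k = min(√(1 + θ_{k−1}) · λ_{k−1}, ‖x_k − x_{k−1}‖ / (2‖∇f(x_k) − ∇f(x_{k−1})‖)), with θ_k = λ_k / λ_{k−1} and θ_0 = +∞. The function name `adgd_minimize`, the tiny bootstrap step `lam0`, and the fallback used when the gradient does not change are assumptions of this sketch, not part of the official code.

```python
import math
from typing import Callable, List

Vector = List[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def adgd_minimize(
    grad: Callable[[Vector], Vector],
    x0: Vector,
    steps: int,
    lam0: float = 1e-6,  # assumed tiny bootstrap step for x_1 = x_0 - lam0 * grad(x_0)
) -> Vector:
    """Minimal sketch of AdGD-style steps (hypothetical helper, not the official API)."""
    x_prev = list(x0)
    g_prev = grad(x_prev)
    lam_prev = lam0
    theta = math.inf  # theta_0 = +inf, so the first step uses only the local estimate
    x = [xi - lam_prev * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(steps):
        g = grad(x)
        if _norm(g) == 0.0:
            break  # exact stationary point
        dx = _norm([a - b for a, b in zip(x, x_prev)])
        dg = _norm([a - b for a, b in zip(g, g_prev)])
        # Local estimate of 1 / (2L); infinite when the gradient did not change.
        local = dx / (2.0 * dg) if dg > 0.0 else math.inf
        # Let the step grow by at most sqrt(1 + theta), but stay below the local estimate.
        lam = min(math.sqrt(1.0 + theta) * lam_prev, local)
        if math.isinf(lam):
            lam = lam_prev  # assumption: reuse the previous step if both bounds are infinite
        theta = lam / lam_prev
        x_prev, g_prev, lam_prev = x, g, lam
        x = [xi - lam * gi for xi, gi in zip(x, g)]
    return x


# Usage: the same quadratic f(x) = 0.5 * ||x - c||^2 with c = (3, -2).
def quadratic_grad(x: Vector) -> Vector:
    return [x[0] - 3.0, x[1] + 2.0]


print(adgd_minimize(quadratic_grad, [0.0, 0.0], steps=200))
```

The first bound limits how fast the step may grow between iterations, while the second keeps it below an estimate built only from the last two iterates, so the method does not need a global Lipschitz constant or a tuned learning rate.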