Zeroth-Order Optimizers

Zeroth-order (gradient-free) methods train models using only function evaluations. They estimate gradients from randomized perturbations of the parameters instead of computing them by backpropagation. Because they need no backward pass and therefore store no activations for one, their memory footprint stays close to that of inference. This has made them a practical option for fine-tuning large language models on memory-constrained hardware. The lineage runs from SPSA in classical stochastic approximation to recent variance-reduced and low-rank variants built on MeZO.
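The two-point estimator behind SPSA and MeZO fits in a few lines. This is a minimal sketch, assuming parameters held in a plain Python list and a scalar loss function. The names `perturb` and `zo_sgd_step` are hypothetical and are not taken from any of the codebases in the table. Each step evaluates the loss at θ + εz and θ - εz, forms the projected gradient g = (L(θ + εz) - L(θ - εz)) / (2ε), and updates θ ← θ - η·g·z. Following MeZO, z is Gaussian and is regenerated from an integer seed rather than stored. Classical SPSA uses ±1 (Rademacher) perturbations instead.

```python
import random
from typing import Callable, List


def perturb(params: List[float], seed: int, scale: float) -> None:
    """Add scale * z to params in place, with z ~ N(0, I) regenerated from `seed`.

    Re-seeding a fresh generator reproduces the same z on every call, so z never
    has to be stored alongside the parameters.
    """
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] += scale * rng.gauss(0.0, 1.0)


def zo_sgd_step(params: List[float], loss_fn: Callable[[List[float]], float],
                lr: float, eps: float, seed: int) -> float:
    """One two-point zeroth-order SGD step using only forward evaluations.

    Returns the scalar projected gradient g, an estimate of the directional
    derivative of loss_fn along z.
    """
    perturb(params, seed, eps)          # theta + eps*z
    loss_plus = loss_fn(params)
    perturb(params, seed, -2.0 * eps)   # theta - eps*z
    loss_minus = loss_fn(params)
    perturb(params, seed, eps)          # back to theta (up to float rounding)
    g = (loss_plus - loss_minus) / (2.0 * eps)
    perturb(params, seed, -lr * g)      # theta <- theta - lr * g * z, same z
    return g


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0, 4.0, 5.0]

    def loss(p: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, target))

    theta = [0.0] * len(target)
    seeds = random.Random(0)            # draw a fresh seed for every step
    for _ in range(2000):
        zo_sgd_step(theta, loss, lr=0.01, eps=1e-3, seed=seeds.randrange(2**31))
    print(f"final loss: {loss(theta):.2e}")
```

Real implementations apply the same perturb, evaluate, restore sequence in place to framework tensors. As a result, no perturbation tensor or gradient buffer has to stay alive between the two forward passes.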

Optimizer | Venue | Paper | Code
SPSA | IEEE Transactions on Automatic Control 1992 | Multivariate stochastic approximation using a simultaneous perturbation gradient approximation | official
Evolution Strategies | arXiv 2017 | Evolution Strategies as a Scalable Alternative to Reinforcement Learning | official
ZO-AdaMM | NeurIPS 2019 | ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization | official
MeZO | NeurIPS 2023 | Fine-Tuning Language Models with Just Forward Passes | official
DeepZero | ICLR 2024 | DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training | official
LeZO | arXiv 2024 | Simultaneous Computation and Memory Efficient Zeroth-Order Optimizer for Fine-Tuning Large Language Models | official
MeZO-SVRG | ICML 2024 | Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models | official
ZO-AdaMU | AAAI 2024 | ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization | official
ZoPro | CDC 2024 | A Zeroth-Order Proximal Algorithm for Consensus Optimization |
Addax | ICLR 2025 | Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models | official
DiZO | NeurIPS 2025 | Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning | official
ElasticZO | arXiv 2025 | ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization |
HELENE | EMNLP 2025 | HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization |
KerZOO | arXiv 2025 | KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning |
LORENZA | arXiv 2025 | LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM |
LOZO | ICLR 2025 | Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures | official
MaZO | arXiv 2025 | MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models |
QuZO | EMNLP 2025 | QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models | official
R-AdaZO | ICML 2025 | Refining Adaptive Zeroth-Order Optimization at Ease | official
Sparse MeZO | NeurIPS 2025 | Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning | official
SubZero | ICCV 2025 | Zeroth-Order Fine-Tuning of LLMs in Random Subspaces | official
TeZO | arXiv 2025 | TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs |
VAMO | arXiv 2025 | VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence |
VR-SZD | arXiv 2025 | A Structured Proximal Stochastic Variance Reduced Zeroth-order Algorithm | official
ZO-SAH | arXiv 2025 | Subspace-based Approximate Hessian Method for Zeroth-Order Optimization |
ZO2 | COLM 2025 | ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | official
ZOQO | ICASSP 2025 | ZOQO: Zero-Order Quantized Optimization |
AdaMeZO | arXiv 2026 | AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments | official
FZOO | ICLR 2026 | FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed | official
MEAZO | arXiv 2026 | On Adaptivity in Zeroth-Order Optimization |
QZO | ICLR 2026 | Fine-tuning Quantized Neural Networks with Zeroth-order Optimization | official
GRZO | arXiv 2026 | GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning |
AGZO | ICML 2026 | AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning |
ZO-MOPI | arXiv 2026 | Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration | official
ZO-Muon | arXiv 2026 | Powering Up Zeroth-Order Training via Subspace Gradient Orthogonalization | official
RLR (Recursive Likelihood Ratio) | ICLR 2026 | Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer | official
ZO Fine-tuner | arXiv 2025 (accepted to ICML 2026) | Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs | official