Zeroth-Order Optimizers

Zeroth-order (gradient-free) methods train models using only function evaluations, estimating gradients from randomized perturbations of the parameters instead of backpropagation. Because they need no backward pass or activation storage, they run at roughly inference-level memory, which has made them a practical option for fine-tuning large language models on constrained hardware. The lineage runs from SPSA in classical stochastic approximation to recent variance-reduced and low-rank variants built on MeZO.
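As a rough illustration of the idea, the sketch below performs a two-point, SPSA-style update using only loss evaluations. It is a minimal example under stated assumptions, not the implementation from any paper listed here: the names `zo_sgd_step` and `quadratic`, the plain-list parameter representation, and the hyperparameter values are all hypothetical. The perturbation is regenerated from a seed rather than stored, in the spirit of the memory-saving approach associated with MeZO.

```python
import random


def zo_sgd_step(params, loss_fn, lr=0.05, eps=1e-3, seed=0):
    """One two-point zeroth-order SGD step, applied in place to a list of floats.

    Hypothetical helper for illustration only; lr, eps, and seed are assumed values.
    """

    def perturb(scale):
        # Re-create the same Gaussian direction z from the seed instead of storing it.
        rng = random.Random(seed)
        for i in range(len(params)):
            params[i] += scale * rng.gauss(0.0, 1.0)

    perturb(+eps)               # params = theta + eps * z
    loss_plus = loss_fn(params)
    perturb(-2.0 * eps)         # params = theta - eps * z
    loss_minus = loss_fn(params)
    perturb(+eps)               # back to theta (up to floating-point rounding)

    # Finite-difference estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)

    # Descent step along the same direction z, regenerated from the seed once more.
    rng = random.Random(seed)
    for i in range(len(params)):
        params[i] -= lr * projected_grad * rng.gauss(0.0, 1.0)
    return projected_grad


def quadratic(theta):
    # Toy objective with its minimum at the origin.
    return sum(x * x for x in theta)


theta = [1.0, -2.0, 0.5]
initial_loss = quadratic(theta)
for step in range(200):
    zo_sgd_step(theta, quadratic, seed=step)
final_loss = quadratic(theta)
```

Each step costs two loss evaluations and keeps no gradient or activation buffers; the only extra state is the integer seed. On this toy quadratic the loss shrinks over the loop, although a single random direction gives a noisy estimate, which is the variance that several of the methods listed below aim to reduce.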

Optimizer	Venue	Paper	Code	`zij`
SPSA	IEEE Transactions on Automatic Control 1992	Multivariate stochastic approximation using a simultaneous perturbation gradient approximation	official	—
Evolution Strategies	arXiv 2017	Evolution Strategies as a Scalable Alternative to Reinforcement Learning	official	—
ZO-AdaMM	NeurIPS 2019	ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization	official	—
MeZO	NeurIPS 2023	Fine-Tuning Language Models with Just Forward Passes	official	—
DeepZero	ICLR 2024	DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training	official	—
LeZO	arXiv 2024	Simultaneous Computation and Memory Efficient Zeroth-Order Optimizer for Fine-Tuning Large Language Models	official	—
MeZO-SVRG	ICML 2024	Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models	official	—
ZO-AdaMU	AAAI 2024	ZO-AdaMU Optimizer: Adapting Perturbation by the Momentum and Uncertainty in Zeroth-order Optimization	official	—
ZoPro	CDC 2024	A Zeroth-Order Proximal Algorithm for Consensus Optimization	—	—
Addax	ICLR 2025	Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models	official	—
DiZO	NeurIPS 2025	Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning	official	—
ElasticZO	arXiv 2025	ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization	—	—
HELENE	EMNLP 2025	HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization	—	—
KerZOO	arXiv 2025	KerZOO: Kernel Function Informed Zeroth-Order Optimization for Accurate and Accelerated LLM Fine-Tuning	—	—
LORENZA	arXiv 2025	LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM	—	—
LOZO	ICLR 2025	Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures	official	—
MaZO	arXiv 2025	MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models	—	—
QuZO	EMNLP 2025	QuZO: Quantized Zeroth-Order Fine-Tuning for Large Language Models	official	—
R-AdaZO	ICML 2025	Refining Adaptive Zeroth-Order Optimization at Ease	official	—
Sparse MeZO	NeurIPS 2025	Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning	official	—
SubZero	ICCV 2025	Zeroth-Order Fine-Tuning of LLMs in Random Subspaces	official	—
TeZO	arXiv 2025	TeZO: Empowering the Low-Rankness on the Temporal Dimension in the Zeroth-Order Optimization for Fine-tuning LLMs	—	—
VAMO	arXiv 2025	VAMO: Efficient Zeroth-Order Variance Reduction for SGD with Faster Convergence	—	—
VR-SZD	arXiv 2025	A Structured Proximal Stochastic Variance Reduced Zeroth-order Algorithm	official	—
ZO-SAH	arXiv 2025	Subspace-based Approximate Hessian Method for Zeroth-Order Optimization	—	—
ZO2	COLM 2025	ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory	official	—
ZOQO	ICASSP 2025	ZOQO: Zero-Order Quantized Optimization	—	—
AdaMeZO	arXiv 2026	AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments	official	—
FZOO	ICLR 2026	FZOO: Fast Zeroth-Order Optimizer for Fine-Tuning Large Language Models towards Adam-Scale Speed	official	—
MEAZO	arXiv 2026	On Adaptivity in Zeroth-Order Optimization	—	—
QZO	ICLR 2026	Fine-tuning Quantized Neural Networks with Zeroth-order Optimization	official	—
GRZO	arXiv 2026	GRZO: Group-Relative Zeroth-Order Optimization for Large Language Model Fine-Tuning	—	—
AGZO	ICML 2026	AGZO: Activation-Guided Zeroth-Order Optimization for LLM Fine-Tuning	—	—
ZO-MOPI	arXiv 2026	Accelerating Zeroth-Order Spectral Optimization with Partial Orthogonalization from Power Iteration	official	—
ZO-Muon	arXiv 2026	Powering Up Zeroth-Order Training via Subspace Gradient Orthogonalization	official	—
RLR (Recursive Likelihood Ratio)	ICLR 2026	Half-order Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer	official	—
ZO Fine-tuner	arXiv 2025 (accepted to ICML 2026)	Learning a Zeroth-Order Optimizer for Fine-Tuning LLMs	official	—