Sharpness-Aware Optimizers

Sharpness-aware methods seek parameters that lie in neighborhoods with uniformly low loss rather than at isolated sharp minima, which tends to improve generalization. Introduced by SAM (Foret et al., ICLR 2021), these methods wrap a base optimizer such as SGD or AdamW and add a gradient-ascent perturbation step before the descent update, at the cost of a second forward-backward pass per step. Later work makes the perturbation scale-invariant, minimizes the surrogate gap, reweights the sharpness term, amortizes the extra forward-backward cost, or extends the idea to second-order optimization.
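
The perturb-then-descend structure of the basic SAM update can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the API of any implementation listed in the table: the quadratic loss, the function names `loss_and_grad`, `sam_perturbation`, and `sam_sgd_step`, and the values of `lr` and `rho` are all hypothetical, and plain SGD stands in for the base optimizer.

```python
import math


def loss_and_grad(w):
    """Toy loss L(w) = 0.5 * sum(c_i * w_i^2) and its analytic gradient.

    The curvatures are hypothetical: one sharp direction, one flat direction.
    """
    curv = (10.0, 1.0)
    loss = 0.5 * sum(c * x * x for c, x in zip(curv, w))
    grad = [c * x for c, x in zip(curv, w)]
    return loss, grad


def sam_perturbation(grad, rho, eps=1e-12):
    """Ascent direction: the gradient rescaled to L2 norm rho."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [rho * g / (norm + eps) for g in grad]


def sam_sgd_step(w, lr=0.05, rho=0.05):
    """One SAM update with plain SGD as the (assumed) base optimizer."""
    # First forward-backward pass: gradient at the current weights.
    _, g = loss_and_grad(w)
    # Ascent step: move to a nearby point where the loss is higher.
    e = sam_perturbation(g, rho)
    w_adv = [x + d for x, d in zip(w, e)]
    # Second forward-backward pass: gradient at the perturbed weights.
    _, g_adv = loss_and_grad(w_adv)
    # Descent step: update the ORIGINAL weights using the perturbed gradient.
    return [x - lr * gi for x, gi in zip(w, g_adv)]


w = [1.0, 1.0]
for _ in range(50):
    w = sam_sgd_step(w)
final_loss, _ = loss_and_grad(w)
print(final_loss)
```

Note that the descent update is applied to the original weights, not the perturbed ones; only the gradient is taken at the perturbed point. The two calls to `loss_and_grad` per step correspond to the extra forward-backward pass that the efficiency-oriented variants try to reduce.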

Optimizer	Venue	Paper	Code	`zij`
SAM	ICLR 2021	Sharpness-Aware Minimization for Efficiently Improving Generalization	community	`SAM`
ASAM	ICML 2021	ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks	community	`ASAM`
ESAM	ICLR 2022	Efficient Sharpness-aware Minimization for Improved Training of Neural Networks	—	—
GSAM	ICLR 2022	Surrogate Gap Minimization Improves Sharpness-Aware Training	official	`GSAM`
LookSAM	CVPR 2022	Towards Efficient and Scalable Sharpness-Aware Minimization	community	`LookSAM`
AE-SAM	ICLR 2023	An Adaptive Policy to Employ Sharpness-Aware Minimization	—	—
bSAM	ICLR 2023	SAM as an Optimal Relaxation of Bayes	official	—
GAM	CVPR 2023	Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization	—	—
WSAM	KDD 2023	Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term	official	`WSAM`
AdaSAM	Neural Networks 2024	AdaSAM: Boosting Sharpness-Aware Minimization with Adaptive Learning Rate and Momentum for Training Deep Neural Networks	—	—
F-SAM	CVPR 2024	Friendly Sharpness-Aware Minimization	official	—
FGSAM	NeurIPS 2024	Fast Graph Sharpness-Aware Minimization for Enhancing and Accelerating Few-Shot Node Classification	—	—
Lookbehind-SAM	ICML 2024	Lookbehind-SAM: k steps back, 1 step forward	—	—
MSAM	arXiv 2024	Momentum-SAM: Sharpness Aware Minimization without Computational Overhead	official	—
SAMPa	NeurIPS 2024	SAMPa: Sharpness-aware Minimization Parallelized	—	—
AsyncSAM	arXiv 2025	Asynchronous Sharpness-Aware Minimization For Fast and Accurate Deep Learning	—	—
GCSAM	arXiv 2025	GCSAM: Gradient Centralized Sharpness Aware Minimization	official	—
LightSAM	arXiv 2025	LightSAM: Parameter-Agnostic Sharpness-Aware Minimization	—	—
SASSHA	ICML 2025	SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation	official	—
SSAM	JMLR 2025	Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy	—	—
SAM-Polyak (Adaptive SAM with Polyak step size)	ICML 2026	Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler	official	—
X-SAM	arXiv 2026	X-SAM: Boosting Sharpness-Aware Minimization with Dominant-Eigenvector Gradient Correction	—	—
M-SAM (Modality-Aware SAM)	NeurIPS 2025	Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning	—	—
ZSharp (SAM with Z-Score Gradient Filtering)	NeurIPS 2025 OPT Workshop (also accepted to ICASSP 2026)	Sharpness-Aware Minimization with Z-Score Gradient Filtering	official	—
Focal-SAM	ICML 2025	Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification	official	—
Functional SAM	ICML 2025	Avoiding spurious sharpness minimization broadens applicability of SAM	—	—
FedGMT	ICML 2025	One Arrow, Two Hawks: Sharpness-aware Minimization for Federated Learning via Global Model Trajectory	official	—
LE-SAM	ICML 2026	Fix the Loss, Not the Radius: Rethinking the Adversarial Perturbation of Sharpness-Aware Minimization	—	—