SNOO
Implements SNOO, an outer optimizer that applies Nesterov momentum to the pseudo-gradients of an inner optimizer.
SNOO (Step-K Nesterov Outer Optimizer) is a two-loop scheme. An inner optimizer (AdamW, Muon, etc.) runs for \(K\) steps on fast weights \(\tilde w\), and the displacement of the fast weights over those \(K\) steps defines a pseudo-gradient \(s_t\). The slow weights are then advanced by applying Nesterov momentum to this pseudo-gradient. With outer momentum \(\mu = 0\) the method reduces exactly to Lookahead.
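One way to write out a single outer iteration is sketched below. It is reconstructed from the symbol definitions on this page, not copied from the reference. It assumes that \(\mathcal{T}_{t,k}\) maps the current fast weights to the next fast weights, that the pseudo-gradient is signed as \(s_t = w_t - \tilde w_{t,K}\), and that the Nesterov step uses the PyTorch-style buffer form. Consult the paper for the authoritative statement.

\[
\begin{aligned}
\tilde w_{t,0} &= w_t, \\
\tilde w_{t,k+1} &= \mathcal{T}_{t,k}\left(\tilde w_{t,k};\ \xi_{t,k},\ \tilde\eta_{t,k}\right), \qquad k = 0, \dots, K-1, \\
s_t &= w_t - \tilde w_{t,K}, \\
b_{t+1} &= \mu\, b_t + s_t, \\
w_{t+1} &= w_t - \eta \left(\mu\, b_{t+1} + s_t\right).
\end{aligned}
\]

Under these assumptions, setting \(\mu = 0\) gives \(w_{t+1} = w_t + \eta\,(\tilde w_{t,K} - w_t)\), which is the Lookahead interpolation with slow step size \(\eta\).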
In this notation, \(w_t\) denotes the slow (outer) weights, \(\tilde w_{t,k}\) the fast (inner) weights at inner step \(k\), \(\mathcal{T}_{t,k}\) the inner optimizer's update map applied to minibatch \(\xi_{t,k}\) with inner learning rate \(\tilde\eta_{t,k}\), \(s_t\) the pseudo-gradient (the displacement of the fast weights over \(K\) inner steps), \(b_t\) the outer Nesterov momentum buffer, \(\eta\) the outer learning rate, \(\mu\) the outer momentum coefficient, and \(K\) the number of inner steps between outer updates.
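To make the roles of these symbols concrete, here is a minimal pure-Python sketch of the two loops on a toy quadratic. It is illustrative only and is not this library's implementation. The function names `inner_sgd` and `snoo_outer_step` are hypothetical, plain SGD stands in for the inner optimizer, and the sketch assumes the pseudo-gradient is `slow - fast` with a PyTorch-style Nesterov buffer.

```python
def inner_sgd(weights, grad_fn, inner_lr, k):
    """Run k plain-SGD steps (a stand-in inner optimizer) and return the fast weights."""
    fast = list(weights)
    for _ in range(k):
        grads = grad_fn(fast)
        fast = [w - inner_lr * g for w, g in zip(fast, grads)]
    return fast


def snoo_outer_step(slow, fast, buf, lr, momentum):
    """One outer update: Nesterov momentum applied to the pseudo-gradient.

    slow: slow weights w_t
    fast: fast weights after K inner steps
    buf:  outer momentum buffer b_t
    Returns (new_slow, new_buf).
    """
    new_slow, new_buf = [], []
    for w, w_fast, b in zip(slow, fast, buf):
        s = w - w_fast                          # pseudo-gradient (assumed sign)
        b_next = momentum * b + s               # update the momentum buffer
        w_next = w - lr * (momentum * b_next + s)  # Nesterov look-ahead step
        new_slow.append(w_next)
        new_buf.append(b_next)
    return new_slow, new_buf


def quadratic_grad(weights):
    """Gradient of f(w) = 0.5 * sum(w_i ** 2)."""
    return list(weights)


# Toy training loop: K = 2 inner steps per outer step.
slow = [1.0, -2.0]
buf = [0.0, 0.0]
for _ in range(20):
    fast = inner_sgd(slow, quadratic_grad, inner_lr=0.1, k=2)
    slow, buf = snoo_outer_step(slow, fast, buf, lr=0.5, momentum=0.5)
    # The next inner phase restarts from the updated slow weights.
```

With `momentum=0` the outer step in this sketch becomes `w + lr * (w_fast - w)`, the Lookahead interpolation, which matches the reduction described above.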
Reference: Dominik Kallusky, Vinay Rao, Vishal Nandavanam, Hao-Jun Michael Shi, "SNOO: Step-K Nesterov Outer Optimizer - The Surprising Effectiveness of Nesterov Momentum Applied to Pseudo-Gradients", arXiv 2025. https://arxiv.org/abs/2510.15830