OLion¶
Implements OLion, an optimizer that intersects the spectral bias of Muon with the \(\ell_\infty\) bias of Lion.
OLion forms a Nesterov-mixed momentum, orthogonalizes it through Newton-Schulz iteration to obtain a spectral (semi-orthogonal) direction, then applies an entrywise sign to push the update toward the \(\ell_\infty\) ball. The signed matrix is rescaled by an RMS-alignment factor so its magnitude matches a unit-RMS step, and the parameters are updated with decoupled weight decay.
where \(\theta\) are the parameters (a \(d_1 \times d_2\) matrix), \(\eta_t\) the learning rate, \(g_t\) the gradient, \(m_t\) the momentum, \(\beta_1\) the Nesterov-mixing weight, \(\beta_2\) the momentum decay, \(\lambda\) the weight decay, \(K\) the number of Newton-Schulz iterations, \(\mathrm{sign}\) the entrywise sign, and \(\gamma_t \approx 0.2\) the RMS-alignment scale (\(\lVert \cdot \rVert_F\) is the Frobenius norm).
Reference: Zixiao Wang, Yifei Shen, Huishuai Zhang, "OLion: Approaching the Hadamard Ideal by Intersecting Spectral and \(\ell_\infty\) Implicit Biases", arXiv 2025. https://arxiv.org/abs/2602.01105