AuON
Implements AuON, a linear-time alternative to semi-orthogonal momentum updates.
AuON replaces the Newton-Schulz orthogonalization of Muon with a linear-time elementwise transform, \(O(N)\) in the number of matrix entries \(N\): the momentum matrix is first normalized by its Frobenius norm, then rescaled by the root-mean-square (RMS) of its elementwise hyperbolic cosine. The \(\cosh\) map amplifies large entries, so heavy-tailed updates produce a larger RMS and thus stronger global shrinkage. Because \(\cosh(z) \ge 1\), the RMS is at least 1, so the rescaled update \(U\) has Frobenius norm below 1; since the spectral norm never exceeds the Frobenius norm, the update is spectrally contractive (\(\lVert U \rVert_2 < 1\)) without any iterative matrix factorization. A per-matrix factor \(\sqrt{\max(1, m/n)}\) decouples the step scale from the aspect ratio, as in Muon.
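Read literally, the description above corresponds to an update of roughly the following form. This is a sketch reconstructed from the prose, not a quotation of the reference. Three things are assumptions: the exponential-moving-average form of the momentum accumulation, the intermediate symbols \(X_t\) and \(U_t\), and the labels \(\epsilon_1, \epsilon_2\) for the two stability constants, including where each one is placed. Because \(m_t\) is normalized by its Frobenius norm, the overall scale of the momentum accumulation has almost no effect on the resulting step.

\[
\begin{aligned}
m_t &= \beta\, m_{t-1} + (1 - \beta)\, g_t, \\
X_t &= \frac{m_t}{\lVert m_t \rVert_F + \epsilon_1}, \\
U_t &= \frac{X_t}{\sqrt{\tfrac{1}{N} \sum_{i,j} \cosh^2\!\big((X_t)_{ij}\big)} + \epsilon_2}, \\
\theta_t &= \theta_{t-1} - \eta\, \sqrt{\max(1, m/n)}\; U_t
\end{aligned}
\]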
In the update rule, \(\theta\) denotes the (matrix) parameters, \(g_t\) the gradient, \(m_t\) the momentum buffer with decay \(\beta\), \(\lVert \cdot \rVert_F\) the Frobenius norm, \(\cosh(z) = (e^z + e^{-z})/2\) the hyperbolic cosine applied elementwise, \(m\) and \(n\) the matrix dimensions (not to be confused with the momentum \(m_t\)), \(N = m \cdot n\) the number of entries, and \(\eta\) the learning rate. Two stability constants \(\epsilon\) are used, with values \(10^{-7}\) and \(10^{-8}\). A hybrid variant applies a single Newton-Schulz iteration before the \(\cosh\)-RMS scaling.
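The transform can be illustrated with a minimal NumPy sketch. It is not the library's implementation. The function names `auon_direction` and `auon_step`, the default `lr` and `beta` values, the exponential-moving-average momentum, and the assignment of \(10^{-7}\) to the Frobenius normalization and \(10^{-8}\) to the RMS rescaling are all assumptions made for illustration. The hybrid Newton-Schulz variant is omitted.

```python
import numpy as np


def auon_direction(momentum, eps_norm=1e-7, eps_rms=1e-8):
    """Frobenius-normalize, then divide by the RMS of the elementwise cosh.

    Which epsilon belongs to which step is an assumption of this sketch.
    """
    # For a 2-D array, np.linalg.norm defaults to the Frobenius norm.
    x = momentum / (np.linalg.norm(momentum) + eps_norm)
    # cosh(z) >= 1 everywhere, so this RMS is always at least 1.
    rms = np.sqrt(np.mean(np.cosh(x) ** 2))
    return x / (rms + eps_rms)


def auon_step(theta, grad, momentum, lr=0.02, beta=0.95):
    """One hypothetical AuON-style step for a single 2-D parameter matrix."""
    # Assumed EMA form; the normalization makes the overall scale irrelevant.
    momentum = beta * momentum + (1.0 - beta) * grad
    m, n = theta.shape
    # Aspect-ratio factor sqrt(max(1, m / n)), as described for Muon.
    scale = np.sqrt(max(1.0, m / n))
    theta = theta - lr * scale * auon_direction(momentum)
    return theta, momentum


rng = np.random.default_rng(0)
theta = rng.standard_normal((8, 4))
grad = rng.standard_normal((8, 4))
momentum = np.zeros_like(theta)

theta_new, momentum = auon_step(theta, grad, momentum)

# The update direction should be spectrally contractive.
u = auon_direction(momentum)
spectral_norm = np.linalg.norm(u, ord=2)  # largest singular value
frobenius_norm = np.linalg.norm(u)
```

In this sketch `spectral_norm` cannot exceed `frobenius_norm`, which in turn stays below 1 because the first division bounds the Frobenius norm by 1 and the second divides by a quantity of at least 1. Both operations touch each entry a constant number of times, which is the source of the linear cost.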
Reference: Dipan Maity, "AuON: A Linear-time Alternative to Semi-Orthogonal Momentum Updates", arXiv 2025. https://arxiv.org/abs/2509.24320