Pion
Implements Pion, a drop-in replacement for Muon that swaps uniform spectral whitening for a two-stage high-pass Promotion+Suppression spectral filter.
Pion targets settings where Muon's uniform whitening of the momentum buffer hurts, such as vision-language-action training and reinforcement learning. It keeps Muon's structure of momentum followed by a matrix-sign step \(\mathrm{msign}(\cdot)\) computed by Newton-Schulz iterations, but redefines \(\mathrm{msign}\) as a composed high-pass filter. After Frobenius normalization, a promotion polynomial is iterated \(k_p\) times to lift the dominant singular directions, then a suppression polynomial is iterated \(k_s = k - k_p\) times to damp small (noise) singular values, reshaping each singular value \(\sigma \in [0,1]\) rather than flattening the whole spectrum to one.
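The update described above can be sketched as follows. This is a reconstruction from the symbols defined in this section, not a quotation of the reference: the momentum form \(m_t = \mu\, m_{t-1} + g_t\), the placement of \(\epsilon\) in the normalization, and the absence of any shape-dependent scaling or weight decay are assumptions. The matrix polynomial itself follows from the stated scalar filters, because for \(X = U \Sigma V^\top\) the map \(aX + b\,XX^\top X + c\,(XX^\top)^2 X\) equals \(U\,(a\Sigma + b\Sigma^3 + c\Sigma^5)\,V^\top\).

\[
\begin{aligned}
m_t &= \mu\, m_{t-1} + g_t, \\
X_0 &= \frac{m_t}{\lVert m_t \rVert_F + \epsilon}, \\
X_{j+1} &= a_p X_j + b_p\, X_j X_j^\top X_j + c_p\, (X_j X_j^\top)^2 X_j, \qquad j = 0, \dots, k_p - 1, \\
X_{j+1} &= a_s X_j + b_s\, X_j X_j^\top X_j + c_s\, (X_j X_j^\top)^2 X_j, \qquad j = k_p, \dots, k - 1, \\
\mathrm{msign}(m_t) &= X_k, \\
\theta_t &= \theta_{t-1} - \eta\, \mathrm{msign}(m_t).
\end{aligned}
\]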
In this update, \(\mathrm{msign}(m_t)\) is the matrix \(X\) produced by the iterations started from \(X_0\). The promotion coefficients are \((a_p, b_p, c_p) = (1.875, -1.25, 0.375)\), giving \(f_p(\sigma) = 1.875\,\sigma - 1.25\,\sigma^3 + 0.375\,\sigma^5\); the suppression coefficients are \((a_s, b_s, c_s) = (0, 2.5, -1.5)\), giving \(f_s(\sigma) = 2.5\,\sigma^3 - 1.5\,\sigma^5\). Both polynomials leave \(\sigma = 1\) unchanged, since \(f_p(1) = f_s(1) = 1\). The remaining symbols are \(g_t\), the gradient; \(m_t\), the momentum buffer; \(\mu\), the momentum coefficient; \(\eta\), the learning rate; \(\epsilon\), a small constant; and \(\theta\), the matrix-shaped parameters.
Reference: Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu, "Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR", arXiv 2026. https://arxiv.org/abs/2605.19282