SpecMuon
Implements SpecMuon, a spectrally guided Muon variant that regulates per-mode step sizes through a relaxed scalar auxiliary variable (RSAV).
Muon orthogonalizes the gradient in its singular-vector basis and takes unit-singular-value steps, which can be too aggressive for the ill-conditioned, multi-scale gradients of physics-informed learning. SpecMuon decomposes the matrix gradient \(G_t\) by SVD and, along the top-\(k\) dominant singular directions, replaces the unit weight with a per-mode auxiliary scalar \(r_{t,j}\) that is shrunk and relaxed according to the global loss energy \(\sqrt{\mathcal{L}_t}\). The remaining modes keep the standard Muon orthogonalized contribution. The resulting search direction \(O_t\) is then fed through Nesterov-style momentum.
Notation: \(G_t\) is the matrix-valued gradient, \(\hat{G}_t\) its Frobenius-normalized form, \(u_j, s_j, v_j\) the \(j\)-th singular triple, \(r_{t,j}\) the per-mode auxiliary variable initialized at \(\sqrt{\mathcal{L}_0}\), \(\mathcal{L}_t\) the loss, \(\gamma\) the learning rate, \(\mu\) the momentum, \(\xi \in [0,1]\) the SAV smoothing factor, \(k\) the number of guided modes, and \(\epsilon\) a stability constant. \(U_{>k}, \Sigma_{>k}, V_{>k}\) are the remaining (non-guided) singular components.
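The split between guided and non-guided modes can be illustrated with a minimal NumPy sketch. It is not the reference implementation: the function names `spectral_direction` and `nesterov_step` are hypothetical, the `weights` argument stands in for whatever per-mode weight SpecMuon derives from \(r_{t,j}\) (the RSAV update itself is not reproduced here), the Nesterov form shown is one common variant assumed for illustration, and the default `lr` and `momentum` values are arbitrary.

```python
import numpy as np


def spectral_direction(grad, weights, eps=1e-8):
    """Assumed sketch: re-weight the top-k singular modes of a matrix gradient.

    `weights` holds one scalar per guided mode (k = len(weights)); the
    remaining modes get unit weight, i.e. the Muon-style U V^T term.
    """
    weights = np.asarray(weights, dtype=float)
    k = weights.shape[0]
    # Frobenius-normalize the gradient; eps guards against a zero gradient.
    g_hat = grad / (np.linalg.norm(grad) + eps)
    # Thin SVD with singular values sorted in descending order.
    U, _, Vt = np.linalg.svd(g_hat, full_matrices=False)
    # Guided part: sum over j <= k of weights[j] * u_j v_j^T.
    guided = (U[:, :k] * weights) @ Vt[:k, :]
    # Non-guided part: orthogonalized contribution U_{>k} V_{>k}^T.
    rest = U[:, k:] @ Vt[k:, :]
    return guided + rest


def nesterov_step(param, buf, direction, lr=0.02, momentum=0.95):
    """Assumed Nesterov-style momentum applied to the search direction."""
    buf = momentum * buf + direction       # update the momentum buffer
    update = direction + momentum * buf    # look-ahead combination
    return param - lr * update, buf


# Usage: one step on a random 4x3 "gradient" with k = 2 guided modes.
rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))
O = spectral_direction(G, weights=[0.5, 0.25])
W, buf = nesterov_step(np.zeros_like(G), np.zeros_like(G), O)
```

With all weights set to 1 the sketch reduces to the plain orthogonalized direction \(U V^\top\), which is why shrinking the weights on the dominant modes damps exactly those directions and leaves the rest untouched.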
Reference: Binghang Lu, Jiahao Zhang, Guang Lin, "Muon with Spectral Guidance: Efficient Optimization for Scientific Machine Learning", arXiv 2025. https://arxiv.org/abs/2602.16167