MCSD / SPEL¶
Implements MCSD / SPEL, manifold-constrained steepest descent via a norm-induced linear minimization oracle and projection back to the manifold.
Norm-constrained LMO-based optimizers such as spectral gradient descent and Muon extend awkwardly to manifold-constrained problems, usually requiring nested loops that solve tangent-space subproblems iteratively. MCSD is a single-loop alternative: at each step it picks a steepest-descent direction induced by a chosen norm by applying the LMO to the Riemannian gradient, takes a step, and returns to the manifold by projection. The stochastic variant smooths the gradient with momentum before the LMO. SPEL is the spectral-norm specialization on the Stiefel manifold, where the LMO and the projection both reduce to the matrix sign (polar) factor.
where \(\theta_t\) are the parameters constrained to the manifold \(\mathcal{M}\), \(g_t\) is the stochastic gradient, \(m_t\) its momentum estimate with decay \(\beta \in [0,1)\), \(\eta_t\) is the step size, \(P_{T_{\theta_t}\mathcal{M}}\) projects onto the tangent space at \(\theta_t\), \(\nabla_{\mathcal{M}} f\) is the Riemannian gradient, \(P_{\mathcal{M}}\) projects back onto the manifold, \(\mathrm{LMO}_{\|\cdot\|}\) is the linear minimization oracle for the chosen norm, and \(\mathrm{msign}(Y) = Y(Y^\top Y)^{-1/2}\) is the matrix sign (polar factor) onto the Stiefel manifold.
Reference: Kaiwei Yang, Lexiao Lai, "Manifold constrained steepest descent", ICML 2026. https://arxiv.org/abs/2601.21487