Cayley SGD
Implements Cayley SGD, momentum SGD constrained to the Stiefel manifold via the Cayley transform.
Cayley SGD optimizes parameters that must stay orthonormal, i.e. matrices \(X\) whose columns satisfy \(X^\top X = I\). It accumulates momentum in Euclidean space, projects it onto the tangent space of the manifold, where it is represented by a skew-symmetric matrix \(W\), and moves along the curve defined by the Cayley transform \(Y(\alpha) = (I - \tfrac{\alpha}{2}W)^{-1}(I + \tfrac{\alpha}{2}W)X\), which preserves orthonormality exactly. To avoid the matrix inverse, the transform is approximated by a few fixed-point iterations of \(Y = X + \tfrac{\alpha}{2}W(X + Y)\), and the step size \(\alpha\) is capped adaptively so that this iteration is a contraction and converges quickly.
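The update rule itself is not reproduced on this page. As a hedged reconstruction from the momentum algorithm in the referenced paper, one step can be written roughly as follows. The auxiliary symbols \(\hat{W}_t\), \(\alpha_t\), and \(Y^{i}\) are introduced here for illustration, the matrix norm is left unspecified, and the sign and ordering conventions of the actual implementation may differ.

\[
\begin{aligned}
m_{t+1} &= \beta\, m_t - g_t \\
\hat{W}_t &= m_{t+1} X_t^\top - \tfrac{1}{2}\, X_t \left( X_t^\top m_{t+1} X_t^\top \right) \\
W_t &= \hat{W}_t - \hat{W}_t^\top \\
m_{t+1} &\leftarrow W_t X_t \\
\alpha_t &= \min\left\{ \eta,\ \frac{2q}{\lVert W_t \rVert + \epsilon} \right\} \\
Y^{0} &= X_t + \alpha_t\, m_{t+1} \\
Y^{i} &= X_t + \tfrac{\alpha_t}{2}\, W_t \left( X_t + Y^{i-1} \right), \qquad i = 1, \dots, s \\
X_{t+1} &= Y^{s}
\end{aligned}
\]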
Here, \(X_t\) is the orthonormal parameter matrix, \(g_t = G(X_t)\) is the Euclidean gradient, \(m_t\) is the momentum, \(\beta\) is the momentum coefficient, \(W_t\) is the skew-symmetric tangent direction, \(\eta\) is the base learning rate, \(q\) is a step-size constant (default \(0.5\)), \(s\) is the number of fixed-point iterations (default \(2\)), and \(\epsilon\) is a small constant for numerical stability.
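The steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not this library's implementation: the function names `skew_direction` and `cayley_sgd_step` are hypothetical, the Frobenius norm is assumed for \(\lVert W \rVert\), the momentum default `beta=0.9` is an arbitrary choice, and the formula for the skew-symmetric matrix follows the referenced paper.

```python
import numpy as np


def skew_direction(X, M):
    """Skew-symmetric W such that W @ X is the tangent-space projection of M."""
    W_hat = M @ X.T - 0.5 * X @ (X.T @ M @ X.T)
    return W_hat - W_hat.T


def cayley_sgd_step(X, M, G, lr, beta=0.9, q=0.5, s=2, eps=1e-8):
    """One Cayley SGD step (sketch).

    X: (n, p) matrix with orthonormal columns, M: momentum, G: Euclidean gradient.
    Returns the updated parameter and the projected momentum.
    """
    # Accumulate momentum in Euclidean space (descent direction, hence -G).
    M = beta * M - G
    # Represent the tangent direction by a skew-symmetric matrix W.
    W = skew_direction(X, M)
    # Project the momentum onto the tangent space at X.
    M = W @ X
    # Cap the step so that alpha * ||W|| / 2 <= q, which makes the iteration
    # below a contraction. Frobenius norm is an assumption of this sketch.
    alpha = min(lr, 2.0 * q / (np.linalg.norm(W) + eps))
    # Fixed-point iteration for Y = X + (alpha / 2) W (X + Y), which avoids
    # the matrix inverse in the Cayley transform.
    Y = X + alpha * M
    for _ in range(s):
        Y = X + 0.5 * alpha * W @ (X + Y)
    return Y, M


# Usage: one step from a random orthonormal matrix with a stand-in gradient.
rng = np.random.default_rng(0)
n, p = 6, 3
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
M = np.zeros((n, p))
G = rng.standard_normal((n, p))  # placeholder for a real gradient

X_new, M_new = cayley_sgd_step(X, M, G, lr=0.01)
# Deviation from orthonormality after the approximate transform.
ortho_error = np.abs(X_new.T @ X_new - np.eye(p)).max()
```

Because the transform is only approximated with `s` iterations, `X_new` is orthonormal up to a small error rather than exactly. Increasing `s` brings the result closer to the exact Cayley transform at the cost of extra matrix products.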
Reference: Jun Li, Li Fuxin, Sinisa Todorovic, "Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform", ICLR 2020. https://arxiv.org/abs/2002.01113