RGrad-Avg

Implements RGrad-Avg, a gradient-averaging variant of Riemannian Gradient Descent (RGD).

RGrad-Avg lifts the Euclidean Grad-Avg scheme (a Heun-style predictor-corrector) to Riemannian submanifolds. Plain RGD takes one retracted step along the negative Riemannian gradient; RGrad-Avg instead computes a predicted point, evaluates the gradient there, and retracts along the average of the two gradients. Because the gradient at the predicted point lives in a different tangent space, it is parallel-transported back to the current point before averaging.

Let \(\mathcal{M}\) be a Riemannian submanifold, \(\mathrm{grad}\,f\) the Riemannian gradient (the Euclidean gradient orthogonally projected onto the tangent space), \(\mathrm{Re}_x\) the (exponential) retraction at \(x\), and \(P_v\) the parallel transport along the retraction curve generated by \(v\). One iteration is:

\[ \begin{aligned} v &= -\gamma\,\mathrm{grad}\,f(x_k), \\ \bar{x}_{k+1} &= \mathrm{Re}_{x_k}(v), \\ x_{k+1} &= \mathrm{Re}_{x_k}\!\left(-\tfrac{\gamma}{2}\left(\mathrm{grad}\,f(x_k) + P_v^{-1}\big(\mathrm{grad}\,f(\bar{x}_{k+1})\big)\right)\right), \end{aligned} \]

where \(x_k \in \mathcal{M}\) is the current iterate, \(\gamma > 0\) is the step size, \(\bar{x}_{k+1}\) is the predicted point, \(\mathrm{grad}\,f(\bar{x}_{k+1})\) is the gradient at that prediction, and \(P_v^{-1}\) transports it from \(\bar{x}_{k+1}\) back to the tangent space at \(x_k\) for averaging. On \(\mathcal{M} = \mathbb{R}^n\) with \(\mathrm{Re}_x(s) = x + s\) and identity transport, this reduces to Euclidean Grad-Avg.

Reference: Saugata Purkayastha, Sukannya Purkayastha, "On Riemannian Gradient Descent Algorithm using gradient averaging", OPT2025: 17th Annual Workshop on Optimization for Machine Learning, 2025. https://opt-ml.org/papers/2025/paper7.pdf
