Accelerated Distance-adaptive Method (DoG-lineage)¶
Implements the Accelerated Distance-adaptive Method (AGDA), a learning-rate-free accelerated dual-averaging scheme in the DoG lineage.
This method extends the distance-over-gradients idea to Nesterov-style acceleration without any learning rate. It maintains three coupled sequences: a dual-averaging iterate \(v_t\), an extrapolated query point \(\theta_t\), and an averaged output point \(y_t\). The effective step is set entirely from an adaptively estimated distance to the optimum, \(\bar r_t\), taken as the running maximum of \(\|\theta_0 - v_t\|\), so no learning rate is tuned.
Acceleration weights \(a_t\) grow with the accumulated square-root distance, giving the classical \(\mathcal{O}(1/t^2)\) schedule when the distance estimate stabilizes. The regularization coefficient \(\beta_t\) (the inverse of the dual-averaging step) is chosen by a line search / closed-form balance so the accelerated descent inequality holds; the gradients are then accumulated into a single dual-averaging update.
where \(\theta_0\) is the initial point, \(v_t\) is the dual-averaging iterate (with \(v_0 = y_0 = \theta_0\)), \(\theta_t\) is the query point at which the gradient \(\nabla f(\theta_t)\) is taken, \(y_t\) is the returned averaged iterate, \(\bar r_t\) is the adaptive distance estimate, \(A_t\) and \(a_t\) are the cumulative and incremental acceleration weights, \(\tau_t\) is the momentum interpolation factor, and \(\beta_{t+1}\) is the dual-averaging regularization coefficient set by line search to satisfy the accelerated descent condition. The displayed \(v\) update is the unconstrained (\(g \equiv 0\)) closed form of \(v_{t+1} = \arg\min_\theta \{ \tfrac{\beta_{t+1}}{2}\|\theta_0 - \theta\|^2 + \sum_{i=1}^{t+1} a_i(\langle \nabla f(\theta_i), \theta - \theta_i\rangle + g(\theta))\}\).
Reference: Yijin Ren, Haifeng Xu, Qi Deng, "Accelerated Distance-adaptive Methods for Hölder Smooth and Convex Optimization", arXiv 2025. https://arxiv.org/abs/2510.22135