ADAGB2
Implements ADAGB2, a stochastic second-order Adagrad for nonconvex bound-constrained optimization.
ADAGB2 generalizes Adagrad to second-order steps over a feasible set. At each iteration it forms the projected first-order direction \(d_t\), updates a per-coordinate accumulator \(w_t\) from the running sum of squared projected steps, and uses that accumulator to set a per-coordinate trust radius \(\Delta_t\) that bounds the step. A symmetric (for example, diagonal) Hessian approximation \(B_t\) supplies curvature through a scalar Cauchy-style scaling \(\gamma_t\), so the method interpolates between adaptive first-order and second-order behavior. In the unconstrained case with \(B_t = 0\) it collapses to the familiar Adagrad rule.
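One reading of the iteration that is consistent with the description above is sketched below. It is a reconstruction, not a quotation from the reference: the accumulator recursion, the form of the first-order step \(s_t\), and the \(\min\) expression for \(\gamma_t\) are assumptions chosen so that the stated Adagrad special case is recovered.

\[
\begin{aligned}
d_t &= P_{\mathcal{F}}(\theta_t - g_t) - \theta_t, \\
w_{t,i} &= \sqrt{w_{t-1,i}^2 + d_{t,i}^2}, \qquad \Delta_{t,i} = \frac{|d_{t,i}|}{w_{t,i}}, \\
s_{t,i} &= \operatorname{sgn}(d_{t,i})\,\Delta_{t,i} \quad \text{(truncated where needed so that } \theta_t + s_t \in \mathcal{F}\text{)}, \\
\gamma_t &= \begin{cases} \min\!\left[1,\ \dfrac{|g_t^\top s_t|}{s_t^\top B_t s_t}\right] & \text{if } s_t^\top B_t s_t > 0, \\[1ex] 1 & \text{otherwise,} \end{cases} \\
\theta_{t+1} &= \theta_t + \gamma_t s_t .
\end{aligned}
\]

Under these assumptions, setting \(B_t = 0\) and \(\mathcal{F} = \mathbb{R}^n\) gives \(d_t = -g_t\), \(\gamma_t = 1\), and \(s_{t,i} = -g_{t,i}/w_{t,i}\), which is the Adagrad update. The step-bound constant does not appear because this reading takes the scaled step \(\gamma_t s_t\) directly; the full method presumably admits more general steps bounded coordinate-wise by a multiple of \(\Delta_{t,i}\).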
In this notation, \(\theta\) denotes the parameters, \(g_t\) the stochastic gradient, and \(P_{\mathcal{F}}\) the projection onto the feasible set \(\mathcal{F}\). The per-coordinate Adagrad accumulator \(w_t\) is initialized as \(w_{-1,i} = \varsigma \in (0,1]\), and \(\Delta_t\) is the per-coordinate trust radius. The matrix \(B_t\) is a symmetric Hessian approximation, \(\gamma_t\) is the Cauchy scaling (set to \(1\) when \(s_t^\top B_t s_t \le 0\), with \(s_t\) the step), and \(\kappa_s \ge 1\) is the step-bound constant. With \(B_t = 0\) and \(\mathcal{F} = \mathbb{R}^n\), the update reduces to \(\theta_{t+1,i} = \theta_{t,i} - g_{t,i}/w_{t,i}\).
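To make these quantities concrete, here is a minimal NumPy sketch of a single step on a box-shaped feasible set. It is an illustration under stated assumptions, not the reference implementation: the function name `adagb2_step`, the diagonal storage of \(B_t\) in `b_diag`, the accumulator recursion \(w_{t,i} = \sqrt{w_{t-1,i}^2 + d_{t,i}^2}\), and the \(\min\) form of the Cauchy scaling are all hypothetical choices, and the step-bound constant \(\kappa_s\) is not used because the sketch always takes the scaled first-order step.

```python
import numpy as np


def adagb2_step(theta, g, w_prev, b_diag, lo, hi):
    """One hedged ADAGB2-style step on the box lo <= theta <= hi.

    All arguments are 1-D arrays of equal length; b_diag holds the
    diagonal of an assumed diagonal Hessian approximation B_t.
    Returns the new parameters and the updated accumulator.
    """
    # Projected first-order direction: d_t = P_F(theta - g) - theta.
    d = np.clip(theta - g, lo, hi) - theta
    # Assumed accumulator recursion (running sum of squared projected steps).
    w = np.sqrt(w_prev**2 + d**2)
    # Per-coordinate trust radius.
    delta = np.abs(d) / w
    # First-order step of length delta, truncated so theta + s stays in the box.
    s = np.clip(theta + np.sign(d) * delta, lo, hi) - theta
    # Cauchy-style scaling: 1 when the curvature along s is not positive.
    curvature = float(np.dot(s, b_diag * s))
    if curvature > 0.0:
        gamma = min(1.0, abs(float(np.dot(g, s))) / curvature)
    else:
        gamma = 1.0
    return theta + gamma * s, w


# Unconstrained case with B_t = 0: the step should match plain Adagrad.
theta = np.array([1.0, -2.0])
g = np.array([0.5, -1.0])
varsigma = 0.1  # initial accumulator value, must lie in (0, 1]
w_init = np.full(2, varsigma)
no_bound = np.full(2, np.inf)

theta_next, w_next = adagb2_step(theta, g, w_init, np.zeros(2), -no_bound, no_bound)
adagrad_next = theta - g / np.sqrt(varsigma**2 + g**2)
assert np.allclose(theta_next, adagrad_next)
```

The closing check exercises the special case stated above: with no bounds and zero curvature, the sketch produces the same iterate as \(\theta_{t,i} - g_{t,i}/w_{t,i}\). Storing only the diagonal of \(B_t\) keeps the curvature term an elementwise product; a full symmetric matrix would replace `b_diag * s` with a matrix-vector product.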
Reference: Stefania Bellavia, Serge Gratton, Benedetta Morini, Philippe L. Toint, "Fast Stochastic Second-Order Adagrad for Nonconvex Bound-Constrained Optimization", arXiv 2025. https://arxiv.org/abs/2505.06374