SOAA
Implements SOAA (Second-Order Adaptive Adam), an Adam variant that scales the step by a diagonal Fisher approximation inside an adaptive trust region.
SOAA keeps Adam's bias-corrected first and second moments but augments the denominator with a diagonal Fisher information estimate \(F_t\) built from the moments. The effective step size is clamped by a trust-region scale \(r_t\), the elementwise maximum of \(d_t F_t\) and \(\sqrt{\hat{v}_t}\). The trust-region radius \(d_t\) is rescaled each step by the ratio of actual to predicted loss reduction, so the optimizer expands the step when predictions are accurate and contracts it otherwise.
Notation: \(\theta\) denotes the parameters, \(\eta\) the learning rate, \(g_t\) the gradient, \(m_t,v_t\) the first and second moment estimates with bias-corrected forms \(\hat{m}_t,\hat{v}_t\), \(\beta_1,\beta_2\) their decay rates, \(\lambda\) the weight decay, \(\epsilon\) a stability constant, \(F_t\) the diagonal Fisher approximation, \(r_t\) the trust-region scale, \(d_t\) the trust-region radius, \(\hat{\ell}-\ell_t\) the actual loss reduction, \(p_t\) the predicted reduction, \(\gamma\) the radius bound factor, and \(T\) the total number of steps.
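To make the description concrete, the following NumPy sketch shows one plausible reading of a single SOAA-style step. It is an illustration under stated assumptions, not the library's implementation or the paper's exact rule. The function names `soaa_like_step` and `update_radius` are hypothetical, as are three modelling choices: \(F_t\) is taken to be \(\hat{v}_t\), the predicted reduction \(p_t\) is the first-order estimate \(g_t^\top \Delta\theta\), and the radius change is bounded to \([1/\gamma, \gamma]\) per step. Weight decay is applied in decoupled form, which is also an assumption.

```python
import numpy as np

def soaa_like_step(theta, grad, m, v, d, t, lr=1e-3, beta1=0.9,
                   beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One illustrative SOAA-style update (hypothetical helper)."""
    # Standard Adam moment estimates with bias correction (t starts at 1).
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # ASSUMPTION: the diagonal Fisher estimate F_t is the bias-corrected
    # second moment. The text only says F_t is "built from the moments".
    fisher = v_hat

    # Trust-region scale r_t: elementwise max of d_t * F_t and sqrt(v_hat).
    r = np.maximum(d * fisher, np.sqrt(v_hat))

    # Scaled step plus decoupled weight decay (decoupling is an assumption).
    step = lr * m_hat / (r + eps)
    new_theta = theta - step - lr * weight_decay * theta

    # ASSUMPTION: predicted reduction p_t from a first-order model.
    predicted = float(np.sum(grad * step))
    return new_theta, m, v, predicted

def update_radius(d, actual, predicted, gamma=2.0, eps=1e-12):
    """Rescale d_t by the actual/predicted reduction ratio (hypothetical)."""
    ratio = actual / (predicted + eps)
    # ASSUMPTION: gamma bounds the per-step change to [1/gamma, gamma].
    return d * float(np.clip(ratio, 1.0 / gamma, gamma))

# Usage on the quadratic loss 0.5 * ||theta||^2, whose gradient is theta.
loss = lambda x: 0.5 * float(np.sum(x ** 2))
theta = np.array([1.0, -2.0])
m, v, d = np.zeros(2), np.zeros(2), 1.0
new_theta, m, v, predicted = soaa_like_step(theta, theta, m, v, d, t=1, lr=0.1)
d_new = update_radius(d, loss(theta) - loss(new_theta), predicted)
```

In this sketch the coordinate with the larger Fisher estimate receives the smaller step, because \(d_t F_t\) exceeds \(\sqrt{\hat{v}_t}\) there and takes over the denominator. Consult the reference below for the exact definitions of \(F_t\), \(p_t\), and the radius update.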
Reference: James Vo and Anh-Dung Vo, "Efficient Second-Order Neural Network Optimization via Adaptive Trust Region Methods", arXiv preprint 2024. https://arxiv.org/abs/2410.02293