AdaBFE¶
Implements AdaBFE, a learning-rate-free optimizer that adapts a per-dimension step size by measuring the angle between successive gradients.
AdaBFE (Adaptive Binary Forward Exploration) probes each dimension independently. It takes a tentative step, recomputes the gradient there, and forms an angular change metric between the old and new gradient. If the gradients have turned too sharply (metric at or above a threshold), the step overshot and the learning rate is halved; otherwise the step was conservative and the rate is doubled. A binary search on \(\eta_i\) thus drives the per-coordinate step toward the largest move that keeps the gradient direction stable, removing the need to tune a fixed learning rate.
where \(\theta_{t,i}\) is the \(i\)-th coordinate of the parameters at step \(t\), \(\eta_i\) its per-dimension learning rate, \(g_{t,i}\) and \(g^{*}_{t,i}\) the gradients before and after the tentative step, \(\epsilon_{c,i}\) the measured angle of gradient change, and \(\epsilon_{v,i}\) a target threshold angle. The arctan expression is the angle between the two consecutive gradient values.
Reference: Xin Cao, "BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization", arXiv 2022. https://arxiv.org/abs/2207.02763