LBFGS¶
Implements L-BFGS, a limited-memory quasi-Newton method that approximates the inverse Hessian from the most recent curvature pairs.
L-BFGS takes a Newton-like step \(\theta_t = \theta_{t-1} - \eta\, H_t g_t\), where \(H_t\) approximates the inverse Hessian. Instead of storing the full \(H_t\), it keeps only the last \(m\) pairs of parameter and gradient differences \((s_i, y_i)\) and reconstructs the action \(H_t g_t\) on the fly through the two-loop recursion below, giving linear memory and cost per step.
where \(\theta\) are the parameters, \(\eta\) is the step length (chosen by line search), \(g_t\) is the gradient, \(s_i\) and \(y_i\) are the parameter and gradient differences of the stored pairs, \(m\) is the history size, and \(r = H_t g_t\) is the resulting search direction.
Reference: D. C. Liu and J. Nocedal, "On the limited memory BFGS method for large scale optimization", Mathematical Programming 1989. https://doi.org/10.1007/BF01589116