RLEKF
Implements RLEKF, a reorganized-layer extended Kalman filter optimizer for training deep potential models.
RLEKF treats network training as a nonlinear state estimation problem: the weights are the hidden state and each label is a noisy measurement. The extended Kalman filter (EKF) updates the weights with a Kalman gain derived from an error covariance matrix \(P_t\), and a memory (forgetting) factor \(\lambda_t\) progressively discounts older observations. To make the full EKF tractable for large networks, RLEKF reorganizes the parameters into \(L\) blocks and keeps a block-diagonal covariance, so the gain is computed per block from the local gradient.
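The block-wise update described above can be sketched in NumPy as follows. This is a minimal illustration, not the optimizer's actual implementation. It assumes a scalar measurement, contiguous equal-size blocks (the reference reorganizes layers by splitting and merging them, which is not reproduced here), and a noise term of `lam * L` in the gain denominator. The names `split_into_blocks` and `block_ekf_step` are hypothetical, and a linear toy model stands in for the network.

```python
import numpy as np


def split_into_blocks(n_params, block_size):
    """Partition range(n_params) into contiguous slices (hypothetical helper)."""
    return [slice(s, min(s + block_size, n_params))
            for s in range(0, n_params, block_size)]


def block_ekf_step(theta, P_blocks, blocks, grad, err, lam, nu):
    """One EKF step with a block-diagonal covariance and a scalar measurement.

    theta    : (N,) float weights, updated in place
    P_blocks : list of (n_l, n_l) covariance blocks, updated in place
    blocks   : list of slices selecting each block of theta
    grad     : (N,) gradient of the model output with respect to theta
    err      : scalar prediction error y - h(theta, x)
    lam, nu  : memory factor and forgetting rate
    Returns the memory factor to use at the next step.
    """
    L = len(blocks)
    # P_l g_l for each block, computed from the local gradient only.
    Pg = [P @ grad[b] for P, b in zip(P_blocks, blocks)]
    # Scalar denominator: assumed noise term lam * L plus sum_l g_l^T P_l g_l.
    s = lam * L + sum(float(grad[b] @ v) for b, v in zip(blocks, Pg))
    for P, b, v in zip(P_blocks, blocks, Pg):
        theta[b] += (v / s) * err      # K_l * err, with K_l = P_l g_l / s
        P -= np.outer(v, v) / s        # P_l - K_l g_l^T P_l (P_l is symmetric)
        P /= lam                       # discount older observations
    return nu * lam + 1.0 - nu         # memory factor drifts toward 1


# Toy usage: fit a linear model h(theta, x) = theta @ x, whose Jacobian is x.
rng = np.random.default_rng(0)
w_true = rng.normal(size=6)
theta = np.zeros(6)
blocks = split_into_blocks(6, block_size=2)
P_blocks = [np.eye(b.stop - b.start) for b in blocks]   # P_0 = I per block
lam = 0.98
for _ in range(200):
    x = rng.normal(size=6)
    err = float(w_true @ x - theta @ x)
    lam = block_ekf_step(theta, P_blocks, blocks, x, err, lam, nu=0.9987)
print("max abs weight error:", np.abs(theta - w_true).max())
```

Storing one small matrix per block is what keeps the cost manageable: a full covariance over \(N\) weights needs \(N^2\) entries, while the block-diagonal form needs only the sum of the squared block sizes.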
The per-step EKF update is expressed in terms of the following quantities: \(\theta\) denotes the network weights; \(g_t = H_t = \partial h(\theta, x_t)/\partial\theta\) is the Jacobian of the model output \(h\), evaluated at \(\theta_{t-1}\); \(\varepsilon_t\) is the prediction error against the label \(y_t\); \(K_t\) is the Kalman gain; \(P_t\) is the weight error covariance (\(P_0 = I\)); \(\alpha_t^2 R_t\) is the measurement noise covariance (set to \(L I\)); \(\lambda_t \in (0,1]\) is the memory factor with initial value \(\lambda_1\); and \(\nu\) is the forgetting rate.
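One way to write the recursion that is consistent with the definitions above is the standard forgetting-factor EKF form shown here. It is a reconstruction under stated assumptions, not a quotation: in particular, placing \(\lambda_t\) on the noise term and using the linear recursion for \(\lambda_{t+1}\) are assumptions, and the reference below gives the authoritative statement.

\[
\begin{aligned}
\varepsilon_t &= y_t - h(\theta_{t-1}, x_t), \\
K_t &= P_{t-1} H_t \left( \lambda_t \alpha_t^2 R_t + H_t^{\top} P_{t-1} H_t \right)^{-1}, \\
\theta_t &= \theta_{t-1} + K_t \varepsilon_t, \\
P_t &= \lambda_t^{-1} \left( P_{t-1} - K_t H_t^{\top} P_{t-1} \right), \\
\lambda_{t+1} &= \nu \lambda_t + 1 - \nu .
\end{aligned}
\]

Under this form, unrolling the last line gives \(1 - \lambda_t = (1 - \lambda_1)\,\nu^{t-1}\), so for \(0 < \nu < 1\) the memory factor rises monotonically toward 1 and the discounting of older observations weakens as training proceeds. With a block-diagonal \(P_{t-1}\) and a scalar measurement, \(H_t^{\top} P_{t-1} H_t\) reduces to a sum of per-block terms \(g_{t,l}^{\top} P_{t-1,l}\, g_{t,l}\), where the block index \(l = 1, \dots, L\) is notation introduced here for illustration.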
Reference: Siyu Hu, Wentao Zhang, Qiuchen Sha, Feng Pan, Lin-Wang Wang, Weile Jia, Guangming Tan, Tong Zhao, "RLEKF: An Optimizer for Deep Potential with Ab Initio Accuracy", AAAI 2023. https://arxiv.org/abs/2212.06989