AdaFedAdam
Implements AdaFedAdam, a federated optimizer that applies Adam to fairness-weighted client pseudo-gradients.
AdaFedAdam treats each round of federated learning as a single server-side Adam step. Each client returns its accumulated local update \(\Delta_k\), which is normalized into a pseudo-gradient \(U_k\) and accompanied by a certainty score \(C_k\) that measures how far the local trajectory deviated from a plain gradient step. The server aggregates these pseudo-gradients with weights that combine dataset size with an inverse training-rate term \(I_k\) raised to a fairness exponent \(\alpha\), then runs Adam on the aggregate.
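The aggregation described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name `aggregate` and its arguments are hypothetical, and the weight form \(S_k I_k^{\alpha}\), normalized to sum to one, is an assumption about how dataset size and the fairness term are combined. The pseudo-gradients \(U_k\) and certainties \(C_k\) are taken as given inputs.

```python
import numpy as np

def aggregate(updates, certainties, sizes, inv_rates, alpha):
    """Fairness-weighted average of client pseudo-gradients and certainties.

    Assumption (not confirmed by the text): each client's weight is
    S_k * I_k**alpha, and the weights are normalized to sum to 1.
    """
    sizes = np.asarray(sizes, dtype=float)          # S_k
    inv_rates = np.asarray(inv_rates, dtype=float)  # I_k
    weights = sizes * inv_rates ** alpha
    weights = weights / weights.sum()

    # Weighted sum over the client axis of the stacked pseudo-gradients U_k.
    U = np.tensordot(weights, np.stack(updates), axes=1)
    # Aggregated certainty C uses the same weights.
    C = float(np.dot(weights, np.asarray(certainties, dtype=float)))
    return U, C
```

Under this assumption, \(\alpha = 0\) reduces to plain dataset-size weighting, while larger \(\alpha\) shifts weight toward clients with a larger \(I_k\).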
To stay robust to heterogeneous client objectives, the aggregated certainty \(C\) rescales Adam's hyperparameters per round: the learning rate is scaled by \(C\) and the decay rates are raised to the power \(C\), so less certain rounds take smaller, more heavily smoothed steps.
In this notation, \(\theta\) denotes the global parameters, \(\eta\) the base server learning rate, \(F_k\) client \(k\)'s local objective, \(\Delta_k\) its accumulated local update, \(\eta_k\) its local learning rate, \(S_k\) its dataset-size weight, \(\alpha \ge 0\) the fairness exponent, \(\beta_1, \beta_2\) the base decay rates, \(c_{t,m}, c_{t,v}\) the accumulated decay products used for bias correction, \(\odot\) the elementwise product, and \(\epsilon\) a stability constant.
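The certainty-scaled server step can be sketched in NumPy as below. This is a hedged illustration built only from the description above: the function name `adafedadam_step`, the `state` dictionary layout, and the default hyperparameter values are hypothetical. It also assumes \(C > 0\) and treats the aggregate \(U\) as a gradient-like direction that is subtracted; if \(\Delta_k\) is defined as new minus old parameters, the sign of the final update flips.

```python
import numpy as np

def adafedadam_step(theta, U, C, state, eta=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One server-side Adam step with certainty-scaled hyperparameters.

    state holds m, v (moment estimates) and c_m, c_v (accumulated decay
    products, initialized to 1.0). Assumes C > 0.
    """
    # Per-round hyperparameters: decay rates raised to the power C,
    # learning rate scaled by C.
    b1 = beta1 ** C
    b2 = beta2 ** C
    lr = eta * C

    # Standard Adam moment updates on the aggregated pseudo-gradient.
    state["m"] = b1 * state["m"] + (1.0 - b1) * U
    state["v"] = b2 * state["v"] + (1.0 - b2) * U * U

    # Because the decay rates vary per round, bias correction uses the
    # running products of the per-round rates rather than beta**t.
    state["c_m"] *= b1
    state["c_v"] *= b2
    m_hat = state["m"] / (1.0 - state["c_m"])
    v_hat = state["v"] / (1.0 - state["c_v"])

    # Assumption: U points along the gradient, so the step subtracts it.
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

# Usage: initialize the state once, then call once per round.
state = {"m": np.zeros(2), "v": np.zeros(2), "c_m": 1.0, "c_v": 1.0}
theta = adafedadam_step(np.zeros(2), np.array([1.0, -1.0]), C=1.0, state=state)
```

With \(C = 1\) this reduces to a plain Adam step; with \(C < 1\) the decay rates move closer to 1 and the learning rate shrinks, which matches the smaller, more heavily smoothed steps described above.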
Reference: Li Ju, Tianru Zhang, Salman Toor, Andreas Hellander, "Accelerating Fair Federated Learning: Adaptive Federated Adam", arXiv 2023. https://arxiv.org/abs/2301.09357