FedRepOpt¶
Implements FedRepOpt, a gradient re-parameterized optimizer that lets federated clients train a plain single-branch model while matching a multi-branch reference.
The reference architecture is a Constant-Scale Linear Addition (CSLA) block whose output sums scaled parallel branches, e.g. \(\alpha_3 W_3 \ast x + \alpha_1 W_1 \ast x\). Rather than carry those extra branches, FedRepOpt trains one merged operator \(W'\) and reproduces the multi-branch training dynamics with two rules: initialize \(W' \leftarrow \alpha_3 W_3 + \alpha_1 W_1\), and during every update multiply the gradient by a constant "Grad Mult" derived from the branch scales. Because the multiplier is a fixed scalar per parameter group, each client runs an ordinary SGD step with no added communication or per-round overhead, which keeps it cheap in the federated setting.
where \(w_k^{(i)}\) is a parameter of client \(k\) at local iteration \(i\), \(\eta\) is the learning rate, \(\mathcal{L}_k\) is the client's local loss, and \((\alpha_3^2 + \alpha_1^2)\) is the Grad Mult, the squared sum of the CSLA branch scales applied uniformly across the merged operator's parameters.
Reference: Kin Wai Lau, Yasar Abbas Ur Rehman, Pedro Porto Buarque de Gusmão, Lai-Man Po, Lan Ma, Yuyang Xie, "FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning", arXiv 2024. https://arxiv.org/abs/2409.15898