FESS-GDA¶
Implements FESS-GDA, federated stochastic smoothed gradient descent ascent for distributed minimax optimization.
FESS-GDA solves federated minimax problems \(\min_x \max_y \frac{1}{M}\sum_i f_i(x,y)\) by carrying the centralized Smoothed-AGDA technique into the federated setting. Each communication round the server samples \(m\) clients and broadcasts the global model \((x_t, y_t)\); every participating client synchronizes and runs \(K\) local stochastic descent-on-\(x\), ascent-on-\(y\) steps with local learning rates \(\eta_{x,l}, \eta_{y,l}\), then returns its local model. The key departure from plain federated GDA is an auxiliary variable \(z_t\) that smooths the \(x\) update: the server effectively performs gradient descent ascent on the regularized surrogate \(\hat f(x,y,z) = f(x,y) + \frac{p}{2}\lVert x - z\rVert^2\), with \(z_t\) tracking \(x_t\) via exponential averaging. The penalty \(p\) is set to \(2l\) for the nonconvex-concave / PL / 1-point-concave settings (making \(\hat f\) strongly convex in \(x\)) and \(p = 0\) for the PL-PL setting.
where \(x\) is the minimization variable and \(y\) the maximization variable, \(z_t\) is the smoothing auxiliary variable, \(f_i\) is client \(i\)'s local function with stochastic sample \(\xi_{t,i}^{k}\), \(k = 1,\dots,K\) indexes the local steps initialized at \(x_{t,i}^{1}=x_t,\ y_{t,i}^{1}=y_t\), \(S_t\) is the sampled client subset with \(|S_t| = m\), \(\eta_{x,l}, \eta_{y,l}\) are local and \(\eta_{x,g}, \eta_{y,g}\) global learning rates, \(p\) is the smoothing penalty, \(\beta \in (0,1)\) is the smoothing rate, and \(P_Y\) is projection onto the feasible set \(Y\).
Reference: Wei Shen, Minhui Huang, Jiawei Zhang, Cong Shen, "Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization", AISTATS 2024. https://arxiv.org/abs/2311.00944