FedAvg: https://arxiv.org/abs/1602.05629 stands for Federated Averaging, the baseline algorithm: each selected client trains the current global model locally on its own data, and the server replaces the global model with the data-size-weighted average of the returned client models.
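The averaging step can be sketched in plain Python (a minimal sketch; the flat-list weight representation and the client payload format are illustrative, not from the paper):

```python
# Minimal sketch of FedAvg server aggregation (hypothetical data shapes).
# Each client k returns (n_k, w_k): its sample count and its locally
# trained model weights, here represented as a flat list of floats.

def fedavg_aggregate(client_updates):
    """Weighted average of client weights, weights proportional to n_k."""
    total_samples = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    new_global = [0.0] * dim
    for n, w in client_updates:
        coef = n / total_samples
        for i in range(dim):
            new_global[i] += coef * w[i]
    return new_global

# Example: two clients with different data sizes; the larger client
# pulls the average toward its own weights.
updates = [(100, [1.0, 2.0]), (300, [3.0, 4.0])]
print(fedavg_aggregate(updates))  # [2.5, 3.5]
```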
__________________________________________________________________________________________________________
FedAvgM: https://arxiv.org/pdf/1909.06335 stands for Federated Averaging with Server Momentum.
It is an upgrade to the original FedAvg algorithm designed specifically to solve the “Client Drift” problem, where the model gets confused because different users have very different data (Non-IID).
- The Problem: If one round of clients has very unusual data, the model will “jerk” in that direction. This causes the model to oscillate (zig-zag) and slows down convergence.
- The Solution (FedAvgM): The server keeps a Velocity Buffer (Memory). Instead of just following the current round’s average, it combines the current average with a “running history” of which way the model has been moving. This gives the model Inertia, making it much more stable.


The Formula: How it works mathematically
The FedAvgM update happens in three steps on the Server:
- Step 1: Calculate the “Pseudo-Gradient” (Δwt)
- The server looks at the difference between the global model at the start of the round (wt) and the average of the models returned by the clients (wavg):
  Δwt = wt − wavg
- Step 2: Update the Velocity (vt+1)
- The server updates its “memory” (the velocity). It takes a fraction of the old velocity and adds the new “diff”:
  vt+1 = β·vt + Δwt
  [β (Beta): The momentum factor (usually 0.9). It controls how much “memory” the server has.]
- Step 3: Update the Global Model (wt+1)
- The actual global model is updated using the velocity:
  wt+1 = wt − vt+1
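The three server steps above can be sketched in plain Python (a minimal sketch; the variable names and flat-list weight representation are illustrative, not from the paper):

```python
# Minimal sketch of the FedAvgM server update (weights as flat lists).
# Assumes `w_avg` is the plain average of this round's client models.

BETA = 0.9  # momentum factor: how much "memory" the server keeps

def fedavgm_step(w_global, w_avg, velocity, beta=BETA):
    """One server round: pseudo-gradient -> velocity -> model update."""
    # Step 1: pseudo-gradient, the direction the clients pulled the model
    delta = [g - a for g, a in zip(w_global, w_avg)]
    # Step 2: blend the old velocity with the new pseudo-gradient
    velocity = [beta * v + d for v, d in zip(velocity, delta)]
    # Step 3: move the global model along the velocity
    w_global = [g - v for g, v in zip(w_global, velocity)]
    return w_global, velocity

# Example round: velocity starts at zero, so the very first step
# coincides with plain FedAvg; later rounds add inertia.
w, v = fedavgm_step([1.0, 1.0], [0.8, 1.2], [0.0, 0.0])
print(w)  # [0.8, 1.2]
```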

__________________________________________________________________________________________________________
FedProx: https://arxiv.org/abs/1812.06127

- The Global Objective: Federated learning frames its goal as minimizing the aggregate loss across N devices:
  min over w of f(w) = Σ (k = 1 … N) pk·Fk(w), where pk = nk/n is device k’s share of the total data.
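The global objective can be sketched in plain Python (a minimal sketch; the per-client losses and sample counts are hypothetical):

```python
# Minimal sketch of the federated global objective f(w) = sum_k p_k * F_k(w),
# with p_k = n_k / n weighting each client by its share of the data.

def global_objective(w, clients):
    """clients: list of (n_k, F_k) pairs, F_k a callable local loss."""
    n = sum(n_k for n_k, _ in clients)
    return sum((n_k / n) * F_k(w) for n_k, F_k in clients)

# Two clients whose local losses disagree about the best w (Non-IID).
clients = [
    (100, lambda w: (w - 1.0) ** 2),  # small client, optimum at w = 1
    (300, lambda w: (w - 5.0) ** 2),  # large client, optimum at w = 5
]
print(global_objective(3.0, clients))  # 0.25*4 + 0.75*4 = 4.0
```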

- The FedProx Local Subproblem (The “Prox” Term): This is the core mathematical contribution of the paper. Instead of just minimizing Fk(w) like in standard FedAvg, FedProx defines a new local objective hk that the client must solve in each round t:
  hk(w; wt) = Fk(w) + (μ/2)·‖w − wt‖²
  [μ (Mu): The proximal coefficient. A larger μ keeps the local model closer to the global model wt.]
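The proximal term added to a client’s local loss can be sketched in plain Python (a minimal sketch; the quadratic toy loss is a hypothetical stand-in for Fk):

```python
# Minimal sketch of the FedProx local objective h_k(w; w^t).
# F_k is any local loss; here a simple quadratic stands in for it.

MU = 1.0  # proximal coefficient: how strongly w is pulled toward w^t

def local_loss(w):
    """Toy stand-in for F_k(w): minimized at w = [3.0, 3.0]."""
    return sum((wi - 3.0) ** 2 for wi in w)

def fedprox_objective(w, w_global, mu=MU):
    """h_k(w; w^t) = F_k(w) + (mu/2) * ||w - w^t||^2"""
    prox = sum((wi - gi) ** 2 for wi, gi in zip(w, w_global))
    return local_loss(w) + (mu / 2.0) * prox

# The prox term penalizes drifting away from the current global model.
w_t = [0.0, 0.0]
print(fedprox_objective(w_t, w_t))         # just F_k(w_t): 18.0
print(fedprox_objective([3.0, 3.0], w_t))  # F_k = 0, but prox penalty: 9.0
```

With μ = 0 this reduces exactly to the FedAvg local objective; raising μ trades local fit for stability under Non-IID data.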
