Inverse Learning-Rate Scaling

Removing inverse learning-rate scaling lowered final accuracy and slightly increased time-to-target in ablation experiments. Scaling the learning rate inversely with the local step count supports the convergence proof by bounding local dis…

1 sources - 5 claims

Removing inverse learning-rate scaling lowered final accuracy and slightly increased time-to-target in ablation experiments. Scaling the learning rate inversely with the local step count supports the convergence proof by bounding local displacement. FedQueue scales each client's learning rate inversely with its local step budget to equalize effective local displacement. Inverse learning-rate scaling addresses instability caused by different local step counts across clients. Unscaled local SGD could let clients with larger local step counts move farther from the broadcast model and dominate aggregation.