An improved analysis of per-sample and per-update clipping in federated learning

Bo Li, Xiaowen Jiang, Mikkel N. Schmidt, Tommy Sonne Alstrøm, Sebastian U. Stich

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review


Abstract

Gradient clipping is a key mechanism that is essential to differentially private training techniques in federated learning. Two popular strategies are per-sample clipping, which clips the mini-batch gradient, and per-update clipping, which clips each user's model update. However, there has not been a thorough theoretical analysis of these two clipping methods. In this work, we rigorously analyze the impact of these two clipping techniques on the convergence of a popular federated learning algorithm, FedAvg, under standard stochastic noise and gradient dissimilarity assumptions. We provide a convergence guarantee for any arbitrary clipping threshold. Specifically, we show that per-sample clipping is guaranteed to converge to a neighborhood of the stationary point, with the size of the neighborhood dependent on the stochastic noise, the gradient dissimilarity, and the clipping threshold. In contrast, convergence to the stationary point can be guaranteed with a sufficiently small stepsize under per-update clipping, at the cost of more communication rounds. We further provide insights into the impact of the improved convergence analysis in the differentially private setting.
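To make the distinction between the two strategies concrete, the following minimal NumPy sketch shows one FedAvg round with clipping applied in the two places the abstract describes: per-sample clipping clips each stochastic gradient inside the local steps, while per-update clipping clips the client's full model update before server averaging. The `fedavg_round` function, its signature, and the gradient-oracle interface for clients are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def clip(v, c):
    # Standard norm clipping: scale v so that ||v|| <= c.
    norm = np.linalg.norm(v)
    return v * min(1.0, c / norm) if norm > 0 else v

def fedavg_round(x, client_grads, local_steps, lr, c, mode):
    """One FedAvg round; `mode` selects where clipping is applied.

    Illustrative sketch only: `client_grads` is a list of stochastic
    gradient oracles g_i(y) (a hypothetical interface), `c` is the
    clipping threshold, `lr` the local stepsize.
    """
    updates = []
    for grad in client_grads:
        y = x.copy()
        for _ in range(local_steps):
            g = grad(y)
            if mode == "per-sample":
                g = clip(g, c)        # clip each stochastic gradient
            y = y - lr * g
        delta = y - x
        if mode == "per-update":
            delta = clip(delta, c)    # clip the client's model update
        updates.append(delta)
    # Server averages the (possibly clipped) client updates.
    return x + np.mean(updates, axis=0)
```

For example, with a single local step the two modes differ only in whether the threshold is applied before or after the stepsize is multiplied in, which is exactly why their convergence neighborhoods behave differently as the analysis shows.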
Original language: English
Title of host publication: Proceedings of The Twelfth International Conference on Learning Representations, ICLR 2024
Number of pages: 51
Publication date: 2024
Publication status: Published - 2024
Event: The Twelfth International Conference on Learning Representations - Vienna, Austria
Duration: 7 May 2024 - 11 May 2024
Conference number: 12

Conference

Conference: The Twelfth International Conference on Learning Representations
Number: 12
Country/Territory: Austria
City: Vienna
Period: 07/05/2024 - 11/05/2024

