Event date:
May 20, 2021, 12:00 PM

On Addressing System Heterogeneity Challenges in Federated Learning

Supervisor
Dr. Zafar Ayyub Qazi
Student
Muhammad Mustansar Saeed
Venue
Zoom Meetings (Online)
Event
MS Thesis defense
Abstract
Deep learning models are increasingly being used in many domains, and big data plays a central role in their effectiveness. Applications such as speech recognition, self-driving cars, mobile keyboard prediction (e.g., next-word prediction in Gboard), and recommender systems (e.g., showing products to users based on their interests) require large corpora of data. Oftentimes, useful data is private, and due to data privacy concerns it is not practical to transmit and store all of it in a centralized location. Federated learning (FL) offers an alternative environment for distributed machine learning: a form of privacy-preserving machine learning that trains over distributed data spread across multiple edge devices. FL differs from typical distributed machine learning environments along multiple dimensions; clients in FL exhibit greater heterogeneity in terms of computational power, memory, and network bandwidth. Training throughput is bounded by slow clients with low computational power and/or slow network communication, a phenomenon known as the straggler effect. In traditional FL algorithms, stragglers are often dropped from the training process for efficiency; this degrades accuracy and biases the model towards fast clients.

In this thesis, we empirically quantify the impact of device resources, such as the number of CPU cores and RAM size, and device states (e.g., memory pressure regimes) on model training times over a real testbed. We also experimentally evaluate the effectiveness of a recently proposed adaptive model serving technique in which slow clients are trained over smaller models. Due to the lack of fidelity of device emulation in simulated environments, we conduct our evaluation over a real distributed testbed. We evaluate the adaptive model serving technique over a synthetic dataset and a linear model. Our experiments show that the scheme improves training times by 52% on a quad-core, 1 GB RAM device when the model size is reduced by 50%, owing to reduced memory pressure on the device. The scheme improves training times by 1.8x to 3.1x across different numbers of CPU cores and RAM sizes due to FLOPs parallelism and lower system utilization. The results show that FL training on resource-constrained devices takes a prohibitively long time and that adaptive model serving can be an effective technique for making the process more efficient.
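To illustrate the adaptive model serving idea described above, the sketch below shows one way a federated averaging round could serve a reduced linear model to resource-constrained clients. This is only a minimal illustration under stated assumptions, not the thesis's actual implementation: the client resource values, the "serve a 50%-reduced model to low-RAM clients" policy, and the choice of shrinking the model by truncating the weight vector are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the thesis implementation):
# adaptive model serving in federated averaging with a linear model on
# synthetic data. Slow clients train a reduced slice of the global model.
import numpy as np

rng = np.random.default_rng(0)
DIM = 100                      # full model size (number of features)
TRUE_W = rng.normal(size=DIM)  # ground-truth weights for the synthetic task

def make_client_data(n=200):
    """Synthetic linear-regression data for one client."""
    X = rng.normal(size=(n, DIM))
    y = X @ TRUE_W + 0.1 * rng.normal(size=n)
    return X, y

def local_train(w_init, X, y, lr=0.01, epochs=5):
    """Plain gradient descent on squared loss; returns updated local weights."""
    w = w_init.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Each client reports coarse resource info; slow clients get a smaller model.
clients = [
    {"data": make_client_data(), "cores": 4, "ram_gb": 4},
    {"data": make_client_data(), "cores": 4, "ram_gb": 1},  # constrained
    {"data": make_client_data(), "cores": 1, "ram_gb": 1},  # very constrained
]

def served_model_size(client, full_dim=DIM):
    """Hypothetical policy: serve a 50%-reduced model to low-RAM clients."""
    return full_dim // 2 if client["ram_gb"] <= 1 else full_dim

global_w = np.zeros(DIM)
for rnd in range(10):
    updates, counts = np.zeros(DIM), np.zeros(DIM)
    for c in clients:
        k = served_model_size(c)            # size of the model this client trains
        X, y = c["data"]
        w_k = local_train(global_w[:k], X[:, :k], y)  # train only the reduced slice
        updates[:k] += w_k * len(y)
        counts[:k] += len(y)
    # FedAvg-style weighted average; coordinates no client trained keep their value.
    mask = counts > 0
    global_w[mask] = updates[mask] / counts[mask]

print("distance to true weights after 10 rounds:", np.linalg.norm(global_w - TRUE_W))
```

In this sketch, the reduced model is simply a prefix of the full weight vector, so the constrained clients never touch the higher-dimensional coordinates; other reduction strategies (e.g., pruning or lower-precision models) would fit the same serving loop.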

Zoom Link: https://lums-edu-pk.zoom.us/j/91089078817?pwd=ODg2MmlUOGg5eDZ5RlpTUnEzRkt6QT09

Meeting ID: 910 8907 8817

Password: 031732