Event date:
Dec 24 2020 4:00 pm

On Addressing System Heterogeneity Challenges in Federated Learning

Supervisors
Dr. Ihsan Ayyub Qazi
Dr. Zafar Ayyub Qazi
Student
Muhammad Mustansar Saeed
Venue
Zoom Meetings (Online)
Event
MS Thesis Defense
Abstract
Deep learning models are increasingly used in many domains, and big data plays a central role in their effectiveness. Applications such as speech recognition, self-driving cars, mobile keyboard prediction (e.g., next-word prediction in Gboard), and recommender systems (e.g., showing products to users based on their interests) require large corpora of data. Oftentimes, useful data is private, and it is not practical to transmit and store all of it in a centralized location due to strict data privacy regulations. Federated learning (FL) offers an alternative: a distributed machine learning paradigm in which models are trained across the edge/mobile devices that hold the data, instead of exchanging data samples.
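To make the paradigm concrete, the following is a minimal federated-averaging (FedAvg) sketch in Python, assuming numpy and a toy linear model; the function names and hyperparameters are illustrative placeholders, not the thesis implementation. Each client trains on its own private data, and only model weights travel to the server for aggregation.

```python
# Minimal FedAvg sketch (illustrative only): raw data never leaves the clients.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on one client's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient for a linear model
        w -= lr * grad
    return w

def fedavg_round(global_weights, clients):
    """One round: each client trains locally; the server averages the
    returned weights, weighted by local dataset size."""
    updates, sizes = [], []
    for X, y in clients:
        updates.append(local_update(global_weights, X, y))
        sizes.append(len(y))
    return np.average(updates, axis=0, weights=np.array(sizes, dtype=float))

# Toy usage: three clients with heterogeneous amounts of private data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (20, 50, 30):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

w = np.zeros(2)
for _ in range(10):
    w = fedavg_round(w, clients)
print(w)  # approaches true_w without any client sharing raw data
```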
FL differs from the conventional distributed machine learning setting in that clients are resource-constrained in terms of computational power, memory, and network bandwidth. Training throughput is therefore bounded by slow clients with low computational power and/or slow network connections, a phenomenon known as the straggler effect. Traditional FL algorithms drop stragglers for efficiency; this hurts accuracy and biases the model toward fast clients. In this thesis, we aim to quantify how resource and data heterogeneity across devices affect the performance of FL in terms of training time and model accuracy. Moreover, instead of dropping stragglers, we propose adaptive model serving, in which the server selects a model matched to each client's resource capabilities, reducing the straggler effect. We evaluate this dynamic model-serving approach on a real testbed.
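The core idea behind adaptive model serving can be sketched as follows; the model catalog, capability scores, and cost model below are hypothetical placeholders for illustration, not the design evaluated in the thesis. Rather than dropping a straggler, the server hands it the largest model it can train within the round deadline.

```python
# Hedged sketch of adaptive model serving: slower clients receive smaller
# models so no client is dropped from the round. All names and numbers
# here are hypothetical, not the thesis design.

# Hypothetical catalog: model name -> relative compute cost per round.
MODEL_CATALOG = {"large": 4.0, "medium": 2.0, "small": 1.0}

def serve_model(client_speed, round_deadline):
    """Return the largest model the client can finish in time.

    client_speed:   relative compute units the client delivers per unit time
                    (e.g., estimated from profiling or past rounds).
    round_deadline: time budget for one training round.
    """
    budget = client_speed * round_deadline
    # Walk the catalog from largest to smallest; fall back to the smallest
    # model so the client still participates instead of being dropped.
    for name, cost in sorted(MODEL_CATALOG.items(), key=lambda kv: -kv[1]):
        if cost <= budget:
            return name
    return "small"

# Example: a fast device gets the large model, a slow one the small model,
# so both report back within roughly the same round deadline.
print(serve_model(client_speed=5.0, round_deadline=1.0))  # -> "large"
print(serve_model(client_speed=0.8, round_deadline=1.0))  # -> "small"
```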
