Event date:
Apr 18 2022 11:30 am

Flexible High-Performance Deduplication Framework for Docker

Speaker(s)
Dr. Ali R. Butt
Venue
CS Smart Lab
Abstract
Part I:

The rise of containers has led to a broad proliferation of container images. The associated storage performance and capacity requirements place high pressure on the infrastructure of container registries that store and serve images. Exploiting the high file redundancy in real-world container images is a promising approach to drastically reduce the demanding storage requirements of the growing registries. However, existing deduplication techniques significantly degrade the performance of registries because of the high layer restore overhead. In this talk, I will present DupHunter, a new Docker registry architecture, which not only natively deduplicates layers for space savings but also reduces layer restore overhead. I will talk about research challenges faced in designing such a system, and present how the approach compares to the state of the art.


Part II:

In the second part of my talk, I will focus on student interactions. I hope to engage the audience in a discussion about designing scalable high-performance computing systems and the research skills needed to become a successful independent researcher. Website: https://people.cs.vt.edu/butta/

Dr. Ali R. Butt is a Professor of Computer Science (and ECE by courtesy) and Associate Department Head for Faculty Development in CS@Virginia Tech. He is an ACM Distinguished Member. He received his Ph.D. degree in Electrical and Computer Engineering from Purdue University in 2006, and his B.Sc. in Electrical Engineering from UET Lahore in 2000. He is a recipient of a number of awards, including the NSF CAREER Award and IBM Faculty Awards. He has served as an Associate Editor for IEEE Transactions on Cloud Computing, ACM Transactions on Storage, and IEEE Transactions on Parallel and Distributed Systems. He is an alumnus of the National Academy of Engineering's US Frontiers of Engineering (FOE) Symposium and the National Academy of Sciences' AA Symposium on Sensor Science. Ali's research interests include cloud and high-performance computing systems; systems support for machine and deep learning applications; file, I/O, and storage systems; distributed systems; and large-scale experimental computer systems. At Virginia Tech he leads the Distributed Systems & Storage Laboratory (DSSL).