I am a CS Ph.D. candidate at the University of Illinois at Urbana-Champaign, advised by Prof. Marc Snir and Prof. Jian Huang. My main research interests include storage systems and ML for systems. More specifically, I am working towards building learning-based storage systems, from application-level indexes to storage devices like SSDs. Prior to my Ph.D. studies, I obtained my master’s degree from UIUC in 2020 and my bachelor’s degree from Beihang University in 2018.
LeaFTL: A Learning-Based Flash Translation Layer [ASPLOS '23] We built a flash translation layer for SSDs with learned indexes, which compresses the logical-to-physical address mapping table by learning the dynamic data access patterns on the SSD. It reduces the memory footprint by 2.9x, which saves DRAM cost on the SSD, and the saved memory can serve as a larger data cache to further improve SSD performance. We also presented our work at the WDC 2030 AI Focus seminar.
BlockFlex: A Learning-Based Storage Harvesting Framework [OSDI '22] Our study of cloud storage traces shows that cloud storage is significantly underutilized. We developed BlockFlex, which predicts the storage requirements of VMs with high accuracy and harvests idle flash-based storage resources to improve cloud storage utilization. We showed that BlockFlex could improve storage utilization in data centers by 1.25x and the storage performance of harvest VMs by 22-60%.
Cloud vendors like Microsoft Azure are building data centers fully powered by renewable energy. To tackle the availability issues introduced by the variability of renewable energy production, we developed a scheduling framework that groups together data centers with complementary energy supply patterns and schedules workloads between them with minimal migration overhead.
Performance variability can significantly degrade supercomputer performance, yet it is challenging to detect efficiently and with low overhead at scale. We built a lightweight performance variability analysis tool with high detection coverage and evaluated it on the Tianhe-2 supercomputer with more than 16,000 parallel processes. We showed that our tool could detect real-world performance variability issues on the supercomputer with minimal performance interference to existing programs.
I worked on developing quota management for the open-source cloud platform OpenStack.
Thesis Title: Towards Learning-Based Storage Systems – A Holistic Approach
Thesis Title: Detecting and Understanding Crash-Consistency Bugs across the Parallel I/O Stack
Outstanding Graduation Award