Ruiming Lu

I am a final-year PhD student in Computer Science at Shanghai Jiao Tong University, advised by Professor Guangtao Xue and Professor Minglu Li. I received my B.S. degree in Electrical and Computer Engineering (ECE) in 2020 from the University of Michigan - Shanghai Jiao Tong University Joint Institute.

From 2023 to 2024, I spent a wonderful year as a visiting PhD student at OrderLab, University of Michigan, Ann Arbor, hosted by Professor Ryan Huang. I am currently working with Jilong Xue at Microsoft Research Asia on reliability issues in LLM training systems.

Email  /  CV  /  Google Scholar  /  GitHub


Research

My current research focuses on the reliability of modern data centers. My work analyzes the failure characteristics of massively deployed storage devices (e.g., NVMe SSDs), investigates novel failure modes (e.g., fail-slow failures), and designs practical fault-tolerant systems.

Selected Publications (See full publication list)

Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems
Ruiming Lu*, Erci Xu*, Yiming Zhang, Fengyi Zhu, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Jiwu Shu, Minglu Li, Jiesheng Wu (*Co-first)
FAST 2023   (Best Paper Award, Invited to Appear in USENIX ;login:, Fast-tracked to ACM TOS)
[PDF] [Slides] [Video] [Dataset]
Press   [AliCloud] [CitiNews]

NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow
Ruiming Lu*, Erci Xu*, Yiming Zhang, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Minglu Li, Jiesheng Wu (*Co-first)
ATC 2022
[PDF] [Slides] [Video] [Dataset]
Press   [ChinaSys] [Shanghai Computer Association - Storage]

Professional Service


Template credits to jonbarron. Last modified: Nov 6th, 2024.