Ruiming Lu
I am a final-year PhD student in Computer Science at Shanghai Jiao Tong
University, advised by Professor Guangtao Xue and
Professor Minglu
Li. I obtained my B.S. degree in Electrical and Computer
Engineering (ECE) in 2020 from the UM-SJTU Joint Institute.
In 2023-2024, I spent a wonderful year as a visiting PhD student at OrderLab, University of
Michigan, Ann Arbor, hosted by Prof. Ryan Huang. I am now working with Jilong Xue at Microsoft Research Asia, addressing reliability issues in LLM training systems.
Email /
CV /
Google Scholar /
GitHub
Research
My current research focuses on the reliability of
modern data centers. I work on analyzing the failure
characteristics of storage devices deployed at scale (e.g.,
NVMe SSDs), understanding novel failure modes (e.g.,
fail-slow failures), and designing practical fault-tolerant
systems.
One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems
Ruiming Lu,
Yunchi Lu,
Yuxuan Jiang,
Guangtao Xue,
Peng Huang
NSDI 2025
[Preprint]
[Software]
Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems
Ruiming Lu*,
Erci Xu*,
Yiming Zhang,
Fengyi Zhu, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Jiwu Shu, Minglu Li, Jiesheng Wu (*Co-first)
FAST 2023 (Best Paper Award, Invited to Appear in USENIX ;login:, Fast-tracked to ToS)
[PDF]
[Slides]
[Video]
[Dataset]
Press
[AliCloud]
[CitiNews]
NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow
Ruiming Lu*,
Erci Xu*,
Yiming Zhang,
Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Minglu Li, Jiesheng Wu (*Co-first)
ATC 2022
[PDF]
[Slides]
[Video]
[Dataset]
Press
[ChinaSys]
[Shanghai Computer Association - Storage]
- Artifact Evaluation Committee, FAST 2024, 2025
- Artifact Evaluation Committee, EuroSys 2024
- Artifact Evaluation Committee, SOSP 2023