Research
My current research focuses on the reliability of modern
data centers. I analyze the failure characteristics of
massively deployed storage devices (e.g., NVMe SSDs),
study novel failure modes (e.g., fail-slow failures),
and design practical fault-tolerant systems.
|
Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems
Ruiming Lu*,
Erci Xu*,
Yiming Zhang,
Fengyi Zhu, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Jiwu Shu, Minglu Li, Jiesheng Wu (*Co-first)
FAST 2023   (Best Paper Award, Invited to Appear in USENIX ;login:, Fast-tracked to ToS)
[PDF]
[Slides]
[Video]
[Dataset]
Press  
[AliCloud]
[CitiNews]
|
NVMe SSD Failures in the Field: the Fail-Stop and the Fail-Slow
Ruiming Lu*,
Erci Xu*,
Yiming Zhang,
Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Minglu Li, Jiesheng Wu (*Co-first)
ATC 2022
[PDF]
[Slides]
[Video]
[Dataset]
Press  
[ChinaSys]
[Shanghai Computer Association - Storage]