Masoud Jami, born Gholami, successfully defended his dissertation "Optimizing Checkpoint/Restart and Input/Output for Large Scale Applications" at the Institute of Computer Science at Humboldt-Universität zu Berlin on October 10, 2024. He developed novel techniques for checkpoint/restart (C/R) management in High-Performance Computing. For multilevel C/R, he achieved more than 10% less C/R overhead than state-of-the-art approaches. By combining XOR and partner checkpointing in a scalable manner, he outperformed Reed-Solomon codes in terms of resiliency and computational overhead. For multiple applications performing checkpoints on a supercomputer, he also reduced the C/R overhead incurred by the applications by more than 10%. His IOSIG plug-in for GCC supports pragma annotations in the source code that describe the input/output (I/O) characteristics of particular I/O streams, and then decides during execution to which devices the I/O should best be redirected. In addition, he developed accurate I/O models of the Linux kernel that take page caching behaviour into account and thereby allow I/O costs to be estimated with an accuracy of more than 80–90%. The research was carried out at ZIB within SPPEXA FFMK and CRC 1404 FONDA.

Congratulations!