About me
Assistant professor, PhD advisor
The School of Computer Science,
Peking University,
Beijing, China
I research and design storage systems and specialized processors. My work addresses the requirements of high-performance storage systems in the era of big data and artificial intelligence from the perspective of computer architecture. I am dedicated to overcoming data-movement bottlenecks and the memory wall of the von Neumann architecture.
Important:
I am actively seeking talented and self-motivated students. There are two openings per year for prospective PhD candidates and multiple positions for interns. You are always welcome to contact me via email.
News:
- November 2024: Three papers are accepted to HPCA’25. Congratulations to Xiurui and Endian!
- October 2024: Invited to serve on the PCs of USENIX ATC, ISCA and ISPASS.
- August 2024: BIZA is accepted to SOSP’24. Congratulations to Shushu, Shaocong and Li Peng!
- July 2024: Two papers are accepted to MICRO’24.
- July 2024: Two papers are accepted to TC and ToS.
- May 2024: Invited to serve on the PC of HPCA.
- May 2024: Two papers are accepted to USENIX ATC’24. Congratulations to Shushu Yi, Li Peng, Xiurui and Yuda!
- April 2024: One paper is accepted to TACO.
- April 2024: Flagger is accepted to ISCA’24. Congratulations to Xiurui and Yuda!
- March 2024: Invited to serve on the ERC of MICRO.
- November 2023: One paper is accepted to ASPLOS’24.
- October 2023: Invited to serve on the ERC of ISCA.
- October 2023: Four papers are accepted to HPCA’24. Congratulations to Yuda and Yuyue!
- August 2023: Awarded Intel Young Faculty Researcher Program.
- July 2023: Invited to serve on the TPC of HPCA.
- April 2023: Awarded 1st prize in national storage technology competition. Congrats to Shushu Yi!
- January 2023: One paper is accepted to NVMW.
- January 2023: One paper is accepted to CAL.
- December 2022: One paper is accepted to SAC.
- October 2022: Invited to serve on the TPCs of USENIX ATC and SAC.
- September 2022: Awarded ACM SIGCSE Rising Star!
- July 2022: One paper is accepted to THPC.
- May 2022: Our paper “ScalaRAID” is accepted to HotStorage’22. Congrats to Shushu Yi!
- April 2022: Two papers are accepted to NVMW’22.
- December 2021: Awarded NSFC Excellent Young Scientists Fund Overseas Program (国家自然科学基金优秀青年科学基金海外项目)!
- September 2021: Our work “HAMS” is selected as KAIST breakthroughs 50 years.
- August 2021: Our work “OhmGPU” is reported by Naver headline + 26 and Press.
- July 2021: One paper is accepted to MICRO’21.
- April 2021: Three papers are accepted to NVMW’21.
- March 2021: Our work “HAMS” is reported by Naver headline + 39, KBS and Press.
- February 2021: One paper is accepted to ISCA’21.
- June 2020: One paper is accepted to ISCA’20.
- February 2020: One paper is accepted to HPCA’20.
- February 2020: One paper is accepted to FAST’20.
- February 2020: Joined KAIST as a postdoctoral researcher.
- December 2019: Successfully defended PhD thesis.
Selected Publications:
- (HPCA’25) InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
- (HPCA’25) Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications
- (HPCA’25) NeuVSA: A Unified and Efficient Accelerator for Neural Vector Search
- (SOSP’24) BIZA: Design of Self-Governing Block-Interface ZNS AFA for Endurance and Performance
- (MICRO’24) FlashLLM: A Chiplet-Based In-Flash Computing Architecture to Enable On-Device Inference of 70B LLM
- (MICRO’24) NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
- (USENIX ATC’24) ScalaCache: Scalable User-Space Page Cache Management with Software-Hardware Coordination
- (USENIX ATC’24) ScalaAFA: Constructing User-Space All-Flash Array Engine with Holistic Designs
- (ISCA’24) Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation
- (ASPLOS’24) Achieving Near-Zero Read Retry for 3D NAND Flash Memory
- (HPCA’24) BeaconGNN: Large-Scale GNN Acceleration with Asynchronous In-Storage Computing
- (HPCA’24) StreamPIM: Streaming Matrix Computation in Racetrack Memory
- (HPCA’24) LearnedFTL: A Learning-based Page-level FTL for Reducing Double Reads in Flash-based SSDs
- (HPCA’24) Midas Touch: Invalid-Data Assisted Reliability and Performance Boost for 3D High-Density Flash
- (MICRO’21) Ohm-GPU: Integrating New Optical Network and Heterogeneous Memory into GPU Multi-Processors
- (ISCA’21) Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution
- (ISCA’20) ZnG: Architecting GPU Multi-Processors with New Flash for Scalable Data Analysis
- (USENIX FAST’20) Scalable Parallel Flash Firmware for Many-core Architectures
- (HPCA’20) DRAM-less: Hardware Acceleration of Data Processing with New Memory
- (DAC’19) FlashGPU: Placing New Flash Next to GPU Cores
- (HPCA’19) FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads
- (OSDI’18) FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs
- (MICRO’18) Amber: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources
- (EuroSys’18) FlashAbacus: A Self-governing Flash-based Accelerator for Low-power Systems
- (HPCA’16) DUANG: Fast and Lightweight Page Migration in Asymmetric Memory Systems
- (PACT’15) NVMMU: A Non-Volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures
- (HotStorage’14) Power, Energy and Thermal Considerations in SSD-Based I/O Acceleration