About me

Assistant Professor, PhD Advisor
School of Computer Science,
Peking University,
Beijing, China

I work on the research and design of storage systems and specialized processors. From a computer-architecture perspective, my research addresses the need for high-performance storage systems in the era of big data and artificial intelligence, with a focus on overcoming the data-movement bottlenecks and memory-wall limitations of the von Neumann architecture.


Important:

I am actively seeking talented and self-motivated students. There are two openings per year for prospective PhD candidates and multiple positions for interns. You are always welcome to contact me via email.


News:

  • November 2024: Three papers are accepted to HPCA’25. Congratulations to Xiurui and Endian!
  • October 2024: Invited to serve as PC of USENIX ATC, ISCA and ISPASS.
  • August 2024: BIZA is accepted to SOSP’24. Congratulations to Shushu, Shaocong and Li Peng!
  • July 2024: Two papers are accepted to MICRO’24.
  • July 2024: Two papers are accepted to TC and ToS.
  • May 2024: Invited to serve as PC of HPCA.
  • May 2024: Two papers are accepted to USENIX ATC’24. Congratulations to Shushu Yi, Li Peng, Xiurui and Yuda!
  • April 2024: One paper is accepted to TACO.
  • April 2024: Flagger is accepted to ISCA’24. Congratulations to Xiurui and Yuda!
  • March 2024: Invited to serve as ERC of MICRO.
  • November 2023: One paper is accepted to ASPLOS’24.
  • October 2023: Invited to serve as ERC of ISCA.
  • October 2023: Four papers are accepted to HPCA’24. Congratulations to Yuda and Yuyue!
  • August 2023: Awarded Intel Young Faculty Researcher Program.
  • July 2023: Invited to serve as TPC of HPCA.
  • April 2023: Awarded 1st prize in the national storage technology competition. Congrats to Shushu Yi!
  • January 2023: One paper is accepted to NVMW.
  • January 2023: One paper is accepted to CAL.
  • December 2022: One paper is accepted to SAC.
  • October 2022: Invited to serve as TPC of USENIX ATC and SAC.
  • September 2022: Awarded ACM SIGCSE Rising Star!
  • July 2022: One paper is accepted to THPC.
  • May 2022: Our paper “ScalaRAID” is accepted to HotStorage’22. Congrats to Shushu Yi!
  • April 2022: Two papers are accepted to NVMW’22.
  • December 2021: Awarded the NSFC Excellent Young Scientists Fund Overseas Program (国家自然科学基金优秀青年科学基金海外项目)!
  • September 2021: Our work “HAMS” is selected as one of the KAIST Breakthroughs of the past 50 years.
  • August 2021: Our work “Ohm-GPU” is reported by Naver headline news and 26+ other press outlets.
  • July 2021: One paper is accepted to MICRO’21.
  • April 2021: Three papers are accepted to NVMW’21.
  • March 2021: Our work “HAMS” is reported by Naver headline news, KBS, and 39+ other press outlets.
  • February 2021: One paper is accepted to ISCA’21.
  • June 2020: One paper is accepted to ISCA’20.
  • February 2020: One paper is accepted to HPCA’20.
  • February 2020: One paper is accepted to FAST’20.
  • February 2020: Joined KAIST as a postdoctoral researcher.
  • December 2019: Successfully defended my PhD thesis.

Selected Publications:

  • (HPCA’25) InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference
  • (HPCA’25) Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications
  • (HPCA’25) NeuVSA: A Unified and Efficient Accelerator for Neural Vector Search
  • (SOSP’24) BIZA: Design of Self-Governing Block-Interface ZNS AFA for Endurance and Performance
  • (MICRO’24) FlashLLM: A Chiplet-Based In-Flash Computing Architecture to Enable On-Device Inference of 70B LLM
  • (MICRO’24) NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
  • (USENIX ATC’24) ScalaCache: Scalable User-Space Page Cache Management with Software-Hardware Coordination
  • (USENIX ATC’24) ScalaAFA: Constructing User-Space All-Flash Array Engine with Holistic Designs
  • (ISCA’24) Flagger: Cooperative Acceleration for Large-Scale Cross-Silo Federated Learning Aggregation
  • (ASPLOS’24) Achieving Near-Zero Read Retry for 3D NAND Flash Memory
  • (HPCA’24) BeaconGNN: Large-Scale GNN Acceleration with Asynchronous In-Storage Computing
  • (HPCA’24) StreamPIM: Streaming Matrix Computation in Racetrack Memory
  • (HPCA’24) LearnedFTL: A Learning-based Page-level FTL for Reducing Double Reads in Flash-based SSDs
  • (HPCA’24) Midas Touch: Invalid-Data Assisted Reliability and Performance Boost for 3D High-Density Flash
  • (MICRO’21) Ohm-GPU: Integrating New Optical Network and Heterogeneous Memory into GPU Multi-Processors
  • (ISCA’21) Revamping Storage Class Memory With Hardware Automated Memory-Over-Storage Solution
  • (ISCA’20) ZnG: Architecting GPU Multi-Processors with New Flash for Scalable Data Analysis
  • (USENIX FAST’20) Scalable Parallel Flash Firmware for Many-core Architectures
  • (HPCA’20) DRAM-less: Hardware Acceleration of Data Processing with New Memory
  • (DAC’19) FlashGPU: Placing New Flash Next to GPU Cores
  • (HPCA’19) FUSE: Fusing STT-MRAM into GPUs to Alleviate Off-Chip Memory Access Overheads
  • (OSDI’18) FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs
  • (MICRO’18) Amber: Enabling Precise Full-System Simulation with Detailed Modeling of All SSD Resources
  • (EuroSys’18) FlashAbacus: A Self-governing Flash-based Accelerator for Low-power Systems
  • (HPCA’16) DUANG: Fast and Lightweight Page Migration in Asymmetric Memory Systems
  • (PACT’15) NVMMU: A Non-Volatile Memory Management Unit for Heterogeneous GPU-SSD Architectures
  • (HotStorage’14) Power, Energy and Thermal Considerations in SSD-Based I/O Acceleration