Preprints

2025
February
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
Zhou, Yang; Liu, Hongyi; Chen, Zhuoming; Tian, Yuandong; Chen, Beidi
Preprints
2024
October
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
Sun, Hanshi; Chang, Li-Wen; Bao, Wenlei; Zheng, Size; Zheng, Ningxin; Liu, Xin; Dong, Harry; Chi, Yuejie; Chen, Beidi
Preprints
2024
February
LLM Inference Unveiled: Survey and Roofline Model Insights.
Yuan, Zhihang; Shang, Yuzhang; Zhou, Yang; Dong, Zhen; Xue, Chenhao; Wu, Bingzhe; Li, Zhikai; Gu, Qingyi; Lee, Yong Jae; Yan, Yan; Chen, Beidi; Sun, Guangyu; Keutzer, Kurt
Preprints
2023
June
InRank: Incremental Low-Rank Learning.
Zhao, Jiawei; Zhang, Yifei; Chen, Beidi; Schäfer, Florian; Anandkumar, Anima
Preprints
2023
January
Sample-Efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials.
Cohen, Andrew; Dou, Weiping; Zhu, Jiang; Koziel, Slawomir; Renner, Peter; Mattsson, Jan-Ove; Yang, Xiaomeng; Chen, Beidi; Stone, Kevin; Tian, Yuandong
Preprints

Publications

2025
May
MagicPIG: LSH Sampling for Efficient LLM Generation.
Chen, Zhuoming; Sadhukhan, Ranajoy; Ye, Zihao; Zhou, Yang; Zhang, Jianyu; Nolte, Niklas; Tian, Yuandong; Douze, Matthijs; Bottou, Léon; Jia, Zhihao; Chen, Beidi
ICLR 2025 (Spotlight).
2025
May
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
Chen, Jian*; Tiwari, Vashisth*; Sadhukhan, Ranajoy*; Chen, Zhuoming; Shi, Jinyuan; Yen, Ian En-Hsu; Chen, Beidi
ICLR 2025.
2025
May
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding.
Yang, Xinyu; Chen, Tianqi; Chen, Beidi
ICLR 2025.
2025
May
Memory Mosaics.
Zhang, Jianyu; Nolte, Niklas; Sadhukhan, Ranajoy; Chen, Beidi; Bottou, Léon
ICLR 2025.
2025
May
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity.
Guo, Wentao; Long, Jikai; Zeng, Yimeng; Liu, Zirui; Yang, Xinyu; Ran, Yide; Gardner, Jacob R; Bastani, Osbert; De Sa, Christopher; Yu, Xiaodong; Chen, Beidi; Xu, Zhaozhuo
ICLR 2025.
2024
December
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding.
Chen, Zhuoming*; May, Avner*; Svirschevski, Ruslan*; Huang, Yuhsun; Ryabinin, Max; Jia, Zhihao; Chen, Beidi
NeurIPS 2024 (Spotlight).
2024
December
Sirius: Contextual Sparsity with Correction for Efficient LLMs.
Zhou, Yang; Chen, Zhuoming; Xu, Zhaozhuo; Lin, Victoria; Chen, Beidi
NeurIPS 2024.
2024
December
S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity.
Yang, Xinyu; Leng, Jixuan; Guo, Geyang; Zhao, Jiawei; Nakada, Ryumei; Zhang, Linjun; Yao, Huaxiu; Chen, Beidi
NeurIPS 2024.
2024
December
Learn To be Efficient: Build Structured Sparsity in Large Language Models.
Zheng, Haizhong; Bai, Xiaoyan; Chen, Beidi; Lai, Fan; Prakash, Atul
NeurIPS 2024 (Spotlight).
2024
December
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices.
Svirschevski, Ruslan*; May, Avner*; Chen, Zhuoming*; Chen, Beidi; Jia, Zhihao; Ryabinin, Max
NeurIPS 2024.
2024
December
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.
Ma, Xuezhe; Yang, Xiaomeng; Xiong, Wenhan; Chen, Beidi; Yu, Lili; Zhang, Hao; May, Jonathan; Zettlemoyer, Luke; Levy, Omer; Zhou, Chunting
NeurIPS 2024.
2024
December
Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training.
Luo, Cheng; Zhao, Jiawei; Chen, Zhuoming; Chen, Beidi; Anandkumar, Anima
NeurIPS 2024.
2024
December
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
Li, Minghan; Chen, Xilun; Holtzman, Ari; Chen, Beidi; Lin, Jimmy; Yih, Wen-tau; Lin, Xi Victoria
NeurIPS 2024.
2024
December
Who Needs Features? On the Surprising Effectiveness of Attention Transfer for Vision Transformers.
Li, Alexander Cong; Tian, Yuandong; Chen, Beidi; Pathak, Deepak; Chen, Xinlei
NeurIPS 2024.
2024
December
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
Zhang, Zhenyu; Chen, Runjin; Liu, Shiwei; Yao, Zhewei; Ruwase, Olatunji; Chen, Beidi; Wu, Xiaoxia; Wang, Zhangyang
NeurIPS 2024.
2024
September
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding.
Sun, Hanshi; Chen, Zhuoming; Yang, Xinyu; Tian, Yuandong; Chen, Beidi
COLM 2024.
2024
September
Prompt-Prompted Mixture of Experts for Efficient LLM Generation.
Dong, Harry; Chen, Beidi; Chi, Yuejie
COLM 2024.
2024
July
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
Zhao, Jiawei; Zhang, Zhenyu; Chen, Beidi; Wang, Zhangyang; Anandkumar, Anima; Tian, Yuandong
ICML 2024 (Oral).
2024
July
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Dong, Harry; Yang, Xinyu; Zhang, Zhenyu; Wang, Zhangyang; Chi, Yuejie; Chen, Beidi
ICML 2024.
2024
July
LoCoCo: Dropping In Convolutions for Long Context Compression.
Cai, Ruisi; Tian, Yuandong; Wang, Zhangyang; Chen, Beidi
ICML 2024.
2024
July
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Jiang, Youhe; Yan, Ran; Yao, Xiaozhe; Zhou, Yang; Chen, Beidi; Yuan, Binhang
ICML 2024.
2024
July
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache.
Liu, Zirui; Yuan, Jiayi; Jin, Hongye; Zhong, Shaochen; Xu, Zhaozhuo; Braverman, Vladimir; Chen, Beidi; Hu, Xia
ICML 2024.
2024
July
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-Off of LLM Inference with Transferable Prompt.
Xu, Zhaozhuo; Liu, Zirui; Chen, Beidi; Tang, Yuxin; Wang, Jue; Zhou, Kaixiong; Hu, Xia; Shrivastava, Anshumali
ICML 2024.
2024
July
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Elhoushi, Mostafa; Shrivastava, Akshat; Liskovich, Diana; Hosmer, Basil; Wasti, Bram; Lai, Liangzhen; Mahmoud, Anas; Acun, Bilge; Agarwal, Saurabh; Roman, Ahmed
ACL 2024.
2024
July
Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Zhang, Zhenyu; Liu, Shiwei; Chen, Runjin; Kailkhura, Bhavya; Chen, Beidi; Wang, Atlas
MLSys 2024.
2024
May
Efficient Streaming Language Models with Attention Sinks.
Xiao, Guangxuan; Tian, Yuandong; Chen, Beidi; Han, Song; Lewis, Mike
ICLR 2024.
2024
May
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention.
Tian, Yuandong; Wang, Yiping; Zhang, Zhenyu; Chen, Beidi; Du, Simon
ICLR 2024.
2023
July
Fast Algorithms for a New Relaxation of Optimal Transport.
Charikar, Moses; Chen, Beidi; Ré, Christopher; Waingarten, Erik
COLT 2023.
2023
December
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Zhang, Zhenyu; Sheng, Ying; Zhou, Tianyi; Chen, Tianlong; Zheng, Lianmin; Cai, Ruisi; Song, Zhao; Tian, Yuandong; Ré, Christopher; Barrett, Clark; Wang, Zhangyang; Chen, Beidi
NeurIPS 2023.
2023
December
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer.
Tian, Yuandong; Wang, Yiping; Chen, Beidi; Du, Simon
NeurIPS 2023.
2023
December
Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions.
Massaroli, Stefano; Poli, Michael; Fu, Dan; Kumbong, Hermann; Parnichkun, Rom; Romero, David; Timalsina, Aman; McIntyre, Quinn; Chen, Beidi; Rudra, Atri
NeurIPS 2023.
2023
July
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
Liu, Zichang; Wang, Jue; Dao, Tri; Zhou, Tianyi; Yuan, Binhang; Song, Zhao; Shrivastava, Anshumali; Zhang, Ce; Tian, Yuandong; Ré, Christopher; Chen, Beidi
ICML 2023 (Oral).
2023
July
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Sheng, Ying; Zheng, Lianmin; Yuan, Binhang; Li, Zhuohan; Ryabinin, Max; Chen, Beidi; Liang, Percy; Ré, Christopher; Stoica, Ion; Zhang, Ce
ICML 2023 (Oral).
2023
July
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks.
Wang, Jue; Lu, Yucheng; Yuan, Binhang; Chen, Beidi; Liang, Percy; De Sa, Christopher; Ré, Christopher; Zhang, Ce
ICML 2023.
2022
December
Decentralized Training of Foundation Models in Heterogeneous Environments.
Yuan, Binhang; He, Yongjun; Davis, Jared Quincy; Zhang, Tianyi; Dao, Tri; Chen, Beidi; Liang, Percy; Ré, Christopher; Zhang, Ce
NeurIPS 2022 (Oral).
2022
December
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.
Wang, Jue; Yuan, Binhang; Rimanic, Luka; He, Yongjun; Dao, Tri; Chen, Beidi; Ré, Christopher; Zhang, Ce
NeurIPS 2022.
2022
July
Monarch: Expressive Structured Matrices for Efficient and Accurate Training.
Dao, Tri; Chen, Beidi; Sohoni, Nimit S; Desai, Arjun; Poli, Michael; Grogan, Jessica; Liu, Alexander; Rao, Aniruddh; Rudra, Atri; Ré, Christopher
ICML 2022 (Outstanding Paper Runner Up).
2022
May
Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models.
Chen, Beidi; Dao, Tri; Liang, Kaizhao; Yang, Jiaming; Song, Zhao; Rudra, Atri; Ré, Christopher
ICLR 2022 (Spotlight).
2022
April
Halos: Hashing Large Output Space for Cheap Inference.
Liu, Zichang; Xu, Zhaozhuo; Ji, Alan; Zhang, Junyan; Li, Jonathan; Chen, Beidi; Shrivastava, Anshumali
MLSys 2022.
2021
December
Scatterbrain: Unifying Sparse and Low-Rank Attention.
Chen, Beidi; Dao, Tri; Winsor, Eric; Song, Zhao; Rudra, Atri; Ré, Christopher
NeurIPS 2021.
2021
December
Locality Sensitive Teaching.
Xu, Zhaozhuo; Chen, Beidi; Li, Chaojian; Liu, Weiyang; Song, Le; Lin, Yingyan; Shrivastava, Anshumali
NeurIPS 2021.
2021
July
A Tale of Two Efficient and Informative Negative Sampling Distributions.
Daghaghi, Shabnam; Medini, Tharun; Meisburger, Nicholas; Chen, Beidi; Zhao, Mengnan; Shrivastava, Anshumali
ICML 2021 (Oral).
2021
May
MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Chen, Beidi; Liu, Zichang; Peng, Binghui; Xu, Zhaozhuo; Li, Jonathan Lingjie; Dao, Tri; Song, Zhao; Shrivastava, Anshumali; Ré, Christopher
ICLR 2021 (Oral).
2021
May
SOLAR: Sparse Orthogonal Learned and Random Embeddings.
Medini, Tharun; Chen, Beidi; Shrivastava, Anshumali
ICLR 2021.
2021
March
Satellite Images and Deep Learning to Identify Discrepancy in Mailing Addresses with Applications to Census 2020 in Houston.
Xu, Zhaozhuo; Ji, Alan Baonan; Woods, Andrew; Chen, Beidi; Shrivastava, Anshumali
JSM 2021.
2020
December
SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-scale Deep Learning Systems.
Chen, Beidi; Medini, Tharun; Farwell, James; Gobriel, Sameh; Tai, Charlie; Shrivastava, Anshumali
MLSys 2020.
2020
July
Angular Visual Hardness.
Chen, Beidi; Liu, Weiyang; Yu, Zhiding; Kautz, Jan; Shrivastava, Anshumali; Garg, Animesh; Anandkumar, Animashree
ICML 2020.
2019
December
Fast and Accurate Stochastic Gradient Estimation.
Chen, Beidi; Xu, Yingchen; Shrivastava, Anshumali
NeurIPS 2019.
2018
July
Densified Winner Take All (WTA) Hashing for Sparse Datasets.
Chen, Beidi; Shrivastava, Anshumali
UAI 2018.
2018
June
Unique Entity Estimation with Application to the Syrian Conflict.
Chen, Beidi; Shrivastava, Anshumali; Steorts, Rebecca C
The Annals of Applied Statistics (IISA 2018 Best Student Paper in Applied Statistics).
2014
November
Analyzing Log Analysis: An Empirical Study of User Log Mining.
Alspaugh, Sara; Chen, Beidi; Lin, Jessica; Ganapathi, Archana; Hearst, Marti; Katz, Randy
LISA 2014 (Best Student Paper).