2025
May
|
MagicPIG: LSH Sampling for Efficient LLM Generation.
Chen, Zhuoming; Sadhukhan, Ranajoy; Ye, Zihao; Zhou, Yang; Zhang, Jianyu; Nolte, Niklas; Tian, Yuandong; Douze, Matthijs; Bottou, Léon
ICLR 2025 (Spotlight).
|
2025
May
|
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
Chen, Jian*; Tiwari, Vashisth*; Sadhukhan, Ranajoy*; Chen, Zhuoming; Shi, Jinyuan; Yen, Ian En-Hsu; Chen, Beidi
ICLR 2025.
|
2025
May
|
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding.
Yang, Xinyu; Chen, Tianqi; Chen, Beidi
ICLR 2025.
|
2025
May
|
Memory Mosaics.
Zhang, Jianyu; Nolte, Niklas; Sadhukhan, Ranajoy; Chen, Beidi; Bottou, Léon
ICLR 2025.
|
2025
May
|
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity.
Guo, Wentao; Long, Jikai; Zeng, Yimeng; Liu, Zirui; Yang, Xinyu; Ran, Yide; Gardner, Jacob R; Bastani, Osbert; De Sa, Christopher; Yu, Xiaodong; Chen, Beidi; Xu, Zhaozhuo
ICLR 2025.
|
2024
December
|
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding.
Chen, Zhuoming*; May, Avner*; Svirschevski, Ruslan*; Huang, Yuhsun; Ryabinin, Max; Jia, Zhihao; Chen, Beidi
NeurIPS 2024 (Spotlight).
|
2024
December
|
Sirius: Contextual Sparsity with Correction for Efficient LLMs.
Zhou, Yang; Chen, Zhuoming; Xu, Zhaozhuo; Lin, Victoria; Chen, Beidi
NeurIPS 2024.
|
2024
December
|
S2FT: Efficient, Scalable and Generalizable LLM Fine-tuning by Structured Sparsity.
Yang, Xinyu; Leng, Jixuan; Guo, Geyang; Zhao, Jiawei; Nakada, Ryumei; Zhang, Linjun; Yao, Huaxiu; Chen, Beidi
NeurIPS 2024.
|
2024
December
|
Learn To be Efficient: Build Structured Sparsity in Large Language Models.
Zheng, Haizhong; Bai, Xiaoyan; Chen, Beidi; Lai, Fan; Prakash, Atul
NeurIPS 2024 (Spotlight).
|
2024
December
|
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices.
Svirschevski, Ruslan*; May, Avner*; Chen, Zhuoming*; Chen, Beidi; Jia, Zhihao; Ryabinin, Max
NeurIPS 2024.
|
2024
December
|
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.
Ma, Xuezhe; Yang, Xiaomeng; Xiong, Wenhan; Chen, Beidi; Yu, Lili; Zhang, Hao; May, Jonathan; Zettlemoyer, Luke; Levy, Omer; Zhou, Chunting
NeurIPS 2024.
|
2024
December
|
Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training.
Luo, Cheng; Zhao, Jiawei; Chen, Zhuoming; Chen, Beidi; Anandkumar, Anima
NeurIPS 2024.
|
2024
December
|
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
Li, Minghan; Chen, Xilun; Holtzman, Ari; Chen, Beidi; Lin, Jimmy; Yih, Wen-tau; Lin, Xi Victoria
NeurIPS 2024.
|
2024
December
|
Who Needs Features? On the Surprising Effectiveness of Attention Transfer for Vision Transformers.
Li, Alexander Cong; Tian, Yuandong; Chen, Beidi; Pathak, Deepak; Chen, Xinlei
NeurIPS 2024.
|
2024
December
|
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
Zhang, Zhenyu; Chen, Runjin; Liu, Shiwei; Yao, Zhewei; Ruwase, Olatunji; Chen, Beidi; Wu, Xiaoxia; Wang, Zhangyang
NeurIPS 2024.
|
2024
September
|
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding.
Sun, Hanshi; Chen, Zhuoming; Yang, Xinyu; Tian, Yuandong; Chen, Beidi
COLM 2024.
|
2024
September
|
Prompt-Prompted Mixture of Experts for Efficient LLM Generation.
Dong, Harry; Chen, Beidi; Chi, Yuejie
COLM 2024.
|
2024
July
|
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
Zhao, Jiawei; Zhang, Zhenyu; Chen, Beidi; Wang, Zhangyang; Anandkumar, Anima; Tian, Yuandong
ICML 2024 (Oral).
|
2024
July
|
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Dong, Harry; Yang, Xinyu; Zhang, Zhenyu; Wang, Zhangyang; Chi, Yuejie; Chen, Beidi
ICML 2024.
|
2024
July
|
LoCoCo: Dropping In Convolutions for Long Context Compression.
Cai, Ruisi; Tian, Yuandong; Wang, Zhangyang; Chen, Beidi
ICML 2024.
|
2024
July
|
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Jiang, Youhe; Yan, Ran; Yao, Xiaozhe; Zhou, Yang; Chen, Beidi; Yuan, Binhang
ICML 2024.
|
2024
July
|
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache.
Liu, Zirui; Yuan, Jiayi; Jin, Hongye; Zhong, Shaochen; Xu, Zhaozhuo; Braverman, Vladimir; Chen, Beidi; Hu, Xia
ICML 2024.
|
2024
July
|
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-Off of LLM Inference with Transferable Prompt.
Xu, Zhaozhuo; Liu, Zirui; Chen, Beidi; Tang, Yuxin; Wang, Jue; Zhou, Kaixiong; Hu, Xia; Shrivastava, Anshumali
ICML 2024.
|
2024
July
|
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Elhoushi, Mostafa; Shrivastava, Akshat; Liskovich, Diana; Hosmer, Basil; Wasti, Bram; Lai, Liangzhen; Mahmoud, Anas; Acun, Bilge; Agarwal, Saurabh; Roman, Ahmed; Aly, Ahmed; Chen, Beidi; Wu, Carole-Jean
ACL 2024.
|
2024
July
|
Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Zhang, Zhenyu; Liu, Shiwei; Chen, Runjin; Kailkhura, Bhavya; Chen, Beidi; Wang, Atlas
MLSys 2024.
|
2024
May
|
Efficient Streaming Language Models with Attention Sinks.
Xiao, Guangxuan; Tian, Yuandong; Chen, Beidi; Han, Song; Lewis, Mike
ICLR 2024.
|
2024
May
|
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention.
Tian, Yuandong; Wang, Yiping; Zhang, Zhenyu; Chen, Beidi; Du, Simon
ICLR 2024.
|
2023
July
|
Fast Algorithms for a New Relaxation of Optimal Transport.
Charikar, Moses; Chen, Beidi; Ré, Christopher; Waingarten, Erik
COLT 2023.
|
2023
December
|
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Zhang, Zhenyu; Sheng, Ying; Zhou, Tianyi; Chen, Tianlong; Zheng, Lianmin; Cai, Ruisi; Song, Zhao; Tian, Yuandong; Ré, Christopher; Barrett, Clark; Wang, Zhangyang; Chen, Beidi
NeurIPS 2023.
|
2023
December
|
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer.
Tian, Yuandong; Wang, Yiping; Chen, Beidi; Du, Simon
NeurIPS 2023.
|
2023
December
|
Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions.
Massaroli, Stefano; Poli, Michael; Fu, Dan; Kumbong, Hermann; Parnichkun, Rom; Romero, David; Timalsina, Aman; McIntyre, Quinn; Chen, Beidi; Rudra, Atri
NeurIPS 2023.
|
2023
July
|
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
Liu, Zichang; Wang, Jue; Dao, Tri; Zhou, Tianyi; Yuan, Binhang; Song, Zhao; Shrivastava, Anshumali; Zhang, Ce; Tian, Yuandong; Ré, Christopher; Chen, Beidi
ICML 2023 (Oral).
|
2023
July
|
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Sheng, Ying; Zheng, Lianmin; Yuan, Binhang; Li, Zhuohan; Ryabinin, Max; Chen, Beidi; Liang, Percy; Ré, Christopher; Stoica, Ion; Zhang, Ce
ICML 2023 (Oral).
|
2023
July
|
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks.
Wang, Jue; Lu, Yucheng; Yuan, Binhang; Chen, Beidi; Liang, Percy; De Sa, Christopher; Ré, Christopher; Zhang, Ce
ICML 2023.
|
2022
December
|
Decentralized Training of Foundation Models in Heterogeneous Environments.
Yuan, Binhang; He, Yongjun; Davis, Jared Quincy; Zhang, Tianyi; Dao, Tri; Chen, Beidi; Liang, Percy; Ré, Christopher; Zhang, Ce
NeurIPS 2022 (Oral).
|
2022
December
|
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.
Wang, Jue; Yuan, Binhang; Rimanic, Luka; He, Yongjun; Dao, Tri; Chen, Beidi; Ré, Christopher; Zhang, Ce
NeurIPS 2022.
|
2022
July
|
Monarch: Expressive Structured Matrices for Efficient and Accurate Training.
Dao, Tri; Chen, Beidi; Sohoni, Nimit S; Desai, Arjun; Poli, Michael; Grogan, Jessica; Liu, Alexander; Rao, Aniruddh; Rudra, Atri; Ré, Christopher
ICML 2022 (Outstanding Paper Runner Up).
|
2022
May
|
Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models.
Chen, Beidi; Dao, Tri; Liang, Kaizhao; Yang, Jiaming; Song, Zhao; Rudra, Atri; Ré, Christopher
ICLR 2022 (Spotlight).
|
2022
April
|
HALOS: Hashing Large Output Space for Cheap Inference.
Liu, Zichang; Xu, Zhaozhuo; Ji, Alan; Zhang, Junyan; Li, Jonathan; Chen, Beidi; Shrivastava, Anshumali
MLSys 2022.
|
2021
December
|
Scatterbrain: Unifying Sparse and Low-Rank Attention.
Chen, Beidi; Dao, Tri; Winsor, Eric; Song, Zhao; Rudra, Atri; Ré, Christopher
NeurIPS 2021.
|
2021
December
|
Locality Sensitive Teaching.
Xu, Zhaozhuo; Chen, Beidi; Li, Chaojian; Liu, Weiyang; Song, Le; Lin, Yingyan; Shrivastava, Anshumali
NeurIPS 2021.
|
2021
July
|
A Tale of Two Efficient and Informative Negative Sampling Distributions.
Daghaghi, Shabnam; Medini, Tharun; Meisburger, Nicholas; Chen, Beidi; Zhao, Mengnan; Shrivastava, Anshumali
ICML 2021 (Oral).
|
2021
May
|
MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Chen, Beidi; Liu, Zichang; Peng, Binghui; Xu, Zhaozhuo; Li, Jonathan Lingjie; Dao, Tri; Song, Zhao; Shrivastava, Anshumali; Ré, Christopher
ICLR 2021 (Oral).
|
2021
May
|
SOLAR: Sparse Orthogonal Learned and Random Embeddings.
Medini, Tharun; Chen, Beidi; Shrivastava, Anshumali
ICLR 2021.
|
2021
March
|
Satellite Images and Deep Learning to Identify Discrepancy in Mailing Addresses with Applications to Census 2020 in Houston.
Xu, Zhaozhuo; Ji, Alan Baonan; Woods, Andrew; Chen, Beidi; Shrivastava, Anshumali
JSM 2021.
|
2020
December
|
SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-scale Deep Learning Systems.
Chen, Beidi; Medini, Tharun; Farwell, James; Gobriel, Sameh; Tai, Charlie; Shrivastava, Anshumali
MLSys 2020.
|
2020
July
|
Angular Visual Hardness.
Chen, Beidi; Liu, Weiyang; Yu, Zhiding; Kautz, Jan; Shrivastava, Anshumali; Garg, Animesh; Anandkumar, Animashree
ICML 2020.
|
2019
December
|
Fast and Accurate Stochastic Gradient Estimation.
Chen, Beidi; Xu, Yingchen; Shrivastava, Anshumali
NeurIPS 2019.
|
2018
July
|
Densified Winner Take All (WTA) Hashing for Sparse Datasets.
Chen, Beidi; Shrivastava, Anshumali
UAI 2018.
|
2018
June
|
Unique Entity Estimation with Application to the Syrian Conflict.
Chen, Beidi; Shrivastava, Anshumali; Steorts, Rebecca C
The Annals of Applied Statistics (IISA 2018 Best Student Paper in Applied Statistics).
|
2014
November
|
Analyzing Log Analysis: An Empirical Study of User Log Mining.
Alspaugh, Sara; Chen, Beidi; Lin, Jessica; Ganapathi, Archana; Hearst, Marti; Katz, Randy
LISA 2014 (Best Student Paper).
|