Research
Four interconnected directions.
InfiniAI Lab develops efficient and scalable AI models and systems by co-designing
algorithms, model architectures, and hardware-aware infrastructure.
01
Sparse Model Architecture
We rethink dense computation to unlock long-context and long-generation efficiency,
expand model capacity at constant compute through Mixture-of-Experts and conditional
computation, and study the scaling laws that govern how sparse-model performance
changes with data, parameters, and hardware.
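As a rough illustration of "capacity at constant compute", the sketch below counts parameters for a single Mixture-of-Experts feed-forward layer with top-k routing. It is a back-of-the-envelope model, not a description of any InfiniAI Lab system: the function name `moe_ffn_params`, the two-matrix expert, the linear router, and the example sizes are all assumptions chosen for illustration.

```python
def moe_ffn_params(d_model, d_ff, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for one MoE FFN layer.

    Assumes each expert is a two-matrix FFN (up- and down-projection, no biases)
    and the router is a single linear map producing one logit per expert.
    """
    expert = 2 * d_model * d_ff        # parameters in one expert
    router = d_model * num_experts     # parameters in the router
    total = num_experts * expert + router
    active = top_k * expert + router   # only top_k experts run for each token
    return total, active


# Hypothetical sizes: total capacity grows with the expert count,
# while the parameters touched per token stay nearly flat.
for n in (8, 16, 64):
    total, active = moe_ffn_params(d_model=1024, d_ff=4096, num_experts=n, top_k=2)
    print(f"experts={n:3d}  total={total:,}  active per token={active:,}")
```

Under these assumptions, going from 8 to 64 experts multiplies total parameters by roughly eight, while the per-token active count rises only by the small router term.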
02
Scalable Agentic RL: Training & Inference
We build the systems stack for the next generation of reinforcement-learned, tool-using
agents — large-scale and asynchronous RL pipelines, agentic serving engines, and
kernel- and system-level optimizations for sparse attention and MoE inference at
frontier scale.
03
Real-time & Multimodal Generation
We bring generative AI into the interactive, multi-sensory loop where humans actually
work and create — developing streaming text, image, audio, and video models with
sub-second latency, unified multimodal architectures, and human–AI collaboration
interfaces that enhance creativity rather than replace it.
04
MLSys for the Post-AGI Era
We prepare infrastructure for a world where AI agents are first-class operators of the
AI stack itself — systems that let agents build, deploy, optimize, secure, and evolve
ML infrastructure with minimal human intervention, while remaining safe, verifiable,
and robust as models and tools co-evolve.