Responsibilities
- Write efficient Triton kernels and tune them for our specific models and hardware
- Develop prefix-aware routing algorithms to improve serving cache hit rate
- Train and distill LLMs to improve latency while preserving accuracy and engagement
- Build an efficient and scalable distributed RLHF stack that powers our model innovations
- Develop systems for efficient training and inference of multimodal (image and video generation) models
Qualifications
- All industry levels: at least PhD-level (or equivalent) research experience
- Write clear and clean production system code
- Strong understanding of modern machine learning techniques (reinforcement learning, transformers, etc.)
- Track record of exceptional research or creative ML systems projects
- Comfortable writing model development code (PyTorch) for either training or inference
Preferred qualifications
- Experience training large models in a distributed setting using PyTorch Distributed, DeepSpeed, or Megatron
- Experience working with GPUs and collectives (training, serving, debugging) and writing kernels (Triton, CUDA, CUTLASS)
- Experience with LLM inference systems and the related literature, such as vLLM and FlashAttention
- Familiarity with ML deployment and orchestration (Kubernetes, Docker, cloud)
- Publications in relevant academic journals or conferences in the field of machine learning and systems
About the company
As a Research Engineer on the ML Systems team, you’ll work on cutting-edge ML training and inference systems, optimize the performance and efficiency of our GPU clusters, and develop new technologies that fine-tune leading consumer AI models with a data flywheel and serve LLMs at 20K+ QPS in production. Your work will directly contribute to our advancements in AI, helping shape an era where technology is not just a tool but a companion in our daily lives. At Character.AI, your talent, creativity, and expertise will not just be valued; they will be the catalyst for change in an AI-driven future.