Responsibilities

Research and develop agent frameworks that continuously learn and improve from execution traces, user feedback, and environmental signals.
Build large-scale log analytics pipelines to extract quality signals, usage patterns, and actionable insights from model and agent invocation logs, driving data-informed system and model improvements.
Explore and apply frontier techniques in LLM post-training, reasoning, and planning to enhance agent capabilities.
Collaborate across algorithm research, platform engineering, and product teams to turn research ideas into production-grade systems at scale.

Basic qualifications

Individuals who are completing or have recently completed a Ph.D. in Computer Science, Artificial Intelligence, Machine Learning, or a closely related discipline.
Strong theoretical and practical foundation in machine learning, deep learning, reinforcement learning, or optimization.
Research experience in at least one of the following areas: LLM-based agents, planning and reasoning, multi-agent systems, continual/lifelong learning, or LLM post-training (e.g., RLHF, DPO, GRPO, self-play).
Strong programming skills in Python and proficiency with ML frameworks (e.g., PyTorch, TensorFlow, JAX).
Publication record at top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, AAAI, AAMAS, COLM).
Strong problem-solving skills and ability to thrive in a fast-paced, collaborative environment.

Preferred qualifications

Publications in areas directly related to agent learning and adaptation, such as tool use, self-improvement, skill discovery, trajectory optimization, reward modeling, or agent evaluation.
Research experience in LLM reasoning and planning, including chain-of-thought, tree/graph search, Monte Carlo methods, or inference-time compute scaling.
Experience training or fine-tuning large language models, including supervised fine-tuning, preference optimization, or curriculum learning.
Hands-on experience building or evaluating LLM-based agent systems (e.g., ReAct, function calling, code generation agents, or multi-agent orchestration).
Familiarity with meta-learning, few-shot generalization, or transfer learning in the context of LLM-based systems.
Experience with feedback-driven optimization loops, such as online learning, bandit methods, or evolutionary strategies applied to agent improvement.
Strong interest in bridging frontier AI research with production-grade engineering: turning papers into systems that work at scale.
Internship experience at technology companies or research organizations.
Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;
Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems; and
Exercising sound judgment.