Role overview
We’re looking for an ML Systems Engineer to help train our Lumen models: our software engineering LLMs built on open-source foundations.
This is a unique and truly interdisciplinary role. You will develop and deploy our reinforcement learning (RL) training environments, work on synthetic data pipelines at massive scale, and run fine-tuning jobs to train the next generation of SWE models that will power both our self-serve and enterprise products.
We want the models we train to be the best SWEs in the world. That doesn’t just mean training them to get the right answer; it means training them to write readable, maintainable code that fits the architectural patterns already present in the codebase. We believe we’re now in the anti-slop era of coding agents, where data, RL environments, and opinionated reward functions will shape the future standards of SWE models. If this sounds exciting, this could be the role for you.
What you'll work on
- Develop and manage synthetic data generation pipelines to curate datasets that will underpin future RL fine-tunes.
- Design, build and deploy containerized services using Docker and platforms like Kubernetes to enable our RL infrastructure.
- Build and iterate on large-scale RL loops where models write code, run tests or tools, and get rewarded (or penalized) accordingly.
- Work hands-on across the stack: custom PyTorch dataloaders, RL objectives, and evaluation on real-world repos and tasks.
You’ll collaborate closely with infra, product, and research to decide what to train next, how to train it, and how to measure whether it’s actually better for engineers.
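To make the RL loop above concrete, here is a minimal sketch of a code-execution reward function: a candidate solution is run against its tests in a subprocess, passing earns reward, and a small length penalty stands in for "anti-slop" shaping. Everything here (the `reward` function, the penalty weight) is illustrative, not a description of our actual training stack.

```python
import subprocess
import sys
import tempfile


def reward(candidate_code: str, test_code: str) -> float:
    """Score a model-written solution by executing it against its tests.

    Hypothetical sketch: +1.0 if the tests pass, 0.0 if they fail,
    minus a tiny per-line penalty as a toy "anti-slop" shaping term.
    """
    # Write the candidate and its tests into one throwaway script.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + test_code + "\n")
        path = f.name

    # Run it; a non-zero exit code (e.g. AssertionError) means failure.
    result = subprocess.run([sys.executable, path], capture_output=True)
    passed = result.returncode == 0

    # Shaping: correct-but-bloated solutions score slightly lower.
    length_penalty = 0.001 * candidate_code.count("\n")
    return (1.0 if passed else 0.0) - length_penalty
```

A correct two-line `add` implementation scores just under 1.0, while a broken one scores at or below 0.0; in a real pipeline the execution would be sandboxed (e.g. in the Docker/Kubernetes services mentioned above) rather than run directly.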
What we're looking for
You don’t need all of these, but the more you have, the faster you’ll hit the ground running:
- Experience with synthetic data generation pipelines
- Experience with data tooling such as SQL, Apache Iceberg, and DuckDB
- Experience training LLMs in distributed environments
Safety, robustness, and reward shaping:
- Experience with LLM-as-a-judge, reward hacking detection, or robustness evaluation.
Open-source contributions or research:
- Contributions to open-source LLM tooling, RL libraries, etc.