Role overview
We’re looking for an ML engineer to own large-scale training of our Lumen Enterprise models, our family of open-source-based software engineering LLMs.
You’ll work on supervised fine-tuning (SFT), reinforcement learning (RL), and continued pretraining on top of open-source base models to push state-of-the-art performance on real software engineering tasks: reading and modifying large codebases, using tools, and reasoning about complex systems.
If you enjoy working close to the metal with PyTorch and distributed training, and you like making big models actually work in practice, this role is for you.
What you'll work on
- Take open-source base models (code + general LLMs) and turn them into high-performance Lumen Enterprise SWE agents via SFT and RL.
- Design and run large-scale training experiments on multi-node GPU clusters, including long-context training and MoE-style architectures.
- Build and iterate on large-scale RL loops where models write code, run tests or tools, and get rewarded (or penalized) accordingly; a minimal sketch of this kind of execution-based reward follows this list.
- Work hands-on across the stack: custom PyTorch dataloaders, distributed training primitives, RL objectives, and evaluation on real-world repos and tasks.
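To give a flavor of the execution-based reward loops above, here is a minimal sketch assuming a repo checkout with a pytest suite; the invocation, output parsing, and shaping weights are illustrative assumptions, not our actual setup:

```python
import subprocess

def execution_reward(repo_dir: str, timeout_s: int = 120) -> float:
    """Score a model-written patch by running the repo's test suite.

    Illustrative shaping: +1 if everything passes, partial credit for
    the passing fraction, penalties for timeouts and broken collection.
    The pytest invocation and weights are assumptions for this sketch.
    """
    try:
        proc = subprocess.run(
            ["python", "-m", "pytest", "-q", "--tb=no"],
            cwd=repo_dir, capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return -1.0  # runaway or hanging code: strong negative signal

    if proc.returncode == 0:
        return 1.0  # all tests passed

    # pytest's -q summary line looks like "3 failed, 7 passed in 1.2s";
    # parse rough pass/fail counts for partial credit.
    tail = proc.stdout.strip().splitlines()[-1] if proc.stdout.strip() else ""
    passed = failed = 0
    for chunk in tail.split(","):
        parts = chunk.split()
        if len(parts) >= 2 and parts[0].isdigit():
            if "passed" in parts[1]:
                passed = int(parts[0])
            elif "failed" in parts[1] or "error" in parts[1]:
                failed += int(parts[0])
    total = passed + failed
    return passed / total if total else -0.5  # suite didn't even collect

```

In a real loop this scalar would feed a policy-gradient update (e.g. PPO or GRPO) over the model's generated patch tokens, with proper sandboxing around the test run.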
You’ll collaborate closely with infra, product, and research to decide what to train next, how to train it, and how to measure whether it’s actually better for engineers.
What we're looking for
You don’t need all of these, but the more you have, the faster you’ll hit the ground running:
- Continued pretraining and long-context experience:
  - Have run continued pretraining on domain-specific or long-context corpora.
  - Familiarity with techniques like RoPE scaling, YaRN-style extrapolation, context parallelism, or similar (a toy RoPE interpolation sketch appears after this list).
- Code-focused RL and evaluation:
  - Experience building RL loops where rewards come from code execution (tests, linters, static analysis, fuzzing, runtime traces).
  - Familiarity with evaluation benchmarks for code models (e.g. HumanEval, MBPP, SWE-bench, or internal equivalents).
- Experience with modern LLM training stacks:
  - Hands-on work with large MoE models and expert/tensor parallelism is a plus.
- Serving and online training:
  - Experience tuning inference performance in open-source serving frameworks such as vLLM or SGLang.
- Safety, robustness, and reward shaping:
  - Experience with LLM-as-a-judge, reward hacking detection, or robustness evaluation.
- Open-source contributions or research:
  - Contributions to open-source LLM tooling, RL libraries, or relevant research papers in LLM training / RLHF / code models.
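For the long-context bullet above, here is a toy sketch of RoPE position interpolation, one of the simplest context-extension tricks in that family; the dimensions and scale factor are made up for the example, and YaRN-style methods refine this by scaling frequencies non-uniformly:

```python
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotation angles for rotary position embeddings (RoPE).

    scale=1.0 is vanilla RoPE; scale>1.0 performs position interpolation:
    positions are compressed by 1/scale, so a model pretrained on an
    N-token context can address N*scale positions while the angles stay
    inside the range seen during pretraining.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float() / scale  # the interpolation step
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)

# Example: stretch a hypothetical 4k-context model to 16k positions.
angles = rope_angles(seq_len=16_384, head_dim=128, scale=4.0)
print(angles.shape)  # torch.Size([16384, 64])
```

In practice a short fine-tune at the extended length usually follows the frequency change; plain interpolation alone tends to cost some short-context quality, which is part of what YaRN-style schemes address.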