Role overview
- Build AI-powered features end-to-end: design, prototype, evaluate, ship, and operate, from frontend integration through to production monitoring.
- Design retrieval-augmented generation (RAG) systems over our data: chunking strategies, embedding models, vector store choice, hybrid search, and grounding.
- Build evaluation harnesses that measure what actually matters (faithfulness, hallucination rate, latency, cost, instruction-following) and wire them into CI so quality doesn’t regress silently.
- Design agent architectures using tool use / function calling, structured outputs, and multi-step workflows. Plan for failure modes, not just happy paths.
- Own prompt engineering at the system level: versioning, testing, A/B comparison, and the discipline to treat prompts like code.
- Think about safety and reliability: prompt injection, abuse, misuse, and what “behaves predictably under pressure” actually means for our users.
- Manage cost and latency: model selection, caching, batching, and knowing when a smaller model is the right answer.
- Bring the rest of the team along: show colleagues how to think about modern AI, run internal workshops, and help us build a shared understanding of what’s possible and what isn’t.
Required qualifications
- Production AI experience. You’ve shipped at least one real feature powered by a large language model or foundation model, and operated it in production. Demos and side projects are great, but production is where the lessons live.
- Strong Python skills and solid software engineering fundamentals: APIs, testing, CI/CD, version control. AI engineering is still engineering.
- Hands-on experience with major LLM provider APIs, including prompting, tool use, function calling, and structured outputs. You understand the trade-offs between providers, models, and open-source alternatives.
- Practical experience with RAG: embeddings, vector stores, retrieval optimisation, and grounding.
- Evaluation discipline. You’ve built or maintained an eval harness and can talk through what you measured and why.
- A pragmatic, product-minded approach. You know when to fine-tune, when to prompt, when to retrieve, and when to use a deterministic rule instead of an LLM.
- Excellent written communication: most of our deep work happens in writing, and explaining AI trade-offs clearly is half the job.
Preferred qualifications
- Experience with agent frameworks or orchestration patterns.
- Fine-tuning experience (SFT, LoRA, DPO, RLHF) and a clear view on when it’s worth it.
- Experience with cloud ML platforms (AWS, Google Cloud, Azure).
- Experience with observability and LLM-as-judge evaluation pipelines.
- Familiarity with AI safety thinking: red-teaming, failure-mode analysis, and responsible AI principles.
- A blog post, open-source contribution, or public artifact that shows how you think about this work.
Benefits
- Salary: £80,000 – £110,000 depending on experience.
- Pension: 5% employer contribution.
- Time off: 28 days holiday plus bank holidays.
- Flexible working: Hybrid by default; fully remote within the UK is open for the right person.
- Learning budget: £2,000/year for books, courses, conferences, and API credits to experiment with. AI moves fast and we’ll fund you to keep up.
- API & compute budget. We give you real budget for model API usage from day one, so you can prototype freely.
- Equipment: A setup of your choosing, refreshed every three years.
- The chance to shape something from zero. You won’t inherit an AI strategy; you’ll help write it.