Role overview

Build AI-powered features end-to-end: design, prototype, evaluate, ship, and operate, from frontend integration through to production monitoring.
Design retrieval-augmented generation (RAG) systems over our data: chunking strategies, embedding models, vector store choice, hybrid search, and grounding.
Build evaluation harnesses that measure what actually matters (faithfulness, hallucination rate, latency, cost, instruction-following) and wire them into CI so quality doesn’t regress silently.
Design agent architectures using tool use / function calling, structured outputs, and multi-step workflows. Plan for failure modes, not just happy paths.
Own prompt engineering at the system level: versioning, testing, A/B comparison, and the discipline to treat prompts like code.
Think about safety and reliability: prompt injection, abuse, misuse, and what “behaves predictably under pressure” actually means for our users.
Manage cost and latency: model selection, caching, batching, and knowing when a smaller model is the right answer.
Bring the rest of the team along: show colleagues how to think about modern AI, run internal workshops, and help us build a shared understanding of what’s possible and what isn’t.
Production AI experience. You’ve shipped at least one real feature powered by a large language model or foundation model, and operated it in production. Demos and side projects are great, but production is where the lessons live.
Strong Python skills and solid software engineering fundamentals: APIs, testing, CI/CD, version control. AI engineering is still engineering.
Hands-on experience with major LLM provider APIs, including prompting, tool use, function calling, and structured outputs. You understand the trade-offs between providers, models, and open-source alternatives.
Practical experience with RAG: embeddings, vector stores, retrieval optimisation, and grounding.
Evaluation discipline. You’ve built or maintained an eval harness and can talk through what you measured and why.
A pragmatic, product-minded approach. You know when to fine-tune, when to prompt, when to retrieve, and when to use a deterministic rule instead of an LLM.
Excellent written communication: most of our deep work happens in writing, and explaining AI trade-offs clearly is half the job.

Preferred qualifications

Experience with agent frameworks or orchestration patterns.
Fine-tuning experience (SFT, LoRA, DPO, RLHF) and a clear view on when it’s worth it.
Experience with cloud ML platforms (AWS, Google Cloud, Azure).
Observability and LLM-as-judge evaluation pipelines.
Familiarity with AI safety thinking: red-teaming, failure-mode analysis, and responsible AI principles.
A blog post, open-source contribution, or public artifact that shows how you think about this work.

Benefits

Salary: £80,000 – £110,000 depending on experience.
Pension: 5% employer contribution.
Time off: 28 days holiday plus bank holidays.
Flexible working: Hybrid by default; fully remote within the UK is open for the right person.
Learning budget: £2,000/year for books, courses, conferences, and API credits to experiment with. AI moves fast and we’ll fund you keeping up.
API & compute budget: We give you real budget for model API usage from day one, so you can prototype freely.
Equipment: A setup of your choosing, refreshed every three years.
The chance to shape something from zero. You won’t inherit an AI strategy; you’ll help write it.

Tags & focus areas

Full-time, Remote, AI, AI Engineer

London - Applied AI Engineer
