Responsibilities

Design and build durable, idempotent ingestion pipelines for creative and performance data at scale (queues, retries, backpressure, dedup, schema evolution)
Generate and manage embeddings for multi-modal creative assets; select and operate the right vector store for the workload
Build and maintain retrieval pipelines that serve AI agent tools with accurate, low-latency responses
Ship agent-style systems with tool use, state management, and multi-step reasoning workflows
Develop and maintain the React frontend for the creative intelligence library and query interface
Own the full lifecycle of your systems: design, build, deploy, monitor, and iterate
Contribute to stack decisions with clear reasoning grounded in production experience
Collaborate closely with product and enterprise partners to translate requirements into reliable, scalable systems

Basic qualifications

Strong TypeScript — you use types as a design tool, not a formality
Production experience with serverless or edge runtimes (Cloudflare Workers, Vercel, Lambda, Deno Deploy, or equivalent)
Demonstrated experience building durable, idempotent ingestion pipelines with queuing, retry logic, backpressure handling, deduplication, and schema evolution
Practical, production-level understanding of embeddings, chunking strategies, and retrieval quality tuning
At least one agent-style system shipped to production: tool use, stateful multi-step workflows — framework matters less than the experience
React fluency with modern patterns and component architecture
Comfort operating across two cloud environments; able to reason clearly about when to use edge compute vs. managed data/AI services, and how to bridge them
Prior remote work experience is required, along with fluency in remote collaboration tools and platforms (such as Slack, Zoom, Google Workspace, Linear, or similar); experience working with US or UK-based companies is ideal. Applications without this experience will not be considered.

Preferred qualifications

Experience building or operating RAG systems in production
Familiarity with current embedding models and the tradeoffs across dimension, quality, and cost
Background in ETL design, observability for data pipelines, or evaluation frameworks for retrieval quality
Adtech, performance marketing, or marketing analytics background — understanding what channels, attribution, and creative testing look like in a live production context
Opinions on vector databases (Cloudflare Vectorize, Vertex AI Vector Search, Turbopuffer, or similar) backed by hands-on experience

Tech stack

TypeScript (primary language across the stack)
Cloudflare Workers, Queues, and Agents SDK (or equivalent edge runtime)
GCP — Vertex AI for embeddings and related data/AI services
Vector database (to be selected: Cloudflare Vectorize, Vertex AI Vector Search, Turbopuffer, or similar)
React with Remix or TanStack Start (TBD)
Google Workspace, Slack, Zoom, and standard remote collaboration tooling
