Role overview
- Designing and implementing evaluation frameworks to measure LLM and agent performance across reasoning, accuracy, multi-turn dialogue, and tool usage
- Creating and open-sourcing benchmarks that evaluate LLM output on investment-research tasks, measuring qualities such as synthesis quality and citation grounding
- Building prompt refinement systems that learn from production signals and human feedback to improve reliability and performance
- Developing and maintaining agentic tooling including research assistants, deep research flows, and voice agents
- Integrating external APIs, search, speech-to-text, and text-to-speech technologies into production systems
- Prototyping lightweight voice agent frameworks with strong evaluation around latency, error recovery, and conversational flow
- Collaborating closely with research and product teams to productionize new prompting, retrieval, and multi-agent orchestration techniques
- Contributing meaningfully to product direction, prioritization, and long-term technical strategy
Benefits
- Gym membership
- In-office cook
- Summers working remotely by the beach
About you
- Is based in NYC (in-person, 5 days per week)
- Has 5+ years of professional experience, with recent, hands-on work with LLMs
- Has strong opinions and enjoys contributing to product and architectural decisions
- Communicates clearly and is comfortable in a client-facing environment
- Can explain complex AI concepts to non-technical stakeholders and turn ideas into testable experiments
- Has built LLM systems end to end in a product-focused organization, from data and logging to evaluation and prompt optimization
- Has a strong bias to action and experience delivering complex projects with senior stakeholders
- Is excited to help grow a team and shape engineering culture
- Has deep, recent experience working with LLMs and agentic systems (required)
- Has a strong software engineering mindset rather than a purely research-focused background
- Is either a software engineer who has transitioned into LLM systems or an ML engineer who has spent the last few years focused heavily on LLMs
- Has experience forming clear views on how to improve LLM output, reliability, and evaluation
- Shows leadership potential, with the opportunity to grow into a Head of AI role over time