Responsibilities
- Automated Judge Development: Train, fine-tune, and validate automated judge models that can reliably score AI system outputs for safety and policy compliance. Develop calibration and agreement metrics to ensure judges meet human-parity benchmarks.
- Validation Techniques: Design and implement validation frameworks to assess the accuracy, reliability, and cross-linguistic consistency of automated evaluation systems. Develop methods to detect drift, bias, and failure modes in automated judges across markets.
- Synthetic Data Generation: Develop and maintain synthetic data generation pipelines to augment evaluation coverage, stress-test safety boundaries, and support evaluation in low-resource languages. Ensure synthetic data is diverse, representative, and validated against human-generated benchmarks.
- Scalable Analysis & Reporting Automation: Create automated pipelines for analysis and reporting that reduce manual effort, increase reproducibility, and enable rapid cross-market safety assessments. Build tooling that integrates with existing dashboards and reporting workflows.
Basic qualifications
- 3+ years of experience in an ML engineering or applied ML research role, with hands-on experience building and deploying ML models and pipelines.
- Strong proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow, Hugging Face Transformers).
- Experience training, fine-tuning, and evaluating language models and/or classifiers, including prompt engineering and model calibration.
- Experience building automated data processing, evaluation, or monitoring pipelines.
- Comfortable with experiment design and statistical validation of model performance across segmented samples.
- Able to work independently with minimal direction and to collaborate effectively with others.
- Organized, highly attentive to detail, and skilled at time management.
- Advanced degree (MS/PhD) in Computer Science, Machine Learning, Natural Language Processing, or a related field.
Preferred qualifications
- Experience working in industry.
- Experience with synthetic data generation techniques, including data augmentation, paraphrasing, and controlled generation methods.
- Experience with multilingual NLP, cross-lingual transfer learning, or low-resource language modeling.
- Familiarity with evaluation-as-a-service architectures or automated red teaming frameworks.
- Experience with large-scale distributed computing (e.g., Spark, Ray, or cloud-based ML platforms).
- Prior experience in AI safety, responsible AI, content moderation, or trust and safety domains.
- Experience with CI/CD integration for ML model validation and deployment.
What we offer
- Opportunity to work on cutting-edge projects
- Work with a highly motivated and dedicated team
- Competitive salary
- Flexible schedule
- Benefits package: medical, vision, and dental insurance, etc.
- Corporate social events
- Professional development opportunities
- Well-equipped office
Tags & focus areas
- Full-time
- Remote
- AI
- Machine Learning