Mercor · AI

Data Scientist | Fully Remote

Mercor · $208k–$249k

Actively hiring · Posted 6 months ago

About The Job
Mercor connects elite creative and technical talent with leading AI research labs. Headquartered in San Francisco, our investors include Benchmark, General Catalyst, Peter Thiel, Adam D'Angelo, Larry Summers, and Jack Dorsey.

Position: AI Task Evaluation & Statistical Analysis Specialist
Type: Contract
Compensation: $100–$120/hour
Location: Remote

Role Responsibilities

  • Conduct comprehensive statistical failure analysis to identify patterns in AI agent failures across task components such as prompts, rubrics, and templates.
  • Perform root cause analysis to determine whether failures stem from task design, rubric clarity, file complexity, or agent limitations.
  • Analyze performance variations across finance sub-domains, file types, and task categories to enhance understanding of AI model performance.
  • Create dashboards and reports to highlight failure clusters, edge cases, and improvement opportunities.
  • Recommend improvements to task design, rubric structure, and evaluation criteria based on statistical findings.
  • Present insights to data labeling experts and technical teams to foster collaboration and drive improvements.

Qualifications
Must-Have

  • Statistical Expertise: Strong foundation in statistical analysis, hypothesis testing, and pattern recognition.
  • Programming: Proficiency in Python (pandas, scipy, matplotlib/seaborn) or R for data analysis.
  • Data Analysis: Experience with exploratory data analysis and creating actionable insights from complex datasets.
  • AI/ML Familiarity: Understanding of LLM evaluation methods and quality metrics.
  • Tools: Comfortable working with Excel, data visualization tools (Tableau/Looker), and SQL.

Preferred

  • Experience with AI/ML model evaluation or quality assurance.
  • Background in finance or willingness to learn finance domain concepts.
  • Experience with multi-dimensional failure analysis.
  • Familiarity with benchmark datasets and evaluation frameworks.
  • 2–4 years of relevant experience.

Application Process (takes 20–30 minutes to complete)

  • Upload resume
  • AI interview based on your resume
  • Submit form

PS: Our team reviews applications daily. Please complete your AI interview and application steps to be considered for this opportunity.