McKesson

Sr Associate Data Scientist (Databricks Platform)

McKesson · Columbus, OH, US · $84k - $140k

Actively hiring · Posted 4 months ago

Responsibilities

  • Build, train, evaluate, and optimize machine‑learning models using Spark MLlib, Python, and cloud‑based toolchains
  • Perform exploratory data analysis (EDA), statistical profiling, and feature engineering on large-scale datasets hosted in Databricks
  • Implement and manage MLflow experiment tracking, model registry, versioning, and reproducibility workflows
  • Contribute to model monitoring, performance tuning, drift detection, and continuous improvement
  • Develop notebooks, jobs, and workflows within Databricks for data preparation, model training, and batch/streaming inference
  • Utilize Unity Catalog for secure, governed data access, lineage, and metadata management
  • Work with Delta Lake (bronze/silver/gold layers) for scalable feature pipelines supporting both training and production
  • Collaborate with Engineering to migrate workloads to Databricks and support transformations, optimizations, and cost‑efficient compute usage
  • Build reusable, production‑grade feature pipelines in PySpark and SQL
  • Implement data validation, quality checks, and transformation logic consistent with enterprise guidelines
  • Participate in design sessions for ingestion, medallion architecture workflows, and schema evolution
  • Partner with Data Engineering, Analytics, Product, and SMEs to translate business problems into data‑driven solutions
  • Document model assumptions, data transformations, evaluation metrics, and deployment patterns

Basic qualifications

  • Bachelor’s degree in Data Science, Computer Science, Analytics, Math, Statistics, Engineering, or a related field, or related experience
  • Typically requires 2+ years of experience in applied ML, data science, or advanced analytics
  • Hands-on experience with Python, PySpark, SQL, and Git-based workflows
  • Practical exposure to cloud-based ML environments (preferably Databricks)
  • Understanding of ML techniques such as regression, classification, clustering, time-series forecasting, and embeddings
  • Ability to work with large, complex datasets

Preferred qualifications

  • Experience with Databricks MLflow, model serving, and workflow orchestration
  • Familiarity with Delta Lake storage formats, feature engineering at scale, and medallion architecture patterns
  • Experience deploying models into production environments with monitoring and observability

Tags & focus areas

Remote · Data Science · AI