Responsibilities

Build, train, evaluate, and optimize machine‑learning models using Spark MLlib, Python, and cloud‑based toolchains
Perform exploratory data analysis (EDA), statistical profiling, and feature engineering on large-scale datasets hosted in Databricks
Implement and manage MLflow experiment tracking, model registry, versioning, and reproducibility workflows
Contribute to model monitoring, performance tuning, drift detection, and continuous improvement
Develop notebooks, jobs, and workflows within Databricks for data preparation, model training, and batch/streaming inference
Utilize Unity Catalog for secure, governed data access, lineage, and metadata management
Work with Delta Lake (bronze/silver/gold layers) for scalable feature pipelines supporting both training and production
Collaborate with Engineering to migrate workloads to Databricks and support transformations, optimizations, and cost‑efficient compute usage
Build reusable, production‑grade feature pipelines in PySpark and SQL
Implement data validation, quality checks, and transformation logic consistent with enterprise guidelines
Participate in design sessions for ingestion, medallion architecture workflows, and schema evolution
Partner with Data Engineering, Analytics, Product, and SMEs to translate business problems into data‑driven solutions
Document model assumptions, data transformations, evaluation metrics, and deployment patterns

Basic qualifications

Bachelor’s degree in Data Science, Computer Science, Analytics, Math, Statistics, Engineering, or a related field, or related experience
Typically requires 2+ years of experience in applied ML, data science, or advanced analytics
Hands-on experience with Python, PySpark, SQL, and Git-based workflows
Practical exposure to cloud-based ML environments (preferably Databricks)
Understanding of ML techniques such as regression, classification, clustering, time-series forecasting, and embeddings
Ability to work with large, complex datasets

Preferred qualifications

Experience with Databricks MLflow, model serving, and workflow orchestration
Familiarity with Delta Lake storage formats, feature engineering at scale, and medallion architecture patterns
Experience deploying models into production environments with monitoring and observability
