Responsibilities
- Lead and drive the deployment, lifecycle management, and monitoring of ML/DL models in all stages leading to production.
- Design and implement systems for Dataset and Label Management, including versioning and integrating customer feedback into labeling workflows.
- Establish and maintain a robust Model Repository/Registry that supports versioning, local inference, and model lineage.
- Lead the implementation of advanced Experiment Tracking and Monitoring solutions for both Data Science and Generative AI, focusing on evaluation, data drift detection, and model reproducibility.
- Own model serving and inference systems, including autoscaling, A/B testing, canary rollouts, and latency/cost optimization for production models.
- Enable specialized infrastructure for Generative AI capabilities, including tagging tools, prompt management, and LLM testing services.
- Drive operational excellence by making tools easier to deploy and use and by implementing granular cost visibility across projects and environments.
- Develop reusable components such as standardized data loaders, CI/CD pipelines, and automated workflows for tasks like model retraining.
- Collaborate directly with Data Scientists and the rest of the Data Platform Engineering team to productionize ML/DL models developed for cloud environments.
Basic qualifications
- B.Sc. or M.Sc. in Computer Science, Software Engineering, or a related field
- Experience with ML/DL workflows and their best practices
- Experience with CI/CD workflows and their best practices
- Experience with a public cloud (AWS/Azure/GCP)
- Experience with Python and Java
- Experience with various data stores like Postgres, MongoDB, Redis
- Experience with DS tools such as MLflow, Langfuse, SageMaker, etc.
- Experience with Spark
- Experience with PyTorch/TensorFlow
Benefits
- Flexible work arrangements.
- 15 fully paid working days per year as Non-Operational Allowance for personal recreation.
- Health insurance.
- Public holidays.
- Truly competitive salary.
- Supportive HR and management team.
Tags & focus areas
Remote, AI, Machine Learning, MLOps