Role overview
Location: Berlin or Hessen (Germany)
What you'll work on
- Design, develop, test, and maintain AI/ML infrastructure within a scalable microservices architecture running on Kubernetes
- Build and maintain high-quality, secure, and reliable DevOps pipelines and Helm charts
- Work across the backend stack, integrating event-driven systems (Kafka), gRPC services, and REST APIs
- Develop and optimize data pipelines using modern data engineering tools (e.g., Spark)
- Manage ML lifecycle processes using tools such as MLflow
- Contribute to architectural decisions to improve scalability, performance, and system reliability
- Support deployment and monitoring of ML models in complex production environments, including isolated (air-gapped) setups with varying hardware constraints (CPU/GPU)
- Ensure platform reliability and robustness in customer-deployed Kubernetes environments
- Maintain high security and compliance standards aligned with industry best practices (e.g., ISO 27001)
What we're looking for
- Degree in Computer Science, Engineering, or equivalent practical experience
- 5+ years of experience in AI/ML platform engineering or related roles
- Strong experience with Kubernetes, distributed systems, and data engineering technologies
- Hands-on experience with ML platforms and frameworks (e.g., MLflow, PyTorch, SparkML)
- Familiarity with modern data and ML stack technologies (e.g., Spark, Delta Lake, TensorFlow, ONNX)
- Experience building clean, maintainable, and testable systems following modern software engineering principles
- Knowledge of cloud-native development and DevOps practices (including Helm)
- Experience working in security-sensitive or highly regulated environments is a plus
- Strong problem-solving and debugging skills
- Excellent communication skills in English and ability to collaborate across teams
Tags & focus areas
Full-time, Machine Learning, MLOps, Data Engineer, AI