Responsibilities
- Design and train highly reliable, scalable AI/ML models that empower engineers across all SpaceX departments
- Design agentic AI systems and multi-agent workflows to perform engineering tasks
- Build and optimize large-scale machine learning training pipelines to create next-generation AI applications that transform day-to-day engineering operations
- Develop and fine-tune foundation models (LLMs, vision models, multimodal systems) for high-impact SpaceX use cases
- Create production-grade AI tools for data analysis, anomaly detection, predictive modeling, and automated decision-making
- Collaborate with peers on AI architecture, model design, training strategies, and code reviews
- Rapidly build and iterate on AI prototypes, rigorously quantifying model performance, accuracy, and technical constraints
- Own the complete AI model lifecycle — from data preparation and training infrastructure to deployment, monitoring, and continuous improvement
- Deep-dive into complex engineering problems to identify and implement efficient, custom-trained AI solutions
- Establish rigorous AI standards for model validation, safety, reliability, bias mitigation, and data security
- Ensure all AI systems undergo thorough testing and validation to deliver accurate, trustworthy, and production-ready outputs
Basic qualifications
- Bachelor's degree in computer science, data science, engineering, math, or physics; OR 4+ years of professional experience building and training AI/ML systems in lieu of a degree
- 1+ years of experience in AI software engineering with a focus on model training, fine-tuning, and machine learning systems
- 1+ years of programming experience in Python
- Expert understanding of LLM transformer architectures and training procedures, including pre-training, supervised fine-tuning, and reinforcement learning
- Demonstrated experience training and fine-tuning large language models (LLMs) and other foundation models at scale
- Proven track record training and optimizing machine learning models for computer vision (object detection, segmentation, 3D reconstruction, etc.)
- Deep expertise with modern ML frameworks: PyTorch, TensorFlow, JAX, or equivalent
- Experience designing and running large-scale ML training pipelines, including distributed training on GPU clusters, hyperparameter optimization, and experiment tracking
- Strong understanding of MLOps best practices: model versioning, experiment management (MLflow, Weights & Biases, etc.), CI/CD for ML, and automated retraining
- Strong foundation in statistics, machine learning theory, deep learning architectures, optimization algorithms, and model evaluation
- Proficiency developing on Linux systems
- Solid understanding of version control (Git), testing, continuous integration, deployment, and monitoring for ML systems
- Experience building complex agentic AI systems and multi-agent workflows
- Experience with data infrastructure for training: relational databases (PostgreSQL), non-relational databases, data lakes, and feature stores (vector databases are a plus)
- Experience deploying containerized applications using Docker and Kubernetes
- Willing to work extended hours and weekends as needed
ITAR requirements
- To conform to U.S. Government export regulations, applicant must be a (i) U.S. citizen or national, (ii) U.S. lawful, permanent resident (aka green card holder), (iii) Refugee under 8 U.S.C. § 1157, or (iv) Asylee under 8 U.S.C. § 1158, or be eligible to obtain the required authorizations from the U.S. Department of State.