Position Overview
We are seeking an ML Platform Engineer to design and scale machine learning infrastructure that powers advanced data-driven systems. This role focuses on building automated pipelines, optimizing distributed training environments, and ensuring that models move seamlessly from research to reliable, production-grade deployment.
Why This Role Matters
As organizations scale their use of AI, the bridge between experimentation and production becomes critical. The ML Platform Engineer ensures that machine learning models are trained efficiently, deployed reliably, and monitored continuously, turning innovation into measurable impact. This role directly contributes to faster iteration, lower operational costs, and higher-performing models in real-world environments.
About the Role
You will collaborate with data scientists, machine learning engineers, and infrastructure teams to build the backbone of scalable AI systems. This includes designing ML pipelines, automating workflows, managing large-scale data processing, and optimizing compute resources. The role combines software engineering expertise with a deep understanding of machine learning systems and MLOps practices.
Key Responsibilities
- Design and maintain end-to-end machine learning pipelines for data ingestion, training, evaluation, and deployment.
- Develop reusable frameworks and tooling to accelerate experimentation and model delivery.
- Collaborate with ML teams to optimize infrastructure for performance, cost, and scalability.
- Implement automation for model retraining, monitoring, and lifecycle management.
- Manage distributed data processing and streaming environments using frameworks such as Spark, Flink, and Kafka.
- Ensure reproducibility, governance, and traceability across all ML workflows.
- Contribute to cloud infrastructure optimization using AWS, GCP, or Azure.
- Establish best practices for CI/CD, MLOps, and model versioning.
Minimum Qualifications
- 3+ years of experience designing or supporting ML pipelines and infrastructure.
- Strong proficiency with Python and one or more compiled languages (Java, Scala, or C++).
- Hands-on experience with data processing frameworks (Kafka, Spark, Flink).
- Solid understanding of cloud-based architectures and distributed systems.
- Experience with machine learning frameworks such as TensorFlow, PyTorch, or Scikit-learn.
- Familiarity with MLOps tools (MLflow, Kubeflow, SageMaker, Vertex AI).
- Strong debugging and problem-solving skills with a focus on automation.
- Effective communicator with the ability to bridge data science and engineering teams.
Preferred Qualifications
- 5+ years of experience in production-scale ML systems or platform engineering.
- Advanced degree (MS or PhD) in Computer Science, Machine Learning, or a related field.
- Prior experience in ad tech, personalization, or recommendation systems.
- Experience building cloud-native ML platforms and infrastructure-as-code pipelines.
- Exposure to LLMs, reinforcement learning, or real-time inference systems.
- A results-oriented mindset and comfort working in fast-paced, collaborative environments.