We're looking for a passionate DevOps Engineer to join our innovative Data Science team. 🚀
You'll be the crucial link between our machine learning models and our production environment, responsible for building the infrastructure that allows our data scientists to create and deploy cutting-edge solutions at scale.
In this role, you won't just be deploying models; you'll be building and automating the entire ML lifecycle. If you love solving complex problems and want to productionize state-of-the-art AI, this is the perfect opportunity for you.
Key Responsibilities
- Design and Build ML Infrastructure: Create, manage, and scale the infrastructure required for training and deploying our machine learning models.
- Automate ML Pipelines: Develop and maintain robust CI/CD/CT (Continuous Integration/Continuous Delivery/Continuous Training) pipelines for the full ML lifecycle.
- Deploy & Serve Models: Implement strategies for deploying models as scalable, reliable services using container technologies (Docker, Kubernetes, ECS Fargate) and serverless functions.
- Monitor Model Performance: Establish and manage comprehensive monitoring solutions that track model accuracy, data drift, and system health, so that our models perform as expected in production.
- Collaborate Cross-Functionally: Work closely with data scientists to understand model requirements and with software engineers to integrate ML models into our core products.
- Champion Best Practices: Advocate for and implement MLOps best practices in versioning (data, code, models), testing, and security across the team.
Requirements
Required (Must-Have)
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Proven experience in a DevOps, Software Engineering, or MLOps role.
- Strong programming skills, particularly in Python.
- Hands-on experience with at least one major cloud platform (AWS, GCP, or Azure) and its ML services (e.g., SageMaker, Vertex AI, Azure ML).
- Solid experience with containerization (Docker) and orchestration (Kubernetes).
- Experience building and managing CI/CD pipelines using tools like GitLab CI, GitHub Actions, or Jenkins.
- A solid understanding of the end-to-end machine learning lifecycle.
Preferred
- Experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Familiarity with MLOps frameworks like MLflow, Kubeflow, or Vertex AI Pipelines.
- Experience with data processing frameworks such as Apache Spark or data workflow tools like Airflow.
- Knowledge of model and LLM observability solutions built on tools such as Prometheus, OpenTelemetry, Grafana Cloud, or Evidently AI.
- Familiarity with administering LLM routing and serving tools such as OpenRouter, LiteLLM, or vLLM.
Benefits
- Competitive base salary with additional performance incentives.
- Coverage under the company’s collective health insurance plan.
- Learning and development opportunities (e.g., onboarding, on-the-job training).
- Annual training budget.
- Hybrid work model, plus extra personal/flex days and paid volunteer days each year for your favorite cause.
- Company-sponsored team-bonding events.
- Weekly health & wellness activities (e.g., basketball, football, yoga, running), gym discounts, and healthy breakfast, snacks, and beverages.
- Entrepreneurial culture and amazing coworkers!