Job Title: ML Ops Engineer
Location: Burbank, CA (Onsite)
Duration: 12 Months
Job Description:
We are seeking an experienced ML Ops Engineer with deep expertise in building and managing scalable machine learning infrastructure. The ideal candidate will have strong hands-on experience with AWS SageMaker and a proven track record of designing robust ML Ops frameworks.
Key Responsibilities:
- Lead the design and implementation of ML Ops infrastructure, building and maintaining scalable, secure, and automated pipelines for model and data deployment across multiple environments.
- Establish and enforce ML Ops best practices aligned with existing infrastructure, architecture patterns, and model deployment requirements.
- Develop and implement observability frameworks, including monitoring, logging, and alerting systems, to ensure high reliability and performance of deployed models and agents.
- Manage the end-to-end ML lifecycle, including feature store management, model registry and governance, evaluation workflows, deployment testing, and inference processes.
- Collaborate closely with data scientists, AWS platform engineers, and cross-functional product/platform teams to integrate ML Ops best practices into development workflows.
- Ensure adherence to security standards, data governance policies, and regulatory compliance across all ML Ops processes.
- Drive continuous improvement initiatives focused on automation, optimization, scalability, and system resilience.
Required Qualifications:
- 10+ years of hands-on experience in ML Ops.
- Strong expertise in AWS SageMaker and related AWS services.
- Experience building and managing CI/CD pipelines for machine learning workflows.
- Proficiency in monitoring and observability tools.
- Solid understanding of model lifecycle management and deployment strategies.
- Strong collaboration and communication skills.
Preferred Qualifications:
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with data engineering and big data tools.
- Knowledge of security and compliance frameworks in cloud environments.