Senior Machine Learning Engineer - Agentic AI

MD Anderson

Houston, US

Visa Sponsorship

Job Description

As a Senior Machine Learning Engineer - Agentic AI within Data Impact & Governance, you will be at the forefront of designing and operating the platform capabilities that enable autonomous and semi-autonomous AI systems to function reliably across clinical, research, and operational domains.

This role offers a rare opportunity to build enterprise-wide agentic AI platforms in a regulated healthcare environment, where correctness, safety, governance, and auditability matter as much as innovation and scale. You will influence technical standards, platform architecture, and operational safeguards that shape how agentic AI is adopted across one of the world's leading cancer centers.

What's in it for you?

Outstanding Benefits: MD Anderson offers paid medical benefits, generous paid time off (PTO), and strong retirement plans, providing stability and long-term financial security.
Enterprise-Level Impact: Architect platform capabilities that support AI agents operating across complex health IT systems and enterprise workflows.
Technical Leadership: Shape standards, integration patterns, and guardrails governing agentic AI at organizational scale.
Career Growth & Visibility: Partner closely with enterprise architects, applied MLEs, data scientists, IT, and governance leaders on high-impact AI initiatives.
Responsible AI Innovation: Work in a mission-driven institution where responsible AI, safety, and trust are central to technology strategy.
Collaborative Culture: Join a highly skilled team that values intellectual rigor, mentorship, and cross-disciplinary collaboration.
The ideal candidate will have a healthcare background with at least 5 years of industry experience in data science and 3+ years as a Senior ML Engineer focused on agentic AI systems.

Summary

The Senior Machine Learning Engineer - Agentic AI designs, evolves, and operates enterprise-scale agentic AI platform capabilities that enable safe, scalable, and governed deployment of autonomous and semi-autonomous AI systems. The role focuses on platform architecture, interoperability, validation frameworks, and operational safeguards that allow internal and third-party agent systems to function reliably in production healthcare environments.

This position operates at the intersection of autonomous AI behavior, enterprise systems integration, and regulated healthcare operations, where subtle failures can have systemic and high-impact consequences.

Major Work Activities

Core Responsibilities

Lead the design, evolution, and operation of the enterprise agentic AI platform in collaboration with enterprise architects and platform ML engineers.
Build platform components that enable interoperability between first-party and third-party agents, including identity, state, memory, tool access, orchestration, auditability, and policy enforcement.
Define and document standardized integration patterns connecting agents with enterprise business systems, data platforms, APIs, and health IT systems.
Provide reusable platform services, reference implementations, and SDKs that reduce risk and accelerate delivery for applied teams.
Design and operate validation and de-risking frameworks, including simulation, sandboxing, shadow execution, canary releases, and continuous behavior monitoring.
Establish and enforce platform standards for agent development, including interfaces, execution contracts, evaluation hooks, safety constraints, and observability requirements.
Participate in platform governance, release coordination, and incident response, supporting investigation and remediation of agent-related failures.
Implement platform safeguards such as fallback mechanisms, rollback strategies, approval gates, rate limiting, audit trails, and kill-switch capabilities.
Partner with software engineering, security, IT, and health IT stakeholders to deploy agentic AI capabilities in secure enterprise environments.
Support responsible AI practices through traceability of prompts, policies, tools, models, agent actions, and documentation of known failure modes and limitations.

Competencies

Technical Expertise

Experience building AI or ML platforms that serve multiple downstream teams and production workloads.
Strong proficiency in Python and integration of modern ML frameworks (e.g., PyTorch) with large language models and agent systems.
Hands-on experience with agentic AI frameworks such as LangGraph, LangChain, AutoGen, CrewAI, Semantic Kernel, or equivalent.
Working knowledge of agentic AI protocols and interoperability standards (e.g., MCP, agent-to-agent communication, structured tool invocation).
Experience implementing planner-executor loops, hierarchical agents, and multi-agent coordination patterns.
Familiarity with workflow orchestration tools (Airflow, Prefect, Temporal) and distributed execution frameworks (Ray or equivalent).
Experience deploying containerized AI platforms using Kubernetes in enterprise cloud environments.

Skills & Requirements

Technical Skills

machine learning, python, java, c++, tensorflow, pytorch, kubernetes, airflow, prefect, temporal, ray, problem solving, teamwork, communication, leadership, adaptability, healthcare, data science, artificial intelligence

Level

senior

Posted

3/17/2026
