In the Data Impact & Governance Department, you'll architect and build the data infrastructure that powers cutting-edge AI and machine learning solutions for healthcare. This is more than engineering; it's an opportunity to shape the future of cancer care through responsible AI innovation.
What's in it for you?
- Paid Medical Benefits: MD Anderson covers 100% of medical benefits for employees, plus dental and vision options.
- Generous Paid Time Off (PTO): Vacation, sick leave, and holidays to help you recharge.
- Retirement Plans: Secure your future with robust retirement programs and employer contributions.
- Professional Growth: Access to advanced training, leadership development, and opportunities to work on transformative AI projects.
- Mission-Driven Work: Your expertise will enable AI-driven insights that improve patient outcomes and operational efficiency.
The ideal candidate for the Senior Data Engineer - Healthcare AI position is a highly skilled data engineering professional with deep expertise in building scalable, secure, and high-performance data pipelines for AI/ML applications. They possess a strong understanding of healthcare data standards and compliance requirements, combined with advanced technical proficiency in cloud platforms, orchestration tools, and feature/vector store management. This individual thrives in collaborative environments, demonstrates leadership in mentoring others, and is passionate about enabling responsible AI innovation in healthcare.
Key Attributes of the Ideal Candidate:
- Technical Mastery: Expert in Python, SQL, Spark, and modern data engineering frameworks; proficient in Azure services, IaC tools (Terraform, Bicep), and CI/CD workflows.
- AI/ML Data Expertise: Experienced in designing and managing feature and vector stores, batch and streaming pipelines, and high-throughput data architectures for AI/ML systems.
- Healthcare Data Knowledge: Familiar with the HL7, FHIR, and DICOM standards; skilled in handling EHR, imaging, and clinical datasets with appropriate de-identification and compliance controls.
- Security & Compliance Focus: Strong understanding of HIPAA/HITRUST requirements and ability to implement encryption, RBAC, and audit logging.
- Leadership & Collaboration: Capable of mentoring team members, driving best practices, and partnering with clinicians, data scientists, and IT teams to deliver impactful solutions.
- Problem-Solving & Innovation: Adept at troubleshooting complex data challenges, optimizing performance, and exploring emerging technologies for scalable AI operations.
- Communication Skills: Able to clearly document processes and present technical concepts to both technical and non-technical audiences.
Key Responsibilities
Build and Scale AI/ML Data Pipelines
- Design, implement, and maintain batch and streaming pipelines for ML training, deployment, inference, and monitoring using Azure, Dataiku, and open-source tools.
Data, Feature, and Vector Store Engineering
- Deploy and manage raw data, feature, and vector stores to enable fast, reliable access for production AI/ML systems.
Automate Infrastructure and Deployments
- Use Infrastructure-as-Code (IaC) and CI/CD workflows to automate deployments, improving reliability and efficiency.
Ensure Data Quality and Trust
- Implement validation, lineage, anomaly detection, and drift monitoring to deliver accurate, compliant data.
Security and Compliance by Design
- Enforce encryption, RBAC, tokenization, and audit logging to ensure HIPAA/HITRUST compliance while enabling scalable AI operations.
Collaborate and Lead
- Partner with data engineers, ML engineers, data scientists, and clinical stakeholders to deliver scalable AI solutions.
- Mentor team members and drive best practices in data engineering.
Own and Operate
- Manage pipelines and infrastructure end-to-end, including monitoring, alerting, incident management, and continuous improvement.
Other Duties
- Perform additional tasks as assigned to support departmental goals.
Required Education: Bachelor's degree.
Preferred Education: Master's degree.
Preferred Certification: Must obtain at least one Epic Data Model certification (Clinical, Access, or Revenue) issued by Epic within 180 days of the date of entry into the job.
Preferred Certification: Any of the following:
- Azure Data Engineer Associate (DP-203)
- Epic Cogito Certification
- HIPAA Privacy & Security Certification
- HL7/FHIR Certification
Required Experience: Five years of relevant information technology experience. May substitute required education with years of related experience on a one-to-one basis. With preferred degree, three years of experience required.
Preferred Experience: Healthcare experience in AI/ML space is a must, two years of in