Senior Data Engineer - Healthcare AI

MD Anderson
Houston, US
On-site

Job Description

In the Data Impact & Governance Department, you'll architect and build the data infrastructure that powers cutting-edge AI and machine learning solutions for healthcare. This is more than engineering; it's an opportunity to shape the future of cancer care through responsible AI innovation.

What's in it for you?

  • Paid Medical Benefits: MD Anderson covers 100% of medical benefits for employees, plus dental and vision options.
  • Generous Paid Time Off (PTO): Vacation, sick leave, and holidays to help you recharge.
  • Retirement Plans: Secure your future with robust retirement programs and employer contributions.
  • Professional Growth: Access to advanced training, leadership development, and opportunities to work on transformative AI projects.
  • Mission-Driven Work: Your expertise will enable AI-driven insights that improve patient outcomes and operational efficiency.

The ideal candidate for the Senior Data Engineer - Healthcare AI position is a highly skilled data engineering professional with deep expertise in building scalable, secure, and high-performance data pipelines for AI/ML applications. They possess a strong understanding of healthcare data standards and compliance requirements, combined with advanced technical proficiency in cloud platforms, orchestration tools, and feature/vector store management. This individual thrives in collaborative environments, demonstrates leadership in mentoring others, and is passionate about enabling responsible AI innovation in healthcare.

Key Attributes of the Ideal Candidate:

  • Technical Mastery: Expert in Python, SQL, Spark, and modern data engineering frameworks; proficient in Azure services, IaC tools (Terraform, Bicep), and CI/CD workflows.
  • AI/ML Data Expertise: Experienced in designing and managing feature and vector stores, batch and streaming pipelines, and high-throughput data architectures for AI/ML systems.
  • Healthcare Data Knowledge: Familiar with the HL7, FHIR, and DICOM standards and skilled in handling EHR, imaging, and clinical datasets in line with de-identification and compliance requirements.
  • Security & Compliance Focus: Strong understanding of HIPAA/HITRUST requirements and ability to implement encryption, RBAC, and audit logging.
  • Leadership & Collaboration: Capable of mentoring team members, driving best practices, and partnering with clinicians, data scientists, and IT teams to deliver impactful solutions.
  • Problem-Solving & Innovation: Adept at troubleshooting complex data challenges, optimizing performance, and exploring emerging technologies for scalable AI operations.
  • Communication Skills: Able to clearly document processes and present technical concepts to both technical and non-technical audiences.

Key Responsibilities

Build and Scale AI/ML Data Pipelines

  • Design, implement, and maintain batch and streaming pipelines for ML training, deployment, inference, and monitoring using Azure, Dataiku, and open-source tools.

Data, Feature, and Vector Store Engineering

  • Deploy and manage raw data, feature, and vector stores to enable fast, reliable access for production AI/ML systems.

Automate Infrastructure and Deployments

  • Use Infrastructure-as-Code (IaC) and CI/CD workflows to automate deployments, improving reliability and efficiency.

Ensure Data Quality and Trust

  • Implement validation, lineage, anomaly detection, and drift monitoring to deliver accurate, compliant data.

Security and Compliance by Design

  • Enforce encryption, RBAC, tokenization, and audit logging to ensure HIPAA/HITRUST compliance while enabling scalable AI operations.

Collaborate and Lead

  • Partner with data engineers, ML engineers, data scientists, and clinical stakeholders to deliver scalable AI solutions.
  • Mentor team members and drive best practices in data engineering.

Own and Operate

  • Manage pipelines and infrastructure end-to-end, including monitoring, alerting, incident management, and continuous improvement.

Other Duties

  • Perform additional tasks as assigned to support departmental goals.

Required Education: Bachelor's degree.

Preferred Education: Master's degree.

Preferred Certification: Must obtain at least one Epic Data Model certification (Clinical, Access, or Revenue) issued by Epic within 180 days of date of entry into job.

Preferred Certification: Any of the following:

  • Azure Data Engineer Associate (DP-203)
  • Epic Cogito Certification
  • HIPAA Privacy & Security Certification
  • HL7/FHIR Certification

Required Experience: Five years of relevant information technology experience. May substitute required education with years of related experience on a one-to-one basis. With preferred degree, three years of experience required.

Preferred Experience: Healthcare experience in AI/ML space is a must, two years of in

Skills & Requirements

Technical Skills

Python, SQL, Spark, Azure services, Terraform, Bicep, CI/CD workflows, HL7, FHIR, DICOM, HIPAA/HITRUST, Encryption, RBAC, Audit logging, Leadership, Collaboration, Problem-solving, Innovation, Communication, Azure Data Engineer Associate, Epic Cogito Certification, HIPAA Privacy & Security Certification, HL7/FHIR Certification, Healthcare, AI, Machine learning

Employment Type

Full-time

Level

Senior

Posted

4/8/2026
