Data Engineer – AI/ML Data Infrastructure

Yogi Careers
Boston, US

Job Description

Overview:

We’re looking for a Data Engineer to build and maintain the data infrastructure that powers machine learning initiatives. You’ll work at the intersection of software engineering and data science.

Responsibilities:

  • Develop and maintain feature stores and ML-ready datasets
  • Automate data preprocessing pipelines for ML model training and evaluation
  • Collaborate with ML engineers to enable scalable experimentation workflows
  • Monitor and improve data reliability, lineage, and reproducibility

Requirements:

  • BS/MS in Computer Science, Data Engineering, or similar
  • Experience with ML platforms (Databricks, Amazon SageMaker, Vertex AI)
  • Strong Python and SQL skills, with familiarity in Spark or Dask
  • Experience with Airflow, MLflow, or Kubeflow pipelines
  • Solid understanding of MLOps, data validation, and model versioning

Job Category: Data Engineer

Job Type: Full Time

Job Location: Boston

Skills & Requirements

Technical Skills

Python, SQL, Spark, Dask, Airflow, MLflow, Kubeflow, MLOps, Data validation, Model versioning, Data engineering, Machine learning, Data science

Employment Type

Full Time

Level

Mid-Level

Posted

4/24/2026
