Data Engineer with AI - Remote

Lorven Technologies
Boston, US
Remote

Job Description

I hope you are doing well.

Please share your updated profile if you are interested in the below role.

Our client seeks a Data Engineer + AI for a 12-month project in Boston, MA. The detailed requirements are below.

Job Title: Data Engineer + AI

Work location: Boston, MA

Duration: 12 Months

Job Summary:

We're looking for a Senior Data Engineer to build and scale our lakehouse and AI data pipelines on Databricks. You'll design robust ETL/ELT, enable feature engineering for ML/LLM use cases, and drive best practices for reliability, performance, and cost.

What you'll do

  • Design, build, and maintain batch/streaming pipelines in Python + PySpark on Databricks (Delta Lake, Auto Loader, Structured Streaming).
  • Implement data models (Bronze/Silver/Gold), optimize with partitioning, Z-ORDER, and indexing, and manage reliability (DLT/Jobs, monitoring, alerting).
  • Enable ML/AI: feature engineering, MLflow experiment tracking, model registries, and model/feature serving; support RAG pipelines (embeddings, vector stores).
  • Establish data quality checks (e.g., Great Expectations), lineage, and governance (Unity Catalog, RBAC).
  • Collaborate with Data Science/ML and Product to productionize models and AI workflows; champion CI/CD and IaC.
  • Troubleshoot performance and cost issues; mentor engineers and set coding standards.

Must-have qualifications

  • 6-10+ years in data engineering with a track record of production pipelines.
  • Expert in Python and PySpark (UDFs, Window functions, Spark SQL, Catalyst basics).
  • Deep hands-on Databricks: Delta Lake, Jobs/Workflows, Structured Streaming, SQL Warehouses; practical tuning and cost optimization.
  • Strong SQL and data modeling (dimensional, medallion, CDC).
  • ML/AI enablement experience: MLflow, feature stores, model deployment/monitoring; familiarity with LLM workflows (embeddings, vectorization, prompt/response logging).
  • Cloud proficiency on AWS/Azure/GCP (object storage, IAM, networking).
  • CI/CD (GitHub/GitLab/Azure DevOps), testing (pytest), and observability (logs/metrics).

Nice to have

  • Databricks Delta Live Tables, Unity Catalog automation, Model Serving.
  • Orchestration (Airflow/Databricks Workflows), messaging (Kafka/Kinesis/Event Hubs).
  • Data quality & lineage tools (Great Expectations, OpenLineage).
  • Vector DBs (FAISS, pgvector, Pinecone), RAG frameworks (LangChain/LlamaIndex).
  • IaC (Terraform), security/compliance (PII handling, data masking).
  • Experience interfacing with BI tools (Power BI, Tableau, Databricks SQL).

Skills & Requirements

Technical Skills

Python, PySpark, Databricks, Delta Lake, MLflow, Feature stores, Model deployment, Monitoring, AWS, Azure, GCP, CI/CD, IaC, Terraform, Security, Compliance, PII handling, Data masking, BI tools, Power BI, Tableau, Databricks SQL, Data engineering, AI, Cloud computing

Employment Type

Full time

Level

Senior

Posted

4/21/2026
