Data Engineer with Databricks and Spark

Sumeru Solutions
Bellevue, Washington, US
On-site

Job Description

JOB SUMMARY

  • This role builds and maintains scalable data pipelines and lakehouse infrastructure on Microsoft Azure to support efficient extraction, transformation, and loading of data across batch and real-time workloads. It involves implementing and managing the Medallion Architecture (Bronze, Silver, Gold) using Azure Data Factory, Databricks (PySpark), Azure SQL Database, and Databricks Unity Catalog.
  • The role requires enforcing data quality standards that meet agreed SLAs. Success is measured by pipeline reliability, data freshness SLA compliance, and the quality of Gold-layer datasets powering Power BI executive dashboards.
  • The work supports organizational decision-making by delivering trusted, well-governed data to business executives and analytics consumers.

Required Skills:

  • Experience building and optimizing big data pipelines using Azure Data Factory, PySpark, and SQL across structured and semi-structured data sets
  • Hands-on experience implementing Medallion Architecture (Bronze/Silver/Gold)
  • Experience with Delta Lake: ACID transactions, incremental loading, schema evolution, and partitioning strategies
  • Experience performing root cause analysis on pipeline failures and data quality issues to resolve SLA breaches and identify platform improvement opportunities

Azure Foundational Services:

  • Working knowledge of: Azure Data Factory (ADF), ADLS Gen2, Azure SQL Database, Azure Blob Storage, Azure Key Vault, Azure Monitor / Log Analytics, Azure Event Hubs, Microsoft Fabric Lakehouse, Azure Active Directory / Entra ID (RBAC, Service Principals)

Programming Languages:

  • Proficiency in Python and PySpark for data transformation, pipeline automation, and large-scale distributed processing; strong SQL skills including window functions, CTEs, and query optimization across relational and lakehouse engines

Data Architecture:

  • Solid understanding of Medallion Architecture, dimensional modeling (Star Schema, SCD Types 1/2/3), and the trade-offs between lakehouse, data warehouse, and data lake patterns

Pipeline Engineering:

  • Ability to build robust ADF pipelines with ForEach, Lookup, Copy Activity, and Data Flows; incremental loading via watermark or CDC; error handling, retry logic, and dead-letter patterns
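To illustrate the watermark approach to incremental loading named above, here is a minimal sketch in plain Python. It is an illustration only, not a description of the team's pipelines: the function name `incremental_load` and the `modified_at` field are hypothetical, and in practice this logic would typically live in an ADF Lookup plus Copy Activity or in a PySpark job.

```python
from datetime import datetime, timezone

def incremental_load(source_rows, last_watermark):
    """Return rows changed after the watermark, plus the advanced watermark.

    Hypothetical helper: `modified_at` is an assumed change-tracking column.
    """
    # Pick up only rows modified strictly after the last successful load.
    new_rows = [r for r in source_rows if r["modified_at"] > last_watermark]
    # Advance the watermark to the newest row seen; keep it unchanged if
    # nothing new arrived, so an empty run is safe to repeat.
    new_watermark = max(
        (r["modified_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark

def ts(day):
    """Build a UTC timestamp in January 2026 for the sample data."""
    return datetime(2026, 1, day, tzinfo=timezone.utc)

source = [
    {"id": 1, "modified_at": ts(1)},
    {"id": 2, "modified_at": ts(3)},
    {"id": 3, "modified_at": ts(5)},
]

loaded, watermark = incremental_load(source, last_watermark=ts(2))
print([r["id"] for r in loaded], watermark.isoformat())

# A second run with the advanced watermark finds nothing new.
loaded_again, watermark_again = incremental_load(source, watermark)
```

The watermark should be persisted only after the load commits, so that a failed run is retried from the previous value.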

Data Quality Experience:

  • Experience implementing SLA-based data quality checks (freshness, completeness, row count), monitoring via Azure Monitor and ADF diagnostic logs, and defining data quality agreements with business stakeholders.
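As a rough indication of what the freshness, completeness, and row count checks above can look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function name `run_quality_checks`, the thresholds, and the column names are hypothetical, and a production version would more likely run in PySpark against Delta tables and report to Azure Monitor.

```python
from datetime import datetime, timedelta, timezone

def run_quality_checks(rows, loaded_at, now, max_age, required_cols, min_rows):
    """Return a dict of check name -> bool for one batch (hypothetical helper)."""
    # Freshness: the batch must have landed within the agreed SLA window.
    fresh = (now - loaded_at) <= max_age
    # Row count: guard against empty or truncated loads.
    enough_rows = len(rows) >= min_rows
    # Completeness: every required column must be non-null in every row.
    complete = all(
        row.get(col) is not None for row in rows for col in required_cols
    )
    return {"freshness": fresh, "row_count": enough_rows, "completeness": complete}

now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},  # missing value fails completeness
]

result = run_quality_checks(
    rows=batch,
    loaded_at=now - timedelta(hours=2),  # landed two hours ago
    now=now,
    max_age=timedelta(hours=6),          # assumed freshness SLA
    required_cols=["order_id", "amount"],
    min_rows=1,
)
print(result)
```

Returning one boolean per check, rather than a single pass or fail, makes it easier to report which part of a data quality agreement was breached.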

DevOps for Data:

  • Experience with Git-based workflows, ADF Git integration, and CI/CD pipeline promotion across Dev/Test/Prod using Azure DevOps or GitHub Actions

Reporting Layer Awareness:

  • Understanding of how Gold-layer data feeds Power BI: DirectQuery vs. Import mode trade-offs, dataset refresh patterns, and semantic model collaboration with BI teams
  • Ability to manage work across multiple concurrent pipeline projects, prioritize by business impact, and communicate status clearly to technical and non-technical stakeholders

Good to have skills:

  • Experience with Microsoft Fabric (Lakehouse, Notebooks, OneLake, Fabric Pipelines), whether on an active migration or a greenfield project
  • Experience with real-time / streaming workloads using Azure Event Hubs or Structured Streaming in PySpark
  • Experience delivering data platforms for executive-level reporting via Power BI semantic models

About the Company:

Sumeru Solutions

Skills & Requirements

Technical Skills

Azure Data Factory, PySpark, SQL, Delta Lake, Medallion Architecture, Azure Blob Storage, Azure SQL Database, Azure Monitor, Azure Event Hubs, Power BI, Azure Active Directory, Azure Key Vault, Azure DevOps, GitHub Actions, Microsoft Fabric, Real-time streaming, Structured Streaming, Power BI semantic models, Root cause analysis, SLA adherence, Data quality, Pipeline reliability, Data freshness, SLA compliance, Data quality agreements, Git-based workflows, CI/CD pipeline promotion, Technical and non-technical communication, Project management, Stakeholder coordination, User acceptance testing, Test case preparation, Defects tracking, Project reports, Dashboards, Meetings coordination, Action items follow-up, Training and change management, Data engineering, Big data, Lakehouse, Data pipelines, Data governance, DevOps, Reporting

Employment Type

FULL TIME

Level

Mid

Posted

4/29/2026
