Data Engineer with Databricks and Spark

Sumeru Solutions

Bellevue, Washington, US

On-site

Job Description

JOB SUMMARY

This role builds and maintains scalable data pipelines and lakehouse infrastructure on Microsoft Azure to support efficient extraction, transformation, and loading of data across batch and real-time workloads. It involves implementing and managing the Medallion Architecture (Bronze, Silver, Gold) using Azure Data Factory, Databricks (PySpark), Azure SQL Database, and Databricks Unity Catalog.
The role requires ensuring that data quality standards meet agreed SLAs. Success is measured by pipeline reliability, data freshness SLA compliance, and the quality of Gold-layer datasets powering Power BI executive dashboards.
The work supports organizational decision-making by delivering trusted, well-governed data to business executives and analytics consumers.

Required Skills:

Experience building and optimizing big data pipelines using Azure Data Factory, PySpark, and SQL across structured and semi-structured data sets
Hands-on experience implementing Medallion Architecture (Bronze/Silver/Gold)
Experience with Delta Lake: ACID transactions, incremental loading, schema evolution, and partitioning strategies
Experience performing root cause analysis on pipeline failures and data quality issues to resolve SLA breaches and identify platform improvement opportunities

Azure Foundational Services:

Working knowledge of: Azure Data Factory (ADF), ADLS Gen2, Azure SQL Database, Azure Blob Storage, Azure Key Vault, Azure Monitor / Log Analytics, Azure Event Hubs, Microsoft Fabric Lakehouse, Azure Active Directory / Entra ID (RBAC, Service Principals)

Programming Languages:

Proficiency in Python and PySpark for data transformation, pipeline automation, and large-scale distributed processing; strong SQL skills including window functions, CTEs, and query optimization across relational and lakehouse engines

Data Architecture:

Solid understanding of Medallion Architecture, dimensional modeling (Star Schema, SCD Types 1/2/3), and the trade-offs between lakehouse, data warehouse, and data lake patterns

Pipeline Engineering:

Ability to build robust ADF pipelines with ForEach, Lookup, Copy Activity, and Data Flows; incremental loading via watermark or CDC; error handling, retry logic, and dead-letter patterns

Data Quality Experience:

Experience implementing SLA-based data quality checks (freshness, completeness, row count), monitoring via Azure Monitor and ADF diagnostic logs, and defining data quality agreements with business stakeholders.

DevOps for Data:

Experience with Git-based workflows, ADF Git integration, and CI/CD pipeline promotion across Dev/Test/Prod using Azure DevOps or GitHub Actions

Reporting Layer Awareness:

Understanding of how Gold-layer data feeds Power BI: DirectQuery vs. Import mode trade-offs, dataset refresh patterns, and semantic model collaboration with BI teams
Ability to manage work across multiple concurrent pipeline projects, prioritize by business impact, and communicate status clearly to technical and non-technical stakeholders

Good-to-Have Skills:

Experience with Microsoft Fabric (Lakehouse, Notebooks, OneLake, Fabric Pipelines) in an active migration or greenfield project
Experience with real-time / streaming workloads using Azure Event Hubs or Structured Streaming in PySpark
Experience delivering data platforms for executive-level reporting via Power BI semantic models

About the Company:

Sumeru Solutions

Skills & Requirements

Technical Skills

Azure Data Factory, PySpark, SQL, Delta Lake, Medallion Architecture, Azure Blob Storage, Azure SQL Database, Azure Monitor, Azure Event Hubs, Power BI, Azure Active Directory, Azure Key Vault, Azure DevOps, GitHub Actions, Microsoft Fabric, Real-time streaming, Structured Streaming, Power BI semantic models, Root cause analysis, SLA adherence, Data quality, Pipeline reliability, Data freshness, SLA compliance, Data quality agreements, Git-based workflows, CI/CD pipeline promotion, Technical and non-technical communication, Project management, Stakeholder coordination, User acceptance testing, Test case preparation, Defects tracking, Project reports, Dashboards, Meetings coordination, Action items follow-up, Training and change management, Data engineering, Big data, Lakehouse, Data pipelines, Data governance, DevOps, Reporting

Employment Type

Full-time

Level

Mid

Posted

4/29/2026
