Experience Requirements
● Total QA/Testing Experience: 5+ years.
● Data Testing Experience: 3+ years specifically in Big Data, Hadoop, or Cloud Data Warehouse environments.
● Databricks Experience: 1+ years testing pipelines within a Databricks environment.
● Automation Focus: Proven track record of moving from manual SQL checks to automated Python-based testing frameworks.
Required Certifications
● Mandatory: Databricks Certified Data Engineer Associate (at minimum).
● Preferred: ISTQB Foundation or Advanced Level (Test Automation Engineer).
Core Technical Skills
● Great Expectations / Pandera: Proficiency in using Python-based libraries to define data "contracts" and automated validation suites (see the Pandera sketch following this list).
● DLT Expectations: Deep understanding of Delta Live Tables (DLT) expectations (fail, drop, or quarantine bad records); see the DLT sketch following this list.
● Advanced SQL: Expert-level SQL for complex data reconciliation, duplicate detection, and null-value analysis across billions of records (see the reconciliation sketch following this list).
● Pytest-Spark: Experience using pytest to write unit tests for PySpark transformations and logic (see the pytest sketch following this list).
● Notebook Testing: Ability to write automated test notebooks that validate Medallion Architecture transitions (Bronze to Silver, Silver to Gold).
● Data Reconciliation: Building Python scripts to perform "source-to-target" counts and checksums across distributed file systems (illustrated in the reconciliation sketch following this list).
● Scalability Testing: Ability to validate that data pipelines meet performance SLAs when data volume spikes.
● End-to-End Orchestration Testing: Testing the reliability of Databricks Workflows and handling of job failures/retries.
● Schema Evolution: Testing how pipelines handle upstream schema changes without breaking downstream Gold tables.
● Unity Catalog Validation: Testing Row-Level Security (RLS) and Column-Level Masking to ensure unauthorized users cannot see sensitive data.
● Data Lineage: Validating that data lineage in Unity Catalog correctly reflects the movement of data across the Lakehouse.
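
To make the data-contract skill above concrete, here is a minimal Pandera sketch; the table shape and the column names (order_id, amount, status) are hypothetical examples, not drawn from any real pipeline.

```python
# Minimal Pandera "data contract" sketch; column names are hypothetical.
import pandera as pa
from pandera import Column, Check

orders_contract = pa.DataFrameSchema(
    {
        "order_id": Column(int, Check.ge(0), unique=True, nullable=False),
        "amount": Column(float, Check.ge(0.0), nullable=False),
        "status": Column(str, Check.isin(["NEW", "SHIPPED", "CANCELLED"])),
    },
    strict=True,  # reject unexpected columns so schema drift surfaces early
)

def validate_orders(df):
    # Validates a pandas DataFrame; with lazy=True every violation is
    # collected and raised together in a single SchemaErrors exception.
    return orders_contract.validate(df, lazy=True)
```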
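The three DLT expectation modes can be sketched as follows. This only runs inside a Databricks Delta Live Tables pipeline, where the dlt module and the spark session are provided by the runtime; the source table bronze.orders and the rule names are hypothetical.

```python
# Delta Live Tables expectations sketch; table and rule names hypothetical.
import dlt
from pyspark.sql.functions import expr

@dlt.table(comment="Silver orders guarded by data-quality expectations.")
@dlt.expect("non_negative_amount", "amount >= 0")            # warn: log violation, keep the row
@dlt.expect_or_drop("has_order_id", "order_id IS NOT NULL")  # drop: discard violating rows
@dlt.expect_or_fail("valid_status", "status IS NOT NULL")    # fail: abort the pipeline update
def silver_orders():
    return spark.read.table("bronze.orders")

# Quarantining has no dedicated decorator; the common pattern is a second
# table built from the inverted rule, capturing rejected rows for triage.
@dlt.table(comment="Orders that failed the has_order_id rule.")
def quarantined_orders():
    return spark.read.table("bronze.orders").where(expr("order_id IS NULL"))
```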
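The Advanced SQL and Data Reconciliation bullets above might translate into a script along these lines; the table names, key columns, and the choice of xxhash64 as the row hash are illustrative assumptions. Summing per-row hashes makes the checksum insensitive to row order, which matters when comparing distributed reads.

```python
# Source-to-target reconciliation sketch; table and column names hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def reconcile(source_table, target_table, key_cols):
    src = spark.read.table(source_table)
    tgt = spark.read.table(target_table)

    # 1. Row counts should agree once the expected transformations are applied.
    counts_match = src.count() == tgt.count()

    # 2. Order-independent checksum: hash the key columns of each row, then
    #    sum the hashes (cast to decimal to avoid 64-bit overflow).
    def checksum(df):
        row_hash = F.xxhash64(*[F.col(c) for c in key_cols])
        return df.select(
            F.sum(row_hash.cast("decimal(38,0)")).alias("cs")
        ).first()["cs"]

    checksums_match = checksum(src) == checksum(tgt)

    # 3. Duplicate check on the target's business key, in plain SQL.
    keys = ", ".join(key_cols)
    duplicate_keys = spark.sql(
        f"SELECT {keys}, COUNT(*) AS n FROM {target_table} "
        f"GROUP BY {keys} HAVING COUNT(*) > 1"
    ).count()

    return {
        "counts_match": counts_match,
        "checksums_match": checksums_match,
        "duplicate_keys": duplicate_keys,
    }

# Example call: reconcile("bronze.orders", "silver.orders", ["order_id"])
```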
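Finally, the pytest skill might look like the following unit test, run against a local Spark session so no cluster is needed; dedupe_orders is a hypothetical transformation invented for the example.

```python
# Pytest sketch for a PySpark transformation; the function under test
# (dedupe_orders) is hypothetical.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local Spark session so the test runs without a cluster.
    return (
        SparkSession.builder.master("local[2]")
        .appName("unit-tests")
        .getOrCreate()
    )

def dedupe_orders(df):
    # Transformation under test: keep one row per order_id.
    return df.dropDuplicates(["order_id"])

def test_dedupe_orders_removes_duplicates(spark):
    source = spark.createDataFrame(
        [(1, "NEW"), (1, "NEW"), (2, "SHIPPED")], ["order_id", "status"]
    )
    result = dedupe_orders(source)
    assert result.count() == 2
    assert result.filter("order_id = 1").count() == 1
```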
Preferred Candidate Background
● "Data-First" Mindset: Understanding that testing a Lakehouse is about testing the data and its behavior, not just the "UI" or "API."
● Software Engineering Foundation: Candidates who know how to use Git (branching and merging) to manage their test code alongside the engineering team.
● Distributed Systems Knowledge: A basic grasp of Spark internals (shuffling, partitioning), enough to reason about why data might go missing or be duplicated in a distributed environment.
Key Responsibilities
● Develop Test Strategy: Create a comprehensive test plan for the Lakehouse, focusing on Data Integrity, Accuracy, and Consistency.
● Automate Validation: Replace manual "spot-checking" with automated Python test suites that run as part of the CI/CD pipeline.
● Defect Analysis: Identify and document data anomalies, working closely with Data Engineers to perform root-cause analysis on Spark job failures.
● Regression Testing: Ensure that new PySpark code deployments do not impact existing Gold-layer business logic or dashboard reporting (see the sketch after this list).
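
As an illustration of the regression-testing responsibility, a Gold-layer check might be captured as a pytest case like the one below; the table gold.daily_revenue, the date, and the baseline figure are all hypothetical, and the local session stands in for the workspace where the Gold tables actually live.

```python
# Gold-layer regression check sketch; table name and baseline hypothetical.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local session for illustration only.
    return SparkSession.builder.master("local[2]").getOrCreate()

def test_daily_revenue_matches_baseline(spark):
    row = (
        spark.read.table("gold.daily_revenue")
        .where("order_date = DATE'2026-01-01'")
        .first()
    )
    # Baseline captured from a known-good release; drift after a new PySpark
    # deployment flags a business-logic regression.
    assert row is not None
    assert row["total_revenue"] == pytest.approx(125_000.00)
```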
Full-Time | Mid-Level | 4/29/2026