Databricks Automation Engineer (Python, PySpark) - W2 Full-time Only

Centraprise

Seattle, US

Career-pivot friendly

Why this role

Pace

Steady

The role calls for continuously building and maintaining scalable automation frameworks and integrating automated testing into CI/CD pipelines, which points to ongoing, evolving work rather than one-off deliverables.

Collaboration

Medium

Collaboration is a key aspect of this role, as evidenced by the requirement to lead QA teams and define best practices, indicating a need for effective communication and teamwork.

Autonomy

Medium

The engineer is responsible for designing and implementing comprehensive test automation frameworks and for leading QA teams, which requires independent decision-making and problem-solving.

Decision Impact

Individual

Decisions made in this role have a significant impact on the quality and reliability of data pipelines, as the engineer is tasked with ensuring data quality, performance, and scalability, which are critical for the success of the company's data initiatives.

Role Level

Individual Contributor

The complexity of the role is high, given the need to work with advanced technologies such as Databricks, Spark, and PySpark, and to develop strategies for data validation, performance testing, and CI/CD integration.

Career Pivot Friendly

Welcomes transferable skills

Individuals with a background in software development or QA from industries like finance or tech startups, where they have experience with data platforms and automation, can easily transition into this role, leveraging their technical skills and leadership experience.

Derived from job-description analysis by Serendipath's career intelligence engine.

Transferable backgrounds

Coming from QA Manager at a tech startup
leadership in QA · test strategy development
Experience leading QA teams and developing test strategies at a tech startup translates directly to this role's responsibilities of leading QA teams and defining best practices.

Coming from ETL Developer at a financial institution
ETL/ELT testing · data validation
A background in ETL/ELT testing and data validation at a financial institution equips candidates with the necessary skills to implement data validation and quality checks in Databricks/Spark-based ETL pipelines.

Skills & requirements

Required

Test Automation Architecture · ETL/ELT Testing · Data Validation · CI/CD Integration · Performance Testing

Preferred

Data Lineage · Time Travel

Stack & domain

Python · PySpark · Databricks · Spark · Delta Lake · Java · Scala · ADLS Gen2 · Lakehouse Architecture · Azure DevOps · GitHub Actions · API · UI · CDC · Schema Evolution · Data Lineage · Time Travel

About the role

This role involves crafting and overseeing automated testing frameworks for complex data pipelines, particularly those built on Databricks and Spark, ensuring they are robust, scalable, and integrated seamlessly into CI/CD processes. Ideal candidates are experienced in leading QA initiatives and possess a strong background in Python and PySpark.

Original posting from Centraprise via LinkedIn

Databricks Automation Engineer (Python, PySpark)
Seattle, WA
Fulltime (Permanent)
Job Description:
Core Focus:
Design and lead test automation frameworks for data platforms, especially Databricks/Spark-based ETL pipelines, ensuring data quality, scalability, and CI/CD integration.
Key Skills:
Test Automation Architecture (Data Platforms)
ETL/ELT Testing & Data Validation
Data Quality, Anomaly & Drift Detection
CI/CD Test Integration & Release Gates
Performance & Load Testing (Distributed Systems)
QA Leadership & Test Strategy
Technical Skills:
Data Platforms: Databricks, Spark, PySpark, Delta Lake
Languages: Python, Java, Scala
Cloud/Data: ADLS Gen2, Lakehouse Architecture
CI/CD: Azure DevOps, GitHub Actions
Testing: API, UI, Data Pipeline Testing
Concepts: CDC, Schema Evolution, Data Lineage, Time Travel
Responsibilities:
Build scalable automation frameworks for data pipelines
Implement data validation & quality checks
Integrate automated testing into CI/CD pipelines
Perform performance and load testing for Spark jobs
Create synthetic test data and mocking strategies
Monitor automation health metrics and dashboards
Lead QA teams and define best practices

Source: Centraprise careers (LinkedIn)