Databricks Automation Engineer (Python, PySpark) - Only W2 Fulltime

Centraprise
Seattle, US
Career-pivot friendly

Why this role

Pace
Steady
The need to continuously build and maintain scalable automation frameworks, and to integrate automated testing into CI/CD pipelines, points to ongoing, evolving work rather than one-off deliverables.
Collaboration
Medium
The role requires leading QA teams and defining best practices, which calls for regular communication and teamwork.
Autonomy
Medium
The engineer designs and implements test automation frameworks and leads QA teams, which requires independent decision-making and problem-solving.
Decision Impact
Individual
Decisions in this role affect the quality and reliability of data pipelines: the engineer is tasked with ensuring data quality, performance, and scalability.
Role Level
Individual Contributor
The complexity of the role is high, given the need to work with advanced technologies such as Databricks, Spark, and PySpark, and to develop strategies for data validation, performance testing, and CI/CD integration.
Career Pivot Friendly
Welcomes transferable skills
Candidates with a software development or QA background in industries such as finance or tech startups, and with experience in data platforms and automation, can bring their technical and leadership skills to this role.

Derived from job-description analysis by Serendipath's career intelligence engine.

Transferable backgrounds

  • Coming from QA Manager at a tech startup
    leadership in QA · test strategy development
    Experience in leading QA teams and developing test strategies at a tech startup directly translates to the responsibilities of leading QA teams and defining best practices in this role.
  • Coming from ETL Developer at a financial institution
    ETL/ELT testing · data validation
    A background in ETL/ELT testing and data validation at a financial institution equips candidates with the necessary skills to implement data validation and quality checks in Databricks/Spark-based ETL pipelines.

Skills & requirements

Required

Test Automation Architecture · ETL/ELT Testing · Data Validation · CI/CD Integration · Performance Testing

Preferred

Data Lineage · Time Travel

Stack & domain

Python · PySpark · Databricks · Spark · Delta Lake · Java · Scala · ADLS Gen2 · Lakehouse Architecture · Azure DevOps · GitHub Actions · API · UI · CDC · Schema Evolution · Data Lineage · Time Travel

About the role

This role involves crafting and overseeing automated testing frameworks for complex data pipelines, particularly those built on Databricks and Spark, ensuring they are robust, scalable, and integrated seamlessly into CI/CD processes. Ideal candidates are experienced in leading QA initiatives and possess a strong background in Python and PySpark.

Original posting from Centraprise via LinkedIn

Databricks Automation Engineer (Python, PySpark)

Seattle, WA

Full-time (Permanent)

Job Description:

Core Focus:

  • Design and lead test automation frameworks for data platforms, especially Databricks/Spark-based ETL pipelines, ensuring data quality, scalability, and CI/CD integration.

Key Skills:

  • Test Automation Architecture (Data Platforms)
  • ETL/ELT Testing & Data Validation
  • Data Quality, Anomaly & Drift Detection
  • CI/CD Test Integration & Release Gates
  • Performance & Load Testing (Distributed Systems)
  • QA Leadership & Test Strategy

Technical Skills:

  • Data Platforms: Databricks, Spark, PySpark, Delta Lake
  • Languages: Python, Java, Scala
  • Cloud/Data: ADLS Gen2, Lakehouse Architecture
  • CI/CD: Azure DevOps, GitHub Actions
  • Testing: API, UI, Data Pipeline Testing
  • Concepts: CDC, Schema Evolution, Data Lineage, Time Travel

Responsibilities:

  • Build scalable automation frameworks for data pipelines
  • Implement data validation & quality checks
  • Integrate automated testing into CI/CD pipelines
  • Perform performance and load testing for Spark jobs
  • Create synthetic test data and mocking strategies
  • Monitor automation health metrics and dashboards
  • Lead QA teams and define best practices

Source: Centraprise careers (LinkedIn)
