Derived from job-description analysis by Serendipath's career intelligence engine.
This role involves designing and leading automated test frameworks for data pipelines, particularly Databricks and Spark-based ETL pipelines, with a focus on data quality, scalability, and CI/CD integration. Ideal candidates have experience leading QA initiatives and a strong background in Python and PySpark.
Original posting from Centraprise via LinkedIn
Databricks Automation Engineer (Python, PySpark)
Seattle, WA
Full-time (Permanent)
Job Description:
Core Focus:
- Design and lead test automation frameworks for data platforms, especially Databricks/Spark-based ETL pipelines, ensuring data quality, scalability, and CI/CD integration.
Key Skills:
- Test Automation Architecture (Data Platforms)
- ETL/ELT Testing & Data Validation
- Data Quality, Anomaly & Drift Detection
- CI/CD Test Integration & Release Gates
- Performance & Load Testing (Distributed Systems)
- QA Leadership & Test Strategy
Technical Skills:
- Data Platforms: Databricks, Spark, PySpark, Delta Lake
- Languages: Python, Java, Scala
- Cloud/Data: ADLS Gen2, Lakehouse Architecture
- CI/CD: Azure DevOps, GitHub Actions
- Testing: API, UI, Data Pipeline Testing
- Concepts: CDC, Schema Evolution, Data Lineage, Time Travel
Responsibilities:
- Build scalable automation frameworks for data pipelines
- Implement data validation & quality checks
- Integrate automated testing into CI/CD pipelines
- Perform performance and load testing for Spark jobs
- Create synthetic test data and mocking strategies
- Monitor automation health metrics and dashboards
- Lead QA teams and define best practices