Testing Automation Engineer

RiDiK (a subsidiary of CLPS; Nasdaq: CLPS)
Singapore, SG
On-site

Job Description

Experience Requirements

● Total QA/Testing Experience: 5+ years.

● Data Testing Experience: 3+ years specifically in Big Data, Hadoop, or Cloud Data Warehouse environments.

● Good to have: Databricks Experience: 1+ years of experience testing pipelines within a Databricks environment.

● Automation Focus: Proven track record of moving from manual SQL checks to automated Python-based testing frameworks.

● Migration Testing: Experience automating migration tests using Python.

Certifications

● Good to have: Databricks Certified Data Engineer Associate (at minimum).

● Preferred: ISTQB Foundation or Advanced Level (Test Automation Engineer).

Core Technical Skills

● Great Expectations / Pandera: Proficiency in using Python-based libraries to define data "contracts" and automated validation suites.

● DLT Expectations: Deep understanding of Delta Live Tables (DLT) expectations (failing the pipeline, dropping rows, or quarantining bad records).

● Advanced SQL: Expert-level SQL for complex data reconciliation, identifying duplicates, and null-value analysis across billions of records.

● Pytest-Spark: Experience using pytest to write unit tests for PySpark transformations and logic.

● Notebook Testing: Ability to write automated test notebooks that validate Medallion Architecture transitions (Bronze to Silver, Silver to Gold).

● Data Reconciliation: Building Python scripts to perform "source-to-target" counts and checksums across distributed file systems.

● Scalability Testing: Ability to validate that data pipelines meet performance SLAs when data volume spikes.

● End-to-End Orchestration Testing: Testing the reliability of Databricks Workflows and handling of job failures/retries.

● Schema Evolution: Testing how pipelines handle upstream schema changes without breaking downstream Gold tables.

● Unity Catalog Validation: Testing Row-Level Security (RLS) and Column-Level Masking to ensure unauthorized users cannot see sensitive data.

● Data Lineage: Validating that data lineage in Unity Catalog correctly reflects the movement of data across the Lakehouse.
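The source-to-target reconciliation checks described above (row counts and checksums) can be sketched in plain Python. This is a minimal illustration only: the sample rows and the `|`-delimited hashing scheme are hypothetical, and a real suite would read the source and target extracts from the actual Lakehouse tables rather than inline literals.

```python
import hashlib

# Hypothetical source and target extracts; in practice these rows would be
# read from the source system and the target (e.g. Silver) table.
source_rows = [("1", "alice", "2024-01-01"), ("2", "bob", "2024-01-02")]
target_rows = [("1", "alice", "2024-01-01"), ("2", "bob", "2024-01-02")]

def row_checksum(rows):
    """Order-independent checksum: hash each row, then XOR the digests."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256("|".join(row).encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return acc

def reconcile(source, target):
    """Return (counts_match, checksums_match) for a source-to-target check."""
    counts_match = len(source) == len(target)
    checksums_match = row_checksum(source) == row_checksum(target)
    return counts_match, checksums_match

counts_ok, sums_ok = reconcile(source_rows, target_rows)
print(counts_ok, sums_ok)  # prints: True True
```

XOR-folding the per-row digests makes the checksum independent of row order, which matters because distributed reads rarely return rows in a deterministic order.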

Preferred Candidate Background

● "Data-First" Mindset: Understanding that testing a Lakehouse is about testing the data and its behavior, not just the "UI" or "API."

● Software Engineering Foundation: Candidates who know how to use Git (Branching/Merging) to manage their test code alongside the engineering team.

● Distributed Systems Knowledge: Basic understanding of Spark (shuffling, partitioning) to understand why data might be missing or duplicated in a distributed environment.

Key Responsibilities

● Develop Test Strategy: Create a comprehensive test plan for the Lakehouse, focusing on Data Integrity, Accuracy, and Consistency.

● Automate Validation: Replace manual "spot-checking" with automated Python test suites that run as part of the CI/CD pipeline.

● Defect Analysis: Identify and document data anomalies, working closely with Data Engineers to perform root-cause analysis on Spark job failures.

● Regression Testing: Ensure that new PySpark code deployments do not impact existing Gold layer business logic or dashboard reporting.
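As one illustration of replacing manual spot-checks with an automated suite, the responsibilities above could start from pytest-style data-quality tests like the sketch below. The `records` sample and its column names are hypothetical placeholders; a real suite would load rows from the Gold layer and run in the CI/CD pipeline.

```python
# Hypothetical Gold-layer extract; a real test would query the actual table.
records = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": "A2", "amount": 25.5},
]

def test_no_null_keys(rows=records):
    # Every record must carry a primary key.
    assert all(r["order_id"] is not None for r in rows)

def test_no_duplicate_keys(rows=records):
    # Primary keys must be unique across the extract.
    keys = [r["order_id"] for r in rows]
    assert len(keys) == len(set(keys))

def test_amounts_positive(rows=records):
    # Business rule: order amounts are strictly positive.
    assert all(r["amount"] > 0 for r in rows)
```

Checks like these run unchanged under `pytest`, so they slot into the same CI jobs that gate PySpark code deployments.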

Job Type: Contract

Contract length: 12 months

Pay: $6,500.00 - $7,500.00 per month

Work Location: In person

Skills & Requirements

Technical Skills

Python, SQL, PySpark, Databricks, Git, Data Validation, Automation, Performance Testing, Databricks Certified Data Engineer Associate, ISTQB Foundation, Big Data, Cloud Data Warehouse, Lakehouse

Level

Mid

Posted

4/28/2026
