Senior Data Engineer (Python - PySpark and AWS)

Collabera
Toronto, CA; US
On-site

Why this role

Pace: Fast Paced
Collaboration: Medium
Autonomy: High
Decision Impact: Team
Role Level: Individual Contributor

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

  • migrated and optimized data pipelines
  • enhanced data product design

Typical background

Data engineering, distributed data systems

Transferable backgrounds

  • Coming from data architect
  • Coming from data scientist

Skills & requirements

Required

PySpark, AWS EMR, Data Pipeline Optimization, Data Storage Formats

Preferred

Capital Markets Experience, Market Data Vendor Experience

Stack & domain

Python, PySpark, AWS, EMR Spark, Databricks, Parquet, Iceberg, Airflow, AWS Glue, Lake Formation, Parallel/Distributed Data Processing, Problem-Solving, Independent Work, Collaboration, Communication, Project Management, Data Engineering

About the role

Original posting from Collabera

Title: Senior Data Engineer
Client: Investments Industry
# of Openings: 1
Type: 6-Month Contract (High likelihood of extension)
Location: Toronto, ON
Work Model: 4 days/week onsite, Friday WFH
PR: $80-100/hr

Role Overview

  • We are seeking a Senior Data Engineer (8-10+ years of experience) to support a large-scale data platform transformation within the Total Fund Management (TFM) team.
  • This role will focus on migrating and modernizing existing Databricks-based pipelines to AWS (EMR Spark), with an initial lift-and-shift phase followed by optimization and redesign into scalable, consumable data products.
  • This is a highly autonomous, hands-on role requiring strong PySpark expertise, deep experience with distributed data systems, and the ability to navigate complex, multi-source datasets (including market and reference data vendors).

Day-to-Day Responsibilities

  • Migrate existing Databricks-based Spark pipelines to AWS EMR (Spark)
  • Perform a lift-and-shift of ~50+ datasets, some with high complexity and multiple data sources
  • Refactor and optimize data pipelines for performance, scalability, and reliability
  • Structure and store data using Parquet and Iceberg formats
  • Improve and clean up legacy data pipelines built over several years
  • Design data with a consumption-first mindset (e.g., partitioning strategies, access patterns, data usability)
  • Collaborate with stakeholders to understand data requirements and translate them into scalable solutions
  • Ensure production readiness, including monitoring, orchestration, and deployment
  • Work independently to drive delivery from design through implementation

Key Responsibilities

  • Develop and optimize large-scale PySpark data pipelines
  • Rebuild and enhance Spark workloads in AWS (EMR)
  • Leverage tools such as Airflow, AWS Glue, and Lake Formation
  • Handle parallel/distributed data processing workloads
  • Improve system performance and data quality across pipelines
  • Engage with business and technical stakeholders to align on data needs
  • Own delivery with minimal oversight in a fast-paced environment

Must-Haves

  • 8-10+ years of Data Engineering experience (senior-level profiles only)
  • Strong hands-on expertise in Python and PySpark
  • Deep experience with Apache Spark in distributed environments
  • Proven experience working with large-scale, complex data pipelines
  • Experience with Databricks (existing environment)
  • Strong knowledge of Parquet and Iceberg data formats
  • Experience with the AWS data ecosystem (EMR preferred)
  • Familiarity with Airflow, Glue, and Lake Formation
  • Strong understanding of parallel/distributed data processing
  • Ability to work independently with strong problem-solving skills
  • Experience in ambiguous environments with evolving requirements

Nice-to-Haves

  • Prior experience in capital markets or investment management
  • Experience working with market data / reference data vendors
  • Experience designing data products and consumption layers
  • Exposure to large-scale data platform migrations or transformations

We may use AI-enabled and/or automated tools to support parts of our recruitment process, including application screening, interview scheduling, and candidate communications. These tools are used to enhance consistency and efficiency. All hiring decisions involve human review and are not based solely on automated processing.

The Company offers a total rewards package in accordance with all applicable federal, provincial, and local laws and requirements. Benefit eligibility and offerings vary based on role, employment status, and work location. For contractor positions, benefits are limited to those entitlements and protections required by applicable law, which may include (as applicable) vacation pay, public holidays, leaves of absence, and other legally mandated benefits or payments.

Source: Collabera careers
