Senior Data Engineer (Python - PySpark and AWS)

Collabera

Toronto, CA; US

On-site

Why this role

Pace: Fast Paced
Collaboration: Medium
Autonomy: High
Decision Impact: Team
Role Level: Individual Contributor

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

migrated and optimized data pipelines
enhanced data product design

Typical background

data engineering
distributed data systems

Transferable backgrounds

Coming from data architect
Coming from data scientist

Skills & requirements

Required

PySpark
AWS EMR
Data Pipeline Optimization
Data Storage Formats

Preferred

Capital Markets Experience
Market Data Vendor Experience

Stack & domain

Python
PySpark
AWS
EMR Spark
Databricks
Parquet
Iceberg
Airflow
AWS Glue
Lake Formation
Parallel/Distributed Data Processing
Problem-solving
Independent Work
Collaboration
Communication
Project Management
Data Engineering

About the role

Original posting from Collabera

Title: Senior Data Engineer
Client: Investments Industry
# of Openings: 1
Type: 6-Month Contract (High likelihood of extension)
Location: Toronto, ON
Work Model: 4 days/week onsite, Friday WFH
Pay Rate: $80-100/hr
Role Overview
We are seeking a Senior Data Engineer (8-10+ years experience) to support a large-scale data platform transformation within the Total Fund Management (TFM) team.
This role will focus on migrating and modernizing existing Databricks-based pipelines to AWS (EMR Spark), with an initial lift-and-shift phase, followed by optimization and redesign into scalable, consumable data products.
This is a highly autonomous, hands-on role requiring strong PySpark expertise, deep experience with distributed data systems, and the ability to navigate complex, multi-source datasets (including market and reference data vendors).
Day-to-Day Responsibilities
Migrate existing Databricks-based Spark pipelines to AWS EMR (Spark)
Perform lift-and-shift of ~50+ datasets, some with high complexity and multiple data sources
Refactor and optimize data pipelines for performance, scalability, and reliability
Structure and store data using Parquet and Iceberg formats
Improve and clean up legacy data pipelines built over several years
Design data with a consumption-first mindset (e.g., partitioning strategies, access patterns, data usability)
Collaborate with stakeholders to understand data requirements and translate into scalable solutions
Ensure production readiness including monitoring, orchestration, and deployment
Work independently to drive delivery from design through implementation
Key Responsibilities
Develop and optimize large-scale PySpark data pipelines
Rebuild and enhance Spark workloads in AWS (EMR)
Leverage tools such as Airflow, AWS Glue, and Lake Formation
Handle parallel/distributed data processing workloads
Improve system performance and data quality across pipelines
Engage with business and technical stakeholders to align on data needs
Own delivery with minimal oversight in a fast-paced environment
Must-Haves
8-10+ years of Data Engineering experience (senior-level profiles only)
Strong hands-on expertise in Python and PySpark
Deep experience with Apache Spark in distributed environments
Proven experience working with large-scale, complex data pipelines
Experience with Databricks (existing environment)
Strong knowledge of Parquet and Iceberg data formats
Experience with AWS data ecosystem (EMR preferred)
Familiarity with Airflow, Glue, and Lake Formation
Strong understanding of parallel/distributed data processing
Ability to work independently with strong problem-solving skills
Experience in ambiguous environments with evolving requirements
Nice-to-Haves
Prior experience in capital markets or investment management
Experience working with market data / reference data vendors
Experience designing data products and consumption layers
Exposure to large-scale data platform migrations or transformations
We may use AI-enabled and/or automated tools to support parts of our recruitment process, including application screening, interview scheduling, and candidate communications. These tools are used to enhance consistency and efficiency. All hiring decisions involve human review and are not based solely on automated processing.
The Company offers a total rewards package in accordance with all applicable federal, provincial, and local laws and requirements. Benefit eligibility and offerings vary based on role, employment status, and work location. For contractor positions, benefits are limited to those entitlements and protections required by applicable law, which may include (as applicable) vacation pay, public holidays, leaves of absence, and other legally mandated benefits or payments.

Source: Collabera careers