Senior / Staff Data Engineer — Data Processing & Platform

Index Exchange
Washington, US
Remote

Job Description

About the position

At Index Exchange, we're reinventing how digital advertising works at scale. As a global advertising supply-side platform, we empower the world's leading media owners and marketers to thrive in a programmatic, privacy-first ecosystem. We're a proud industry pioneer with over 20 years of experience accelerating the evolution of ad technology. Our proprietary tech is trusted by some of the world's largest brands and media owners and plays a crucial role in keeping the internet open, accessible, and largely free. We process more than 550 billion real-time auctions every day with ultra-low latency. Our platform is vertically integrated from servers to networks and runs primarily on our own metal and cloud infrastructure. This end-to-end infrastructure is designed to provide both stability and agility, enabling us to adapt quickly as the market evolves. At the core of it all is our engineering-first culture. Our engineers tackle internet-scale problems across tight-knit, global teams. From moving petabytes of data and optimizing with AI to making real-time infrastructure decisions, Indexers have the agency and influence to shape the future of advertising. We move fast, build thoughtfully, and stay grounded in our core values.

We are hiring a Senior / Staff Data Engineer to build and evolve the data processing and pipeline layer that powers reporting, billing systems, and real-time data products at Index Exchange. This role focuses on designing and operating large-scale batch and streaming data pipelines, enabling reliable, scalable, and efficient data transformation across the platform. You will work on systems that transform raw, high-volume event data into clean, queryable, production-grade datasets, supporting both API-driven data products and analytical workflows.

You will work on high-scale data systems that:

  • Process billions of events per day across distributed pipelines
  • Power core business datasets (reporting, billing, marketplace metrics)
  • Operate across batch (Spark) and streaming (Kafka / Flink) architectures
  • Require careful balancing of data correctness, processing efficiency, and latency vs cost trade-offs

You will solve problems such as:

  • Designing pipelines that scale without exploding compute costs
  • Managing data correctness at scale (deduplication, late data, joins; see the sketch below)
  • Building systems that support both historical backfills and near real-time updates
  • Evolving pipelines from centralized processing (Hadoop) toward more distributed and efficient patterns
  • Building streaming pipelines and streaming data warehouses
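
To make the silver / gold terminology and the correctness problems above concrete, here is a minimal, hypothetical PySpark sketch; the event schema, column names, and paths are illustrative assumptions, not Index Exchange's actual pipeline code. It deduplicates late-arriving raw events and aggregates them into a daily reporting table.

```python
# Illustrative sketch only: schema, paths, and column names are assumptions.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("auction_events_silver_gold").getOrCreate()

# Bronze: raw, high-volume event data that may contain duplicates and late arrivals.
raw = spark.read.parquet("/data/bronze/auction_events/dt=2026-05-06")

# Silver: keep the latest version of each event_id, dropping duplicate deliveries.
latest_first = Window.partitionBy("event_id").orderBy(F.col("ingest_ts").desc())
silver = (
    raw.withColumn("rn", F.row_number().over(latest_first))
       .filter("rn = 1")
       .drop("rn")
)

# Gold: business-ready daily aggregate, partitioned for downstream reporting queries.
gold = (
    silver.groupBy("publisher_id", F.to_date("event_ts").alias("event_date"))
          .agg(
              F.count("*").alias("auctions"),
              F.sum("revenue_micros").alias("revenue_micros"),
          )
)

(gold.write.mode("overwrite")
     .partitionBy("event_date")
     .parquet("/data/gold/daily_publisher_revenue"))
```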

Responsibilities

  • Design and operate pipelines using Spark (primary) and Kafka / Flink (streaming).
  • Transform raw event data into cleaned datasets (silver layer) and business-ready datasets (gold / reporting tables).
  • Build and maintain canonical datasets (aggregated datasets, reporting tables).
  • Define data contracts and ensure consistency across pipelines.
  • Support evolving use cases: reporting, billing, ML / experimentation.
  • Build and maintain Airflow DAGs for pipeline scheduling, dependency management, and backfills (see the sketch after this list).
  • Improve reliability and observability of workflows.
  • Optimize pipelines for performance (runtime, throughput), cost (compute efficiency), and scalability (data growth).
  • Improve partitioning strategies, data layout, and job execution patterns.
  • Build pipelines that support incremental updates, streaming transformations, and aggregation at scale.
  • Contribute to evolving patterns such as edge aggregation, streaming → batch convergence, and real-time data availability.
  • Define and evolve data processing patterns: batch vs streaming, aggregation strategies, incremental vs full recompute.
  • Work across Spark (core processing), Kafka (transport), Flink (streaming compute), and storage systems (Hadoop / Ceph).
  • Contribute to data platform architecture decisions, pipeline standardization, and reusable data processing frameworks.
  • Influence trade-offs: latency vs cost, correctness vs performance, compute vs storage.
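
As a rough illustration of the orchestration responsibility above, the following is a minimal Airflow DAG sketch. The DAG id, application path, schedule, and connection are hypothetical, and it assumes a recent Airflow 2.x release with the Apache Spark provider installed.

```python
# Illustrative sketch only: DAG id, paths, schedule, and connection are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "owner": "data-platform",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="daily_reporting_pipeline",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=True,        # allows historical backfills of missed intervals
    max_active_runs=3,   # bounds how many backfill runs execute concurrently
    default_args=default_args,
) as dag:
    build_gold_tables = SparkSubmitOperator(
        task_id="build_gold_tables",
        application="/opt/jobs/build_gold_tables.py",
        # Pass the logical date so each run processes exactly one daily partition.
        application_args=["--date", "{{ ds }}"],
        conn_id="spark_default",
    )
```

With catchup enabled, reprocessing a historical window becomes a matter of clearing or backfilling the relevant DAG runs rather than writing one-off scripts.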

Requirements

  • Strong experience in data engineering at scale
  • Deep expertise in Spark (required)
  • Deep expertise in SQL and data modeling
  • Experience with Airflow or workflow orchestration
  • Experience with Kafka or streaming systems
  • Strong understanding of distributed data processing
  • Strong understanding of data modeling (large-scale datasets)
  • Strong understanding of performance optimization
  • Ability to own pipelines end-to-end
  • Ability to debug complex data issues
  • Ability to work in high-scale, evolving environments

Nice-to-haves

  • Experience defining data processing standards and patterns across teams
  • Experience leading large-scale pipeline and platform initiatives
  • Ability to influence data architecture and modeling decisions
  • A track record of driving improvements in reliability, cost efficiency, and scalability

Benefits

  • Comprehensive health, dental, and vision plans for you and your dependents
  • Paid time off, health days, and personal obligation days, plus flexible work schedules

Skills & Requirements

Technical Skills

Spark, Kafka, Flink

Employment Type

Full-time

Level

Senior

Posted

5/6/2026

Apply Now

You will be redirected to Index Exchange's application portal.
