Senior Machine Learning Engineer – Inference Systems

Fintal Partners

Chicago, US

On-site

Why this role

Pace

Fast-paced

Collaboration

High

Autonomy

Medium

Decision Impact

Team

Role Level

Team Lead

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

Built and scaled distributed training systems
Optimized PyTorch-based training and inference performance

Typical background

Computer Science, Mathematics, Physics, or related technical degree

Transferable backgrounds

Coming from Data Scientist
Coming from AI Researcher

Skills & requirements

Required

PyTorch, Distributed Computing, GPU Clusters, Kubernetes, Ray, Slurm, NCCL, Triton

Preferred

High-performance Infrastructure, Networking, Memory Management, Storage Optimization

Stack & domain

PyTorch, Kubernetes, Ray, Slurm, NCCL, Triton, Python, GPU Infrastructure, Performance Optimization, Reliability Engineering, Leadership, Communication, Machine Learning, Distributed Computing, High-performance Infrastructure, AI Research, Production Systems

About the role

Original posting from Fintal Partners via LinkedIn

A leading high-frequency trading firm is building out a world-class machine learning platform team focused on large-scale model training and ultra-low latency inference. This team owns the infrastructure powering next-generation AI research and production systems across the business.
They are looking for Senior Machine Learning Engineers with deep experience building and scaling distributed training and inference systems for large models. The role sits at the intersection of ML systems, distributed computing, and high-performance infrastructure.
You will design and optimize large-scale PyTorch training pipelines, improve GPU cluster utilization, and build highly reliable inference infrastructure capable of operating at massive scale and extremely low latency. The environment is highly technical, fast-paced, and engineering-driven.

Key responsibilities:
Build and scale distributed training systems for large deep learning models
Optimize PyTorch-based training and inference performance across GPU clusters
Design high-throughput, low-latency inference infrastructure for production workloads
Improve scheduling, orchestration, checkpointing, and data pipeline efficiency
Work closely with researchers and infrastructure engineers to productionize models
Drive performance improvements across networking, memory, storage, and compute layers

Requirements:
Strong experience with large-scale ML systems and distributed training
Deep expertise in PyTorch and modern deep learning infrastructure
Experience with technologies such as Kubernetes, Ray, Slurm, NCCL, Triton, or similar
Strong Python engineering skills with solid systems knowledge
Experience scaling GPU infrastructure in production environments
Background in performance optimization and reliability engineering
Computer Science, Mathematics, Physics, or related technical degree preferred

The firm offers exceptional compensation, access to cutting-edge compute infrastructure, and the opportunity to work alongside some of the strongest engineers and researchers in the industry.

Source: Fintal Partners careers (LinkedIn)