Senior Machine Learning Engineer – Inference Systems

Fintal Partners
Chicago, US
On-site

Why this role

Pace: Fast Paced
Collaboration: High
Autonomy: Medium
Decision Impact: Team
Role Level: Team Lead

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

  • Built and scaled distributed training systems
  • Optimized PyTorch-based training and inference performance

Typical background
Computer Science, Mathematics, Physics, or related technical degree

Transferable backgrounds

  • Coming from a Data Scientist role
  • Coming from an AI Researcher role

Skills & requirements

Required

PyTorch, Distributed Computing, GPU Clusters, Kubernetes, Ray, Slurm, NCCL, Triton

Preferred

High-performance Infrastructure, Networking, Memory Management, Storage Optimization

Stack & domain

PyTorch, Kubernetes, Ray, Slurm, NCCL, Triton, Python, GPU Infrastructure, Performance Optimization, Reliability Engineering, Leadership, Communication, Machine Learning, Distributed Computing, High-performance Infrastructure, AI Research, Production Systems

About the role

Original posting from Fintal Partners via LinkedIn

A leading high-frequency trading firm is building out a world-class machine learning platform team focused on large-scale model training and ultra-low latency inference. This team owns the infrastructure powering next-generation AI research and production systems across the business.

They are looking for Senior Machine Learning Engineers with deep experience building and scaling distributed training and inference systems for large models. The role sits at the intersection of ML systems, distributed computing, and high-performance infrastructure.

You will design and optimize large-scale PyTorch training pipelines, improve GPU cluster utilization, and build highly reliable inference infrastructure capable of operating at massive scale and extremely low latency. The environment is highly technical, fast-paced, and engineering-driven.

Key responsibilities:

  • Build and scale distributed training systems for large deep learning models
  • Optimize PyTorch-based training and inference performance across GPU clusters
  • Design high-throughput, low-latency inference infrastructure for production workloads
  • Improve scheduling, orchestration, checkpointing, and data pipeline efficiency
  • Work closely with researchers and infrastructure engineers to productionize models
  • Drive performance improvements across networking, memory, storage, and compute layers

Requirements:

  • Strong experience with large-scale ML systems and distributed training
  • Deep expertise in PyTorch and modern deep learning infrastructure
  • Experience with technologies such as Kubernetes, Ray, Slurm, NCCL, Triton, or similar
  • Strong Python engineering skills with solid systems knowledge
  • Experience scaling GPU infrastructure in production environments
  • Background in performance optimization and reliability engineering
  • Computer Science, Mathematics, Physics, or related technical degree preferred

The firm offers exceptional compensation, access to cutting-edge compute infrastructure, and the opportunity to work alongside some of the strongest engineers and researchers in the industry.

Source: Fintal Partners careers (LinkedIn)
