Who We Are
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI operates globally with offices in New York City, San Francisco, Seattle, and London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and Firstminute.
Our Values
• Move Fast: We act with speed and precision, breaking down big challenges into achievable steps.
• Focus: We complete one goal at a time with care, collaborating as a team to deliver features with precision.
• Balance: Sustained performance comes from rest and recovery. We ensure a healthy work-life balance to keep you at your best.
• Craftsmanship: Innovation through excellence. Every detail matters, and we take pride in mastering our craft.
• Minimal: Simplicity drives our innovation. We eliminate complexity through discipline and focus on what truly matters.
What We're Looking For
As a Machine Learning Solutions Engineer, you will operate at the intersection of machine learning, distributed systems, and cloud infrastructure.
You will partner with customers to design and deploy end-to-end AI systems.
This role goes beyond traditional ML solutions engineering—you will act as a technical architect, helping customers make critical decisions across compute, orchestration, and system design.
The role can be based out of our New York City or San Francisco office, with an in-office requirement of at least 2-3 days per week and occasional team and company offsites. We are not able to provide visa sponsorship for this role at this time. The annual base pay range for this role is $150,000 - $195,000, in addition to a variable pay component and meaningful equity.
What You'll Do
Customer Architecture & Technical Leadership
Partner with customers to understand ML workloads, infrastructure constraints, and scaling requirements
Architect end-to-end solutions across:
• Distributed training (multi-node, multi-GPU)
• High-throughput inference systems
Translate business goals (latency, cost, throughput) into technical system design decisions
GPU & Infrastructure Design
Design and optimize workloads across GPU clusters (H100, H200, B200, etc.)
Advise on:
• Training vs. inference cluster design
• Interconnect choices (Ethernet vs. InfiniBand / RDMA vs. RoCE)
• Storage strategies (local NVMe vs. networked / object storage)
Model and optimize for:
• Tokens/sec and tokens/$
• Throughput vs. latency tradeoffs
• GPU utilization and scheduling efficiency
Kubernetes & Platform Systems
Design and support deployments on Kubernetes (EKS, GKE, on-prem clusters)
Work with:
• GPU scheduling (time-slicing, MIG, bin-packing)
• Autoscaling and workload orchestration
• Helm-based deployments and multi-tenant environments
Help customers balance raw Kubernetes flexibility vs. platform abstraction (Lightning)
Demos, POCs, and Execution
Build and deliver technical demos and POCs that showcase:
• Distributed training workflows
• Scalable inference endpoints
• End-to-end ML pipelines on Lightning AI
Scope and lead POCs aligned to customer success metrics (latency, cost, reliability)
Cross-Functional Impact
Act as the bridge between customers, product, and engineering
Provide feedback on:
• Platform gaps in infrastructure, orchestration, and performance
• Emerging patterns in GPU usage and distributed systems
Influence roadmap across ML workflows and infrastructure capabilities
Enablement & Thought Leadership
What You'll Need
ML + Systems Expertise
3–6+ years of experience in:
• Machine Learning / AI Engineering
• Solutions Engineering / Sales Engineering / ML Consulting
Strong understanding of:
• Training vs. inference workloads
• Model optimization (quantization, batching, caching, etc.)
GPU & Distributed Systems
Experience working with:
• GPU clusters (NVIDIA stack preferred)
• Distributed training or inference systems
Familiarity with:
• NCCL, CUDA, or GPU performance profiling
• Networking concepts (RDMA, RoCE, InfiniBand, high-throughput systems)
Kubernetes & Cloud Platforms
Hands-on experience with:
• Containerization (Docker)
Exposure to:
• GPU scheduling