About the position
Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build
an end-to-end platform for developing, training, and deploying AI
systems—designed to take ideas from research to production with less friction.
Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI
combines developer-first software with cost-efficient, large-scale compute.
Teams get the tools they need for experimentation, training, and production
inference, with security, observability, and control built in.
We serve solo researchers, startups, and large enterprises. Lightning AI
operates globally with offices in New York City, San Francisco, Seattle, and
London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and
Firstminute.
We are seeking a highly skilled Research Engineer to optimize training and
inference workloads on compute accelerators and clusters through the Lightning
Thunder compiler and the broader PyTorch Lightning ecosystem. This role sits at
the intersection of deep learning research, compiler development, and
large-scale systems optimization. You'll shape technology that pushes the
boundaries of model performance and efficiency, creating foundational software
that will impact the entire machine learning ecosystem. You will join the
Engineering Team and report to our Tech Lead. This is a hybrid role based in
our New York City, San Francisco, or London office, with an in-office
requirement of two days per week. The salary range for this role is
$180,000-$250,000 per year.
Responsibilities
Design and implement optimizations at every level of the stack:
Graph-level (e.g., operator fusion, kernel scheduling, memory planning)
Kernel-level (CUDA, Triton, custom operators for specialized hardware)
System-level (distributed training across GPUs/TPUs, inference serving at
scale)
Extend the Lightning Thunder compiler with new passes, transformations, and
integration hooks to accelerate training and inference workloads.
Make these optimizations accessible to users through clean APIs, automated
tooling, and seamless integration with PyTorch Lightning.
Design and implement profiling and debugging tools to analyze model execution,
identify bottlenecks, and guide optimization strategies.
Ensure that models run efficiently across diverse backends (NVIDIA, AMD, TPU,
specialized accelerators).
Contribute to the open-source ecosystem by writing documentation and
supporting community adoption.
Engage with users, helping with performance tuning and advocating for Thunder
as the go-to optimization layer in ML workflows.
Collaborate with product and engineering leadership to ensure compiler and
optimization improvements align with the broader product vision.
Requirements
Hands-on experience with performance optimizations such as quantization,
pruning, mixed precision, or memory-efficient training.
Experience with large-scale distributed training (data/model/pipeline
parallelism, checkpointing, elastic scaling).
Experience building robust tooling, testing, and CI/CD for
performance-sensitive systems.
Ability to collaborate effectively across research, engineering, and external
contributors.
Nice-to-haves
Experience writing custom kernels.
Experience with compiler internals (operator fusion, scheduling, optimization
passes) or proven work in performance-critical software.
Publications or open-source contributions in ML or compiler domains are highly
preferred.
Benefits
Equity with monthly vesting after an initial cliff. For our international
employees, we work with our EOR (Employer of Record) to pay you in your local
currency and provide equitable benefits across the globe.