Research Engineer (AI Optimization)

Careerforgeon

Hybrid

Job Description

About the position

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build

an end-to-end platform for developing, training, and deploying AI

systems—designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI

combines developer-first software with cost-efficient, large-scale compute.

Teams get the tools they need for experimentation, training, and production

inference, with security, observability, and control built in.

We serve solo researchers, startups, and large enterprises. Lightning AI

operates globally with offices in New York City, San Francisco, Seattle, and

London, and is backed by Coatue, Index Ventures, Bain Capital Ventures, and

Firstminute.

We are seeking a highly skilled Research Engineer to work on optimizing training

and inference workloads on compute accelerators and clusters, through the

Lightning Thunder compiler and the broader PyTorch Lightning ecosystem. This

role sits at the intersection of deep learning research, compiler development,

and large-scale system optimization. You’ll be shaping technology that pushes

the boundaries of model performance and efficiency, creating foundational

software that will impact the entire machine learning ecosystem. You will be

joining the Engineering Team and report to our Tech Lead. This is a hybrid role

based in our New York City, San Francisco, or London office, with an in-office

requirement of two days per week. The salary range for this role is

$180,000-$250,000.

Responsibilities

Develop performance-oriented model optimizations at multiple levels:

Graph-level (e.g., operator fusion, kernel scheduling, memory planning)

Kernel-level (CUDA, Triton, custom operators for specialized hardware)

System-level (distributed training across GPUs/TPUs, inference serving at

scale)

Advance the Thunder compiler by building optimization passes, graph

transformations, and integration hooks to accelerate training and inference

workloads.

Work across the software stack to ensure optimizations are accessible to end

users through clean APIs, automated tooling, and seamless integration with

PyTorch Lightning. Design and implement profiling and debugging tools to

analyze model execution, identify bottlenecks, and guide optimization

strategies.

Collaborate with hardware vendors and ecosystem partners to ensure Thunder

runs efficiently across diverse backends (NVIDIA, AMD, TPU, specialized

accelerators).

Contribute to open-source projects by developing new features, improving

documentation, and supporting community adoption.

Engage with researchers and engineers in the community, providing guidance on

performance tuning and advocating for Thunder as the go-to optimization layer

in ML workflows.

Work cross-functionally with Lightning’s product and engineering teams to

ensure compiler and optimization improvements align with the broader product

vision.

Requirements

Strong expertise with deep learning frameworks such as PyTorch
Hands-on experience with model optimization techniques, including graph-level

optimizations, quantization, pruning, mixed precision, or memory-efficient

training.

Knowledge of distributed systems and parallelism strategies

(data/model/pipeline parallelism, checkpointing, elastic scaling).

Familiarity with software engineering practices: designing APIs, building

robust tooling, testing, CI/CD for performance-sensitive systems.

Excellent collaboration and communication skills, with the ability to partner

across research, engineering, and external contributors.

Bachelor’s degree in Computer Science, Engineering

Nice-to-haves

Experience with CUDA, Triton, or other GPU programming models for developing

custom kernels.

Deep understanding of deep learning compiler internals (IR design, operator

fusion, scheduling, optimization passes) or proven work in

performance-critical software.

Proven track record contributing to open-source projects in ML, HPC, or

compiler domains.

Advanced degree (Master’s or PhD) in machine learning, compilers, or systems

highly preferred.

Benefits

We offer competitive base salaries and equity with a 25% one year cliff and

monthly vesting thereafter. For our international employees, we work with our

EOR to pay you in your local currency and provide equitable benefits across the

globe.

Medical, dental and vision
Life and AD&D insurance
Flexible paid time off including winter closure
Paid family leave benefits
$500 monthly meal reimbursement, including groceries & food delivery services
$500 one time home office stipend
$1,000 annual learning & development stipend
100% Citibike membership (NYC only)
$45/month gym membership
Additional various medical and mental health services

Skills & Requirements

Technical Skills

PyTorch LightningCUDATritoncustom operatorsdistributed traininginference servingmodel optimizationgraph-level optimizationsquantizationpruningmixed precisionmemory-efficient trainingdistributed systemsparallelism strategiessoftware engineering practicesAPI designrobust toolingtestingCI/CDperformance tuningAIMachine LearningDeep Learning

Salary

$180,000 - $250,000

year

Level

mid

Posted

4/9/2026

Apply Now

You will be redirected to Careerforgeon's application portal.