Research Scientist, Reinforcement Learning

Deeproute.ai

Denver, US

On-site

Job Description

We are building next-generation end-to-end autonomous driving systems powered by reinforcement learning.

You will work on applying RL in closed-loop, safety-critical environments, leveraging large-scale simulation and real-world driving data to improve safety, comfort, and robustness.

Train and deploy RL policies in closed-loop driving environments
Scale RL training using massively parallel simulation systems
Design and optimize reward functions for complex driving behaviors
Improve sim-to-real transfer for real-world robustness
Collaborate with cross-functional teams to integrate models into production systems

Core Technical Skills

Proficiency in modern RL algorithms: DQN, PPO, SAC, TD3, etc.
Proficiency in modern RLHF algorithms: PPO, DPO, GRPO, etc.
Hands‑on experience training reward models and fine‑tuning LLM/VLM/VLA models
Knowledge of distributed RL training at scale
Proficiency with massively parallel simulation environments
Knowledge of sim‑to‑real transfer techniques and domain randomization
Proficiency in Python; comfortable with C++
Proficiency in deep learning frameworks such as PyTorch
Experience with distributed training frameworks (Ray, Horovod, etc.)
Knowledge of model optimization (quantization, pruning) and CUDA is a plus
Knowledge of traffic rules and driving behavior modeling

Preferred Qualifications

Publications in top‑tier venues (ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, ICRA, IROS, etc.)
Open‑source contributions to RL libraries or autonomous driving projects
Previous experience with LLM fine‑tuning using RLHF
Knowledge of safe RL, interpretable AI, or robustness techniques
Familiarity with autonomous vehicle regulations and safety standards


Skills & Requirements

Technical Skills

Reinforcement learning, DQN, PPO, SAC, TD3, RLHF, DPO, GRPO, reward models, fine-tuning LLM/VLM/VLA, distributed RL training, massively parallel simulation environments, traffic rules, driving behavior modeling, Python, C++, PyTorch, distributed training frameworks, model optimization, CUDA, safe RL, interpretable AI, robustness techniques, autonomous vehicle regulations, safety standards, collaboration, problem solving, communication, teamwork, autonomous driving, simulation, real-world robustness, safety, comfort

Employment Type

Full-time

Level

Senior

Posted

4/30/2026
