Research Scientist, Reinforcement Learning

Deeproute.ai
Denver, US
On-site

Job Description

We are building next-generation end-to-end autonomous driving systems powered by reinforcement learning.

You will work on applying RL in closed-loop, safety-critical environments, leveraging large-scale simulation and real-world driving data to improve safety, comfort, and robustness.

  • Train and deploy RL policies in closed-loop driving environments
  • Scale RL training using massively parallel simulation systems
  • Design and optimize reward functions for complex driving behaviors
  • Improve sim-to-real transfer for real-world robustness
  • Collaborate with cross-functional teams to integrate models into production systems

Core Technical Skills

  • Proficiency in modern RL algorithms: DQN, PPO, SAC, TD3, etc.
  • Proficiency in modern RLHF algorithms: PPO, DPO, GRPO, etc.
  • Hands‑on experience training reward models and fine‑tuning LLM/VLM/VLA models
  • Knowledge of distributed RL training at scale
  • Proficiency with massively parallel simulation environments
  • Knowledge of sim‑to‑real transfer techniques and domain randomization
  • Proficiency in Python, comfortable with C++
  • Proficiency in deep learning frameworks such as PyTorch
  • Experience with distributed training frameworks (Ray, Horovod, etc.)
  • Knowledge of model optimization (quantization, pruning) and CUDA is a plus
  • Knowledge of traffic rules and driving behavior modeling

Preferred Qualifications

  • Publications in top‑tier venues (ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, ICRA, IROS, etc.)
  • Open‑source contributions to RL libraries or autonomous driving projects
  • Previous experience with LLM fine‑tuning using RLHF
  • Knowledge of safe RL, interpretable AI, or robustness techniques
  • Familiarity with autonomous vehicle regulations and safety standards

Skills & Requirements

Technical Skills

Reinforcement learning, DQN, PPO, SAC, TD3, RLHF, DPO, GRPO, Reward models, Fine‑tuning LLM/VLM/VLA, Distributed RL training, Massively parallel simulation environments, Traffic rules, Driving behavior modeling, Python, C++, PyTorch, Distributed training frameworks, Model optimization, CUDA, Safe RL, Interpretable AI, Robustness techniques, Autonomous vehicle regulations, Safety standards, Collaboration, Problem solving, Communication, Teamwork, Autonomous driving, Simulation, Real-world robustness, Safety, Comfort

Employment Type

FULL TIME

Level

Senior

Posted

4/30/2026
