We are building next-generation end-to-end autonomous driving systems powered by reinforcement learning.
You will work on applying RL in closed-loop, safety-critical environments, leveraging large-scale simulation and real-world driving data to improve safety, comfort, and robustness.
Responsibilities
- Train and deploy RL policies in closed-loop driving environments
- Scale RL training using massively parallel simulation systems
- Design and optimize reward functions for complex driving behaviors
- Improve sim-to-real transfer for real-world robustness
- Collaborate with cross-functional teams to integrate models into production systems
Core Technical Skills
- Proficiency in modern RL algorithms: DQN, PPO, SAC, TD3, etc.
- Proficiency in modern RLHF algorithms: PPO, DPO, GRPO, etc.
- Hands‑on experience training reward models and fine‑tuning LLM/VLM/VLA models
- Knowledge of distributed RL training at scale
- Proficiency with massively parallel simulation environments
- Knowledge of sim‑to‑real transfer techniques and domain randomization
- Proficiency in Python; comfortable with C++
- Proficiency in deep learning frameworks such as PyTorch
- Experience with distributed training frameworks (Ray, Horovod, etc.)
- Knowledge of model optimization (quantization, pruning) and CUDA is a plus
- Knowledge of traffic rules and driving behavior modeling
Preferred Qualifications
- Publications in top‑tier venues (ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, ICRA, IROS, etc.)
- Open‑source contributions to RL libraries or autonomous driving projects
- Previous experience with LLM fine‑tuning using RLHF
- Knowledge of safe RL, interpretable AI, or robustness techniques
- Familiarity with autonomous vehicle regulations and safety standards