Research Scientist (End-to-End & Multimodal Models)

Black Sesame Technologies (Singapore) Pte Ltd
Singapore

Job Description

Purpose:

In this role, you will be responsible for the end-to-end design and development of autonomous driving frameworks. You will integrate mainstream perception, prediction, and planning technologies into a unified modeling system, leveraging both vision-only and vision-language modeling paradigms, to support autonomous driving tasks across urban and highway scenarios.

You will play a key role in advancing end-to-end and hybrid architectures, including the exploration of Vision-Language Models (VLMs) to enhance scene understanding, reasoning, and decision-making robustness in complex driving environments.

Responsibilities:

  • Lead the design and implementation of end-to-end autonomous driving models, including one-stage (sensor-to-control) and two-stage (e.g., perception–planning decoupled) architectures. Define model structures, training pipelines, and optimization strategies for stable and explainable planning outputs.
  • Drive the development of pure vision-based end-to-end systems, integrating multi-task capabilities such as BEV perception, static and dynamic occupancy inference, trajectory prediction, and planning.
  • Explore and apply Vision-Language Models (VLMs) to improve high-level scene understanding, semantic reasoning, and cross-modal representation learning for autonomous driving tasks.
  • Optimize and deploy models on embedded platforms, including inference acceleration, post-processing, system-level integration, performance tuning, stability validation, and on-road testing.
  • Deliver production-ready solutions for elevated highways and urban driving scenarios, enabling scalable deployment and continuous progression toward higher levels of autonomy.

Qualifications / Requirements:

  • Ph.D. degree in Computer Science, Artificial Intelligence, Robotics, or a related field.
  • Strong foundation in autonomous driving systems, with hands-on experience in end-to-end deep learning–based modeling.
  • Practical experience building planning, control, or decision-making modules with deep learning approaches.
  • Experience or strong interest in Vision-Language Models (VLMs), multimodal learning, or cross-modal representation learning, particularly in applications involving visual scene understanding and reasoning.
  • Proficiency in C/C++ and Python, with experience in real-time inference deployment and performance optimization.
  • Familiarity with BEV-based representations, occupancy prediction, and multi-task learning frameworks.
  • Experience with system integration and real-vehicle testing is a strong plus.
  • Strong problem-solving skills, adaptability to complex real-world scenarios, and a results-driven mindset.
  • Strong mathematical foundation in optimization techniques relevant to computer vision and deep learning.

Skills & Requirements

Technical Skills

Python, C/C++, BEV perception, Occupancy prediction, Multi-task learning, System integration, Real-vehicle testing, Optimization techniques, Computer vision, Deep learning, Autonomous driving

Soft Skills

Problem-solving, Adaptability, Results-driven mindset

Domain Knowledge

Autonomous driving, AI/ML

Level

Mid-Level

Posted

5/4/2026
