Research Scientist (End-to-End & Multimodal Models)

Black Sesame Technologies (Singapore) Pte Ltd
Singapore

Job Description

Purpose:

In this role, you will be responsible for the end-to-end design and development of autonomous driving frameworks. You will integrate mainstream perception, prediction, and planning technologies into a unified modeling system, leveraging both vision-only and vision-language modeling paradigms, to support autonomous driving tasks across urban and highway scenarios.

You will play a key role in advancing end-to-end and hybrid architectures, including the exploration of Vision-Language Models (VLMs) to enhance scene understanding, reasoning, and decision-making robustness in complex driving environments.

Responsibilities:

  • Lead the design and implementation of end-to-end autonomous driving models, including one-stage (sensor-to-control) and two-stage (e.g., perception–planning decoupled) architectures. Define model structures, training pipelines, and optimization strategies for stable and explainable planning outputs.
  • Drive the development of pure vision-based end-to-end systems, integrating multi-task capabilities such as BEV perception, static and dynamic occupancy inference, trajectory prediction, and planning.
  • Explore and apply Vision-Language Models (VLMs) to improve high-level scene understanding, semantic reasoning, and cross-modal representation learning for autonomous driving tasks.
  • Optimize and deploy models on embedded platforms, including inference acceleration, post-processing, system-level integration, performance tuning, stability validation, and on-road testing.
  • Deliver production-ready solutions for elevated highways and urban driving scenarios, enabling scalable deployment and continuous progression toward higher levels of autonomy.

Qualifications / Requirements:

  • Ph.D. degree in Computer Science, Artificial Intelligence, Robotics, or a related field.
  • Strong foundation in autonomous driving systems, with hands-on experience in end-to-end deep learning–based modeling.
  • Practical experience building planning, control, or decision-making modules with deep learning approaches.
  • Experience or strong interest in Vision-Language Models (VLMs), multimodal learning, or cross-modal representation learning, particularly in applications involving visual scene understanding and reasoning.
  • Proficiency in C/C++ and Python, with experience in real-time inference deployment and performance optimization.
  • Familiarity with BEV-based representations, occupancy prediction, and multi-task learning frameworks.
  • Experience with system integration and real-vehicle testing is a strong plus.
  • Strong problem-solving skills, adaptability to complex real-world scenarios, and a results-driven mindset.
  • Strong mathematical foundation in optimization techniques relevant to computer vision and deep learning.

Skills & Requirements

Technical Skills

Python, C/C++, BEV perception, Occupancy prediction, Multi-task learning, System integration, Real-vehicle testing, Optimization techniques, Computer vision, Deep learning, Autonomous driving

Soft Skills

Problem-solving, Adaptability, Results-driven mindset

Domain Knowledge

Autonomous driving, AI/ML

Level

Mid-Level

Posted

5/4/2026
