AI Research Engineer:
Vision AI / VLM / Physical AI
Company:
Centific
Location:
Seattle, WA (or Remote)
Type:
Full-time
Build the Future of Perception & Embodied Intelligence
Are you pushing the frontier of computer vision, multimodal large models, and embodied/physical AI, and do you have the publications to show it? Join us to translate cutting-edge research into production systems that perceive, reason, and act in the real world.
The Mission
We are building state-of-the-art Vision AI across 2D/3D perception, egocentric/360 understanding, and multimodal reasoning. As an AI Research Engineer, you will own high-leverage experiments from paper → prototype → deployable module in our platform.
We are seeking passionate engineers to join our cutting-edge labs. Here is what you could be part of:
What You'll Do
Build and fine-tune models for detection, tracking, segmentation (2D/3D), pose & activity recognition, and scene understanding (incl. 360 and multi-view).
Train and evaluate vision-language models (VLMs) for grounding, dense captioning, temporal QA, and tool use; design retrieval-augmented and agentic loops for perception-action tasks.
Prototype perception-in-the-loop policies that close the gap from pixels to actions (simulation + real data). Integrate with planners and task graphs for manipulation, navigation, or safety workflows.
Curate datasets, author high-signal evaluation protocols/KPIs, and run ablations that make irreproducible results impossible.
Package research into reliable services on a modern stack (Kubernetes, Docker, Ray, FastAPI), with profiling, telemetry, and CI for reproducible science.
Orchestrate multi-agent pipelines (e.g., LangGraph-style graphs) that combine perception, reasoning, simulation, and code generation to self-check and self-correct.
Example Problems You Might Tackle
Linking language queries to objects, affordances, and trajectories.
Temporal consistency, open-set detection, and uncertainty estimation.
Minimum Qualifications
VLMs (e.g., LLaVA-style and video-language models), embodied/physical AI, 3D perception.
Preferred Qualifications
Ray, distributed data loaders, sharded checkpoints.
Testing, linting, profiling, containers, and reproducibility.
Our Stack (you'll touch a subset)
PyTorch, torchvision/lightning, Hugging Face, OpenMMLab, xFormers
YOLO/Detectron/MMDet, SAM/Mask2Former, CLIP-style backbones, optical flow
Vision encoders + LLMs, RAG for video, Toolformer-style and agentic loops
Salary:
$62,000+ per year
Level:
Senior
5/1/2026