Senior Applied Scientist: Multimodal AI and Video Intelligence; PhD

Accrete

Boston, US

Hybrid

Job Description

Position: Senior Applied Scientist: Multimodal AI and Video Intelligence (PhD Required)

Office Location: Wellesley, MA (3 days per week in the office)

Accrete is a dual-use AI software company that licenses its Knowledge Engine Platform and Expert AI Agents to government and enterprise clients. The platform creates a unified semantic representation of an organization’s ground truth—capturing tacit domain knowledge, understanding contextual relevance, and connecting information silos—enabling expert agents to reason across complex data and support high-stakes decisions with confidence.

From national security to commercial use cases, Accrete delivers mission-critical decision intelligence and automation on a single platform.

Job Description:

We are seeking a highly motivated and innovative Senior Applied Scientist to join our research team, focused on advancing agentic AI systems for decision automation, knowledge gathering, and organizational intelligence. In this role, you will work at the intersection of AI agents, large language models, knowledge graphs, and causal reasoning to design and prototype next-generation systems that move beyond search and static analytics toward adaptive, long-horizon decision-making agents.

Your work will contribute to building knowledge engines: dynamic, evolving systems that unify structured and unstructured data, capture tacit organizational knowledge, and provide grounded context for autonomous and semi-autonomous agents operating at enterprise scale.

Key Responsibilities:

Design and build state-of-the‑art computer vision systems with a focus on real‑time video analytics, video summarization, object tracking, and activity recognition.
Develop and apply Vision‑Language Models (VLMs) and multimodal transformer architectures for deep semantic understanding of visual content.
Build scalable pipelines for processing high‑volume, high‑resolution video data, integrating temporal modeling and context‑aware inference.
Apply self‑supervised, zero‑shot, and few‑shot learning techniques to enhance model generalization across varied video domains.
Explore and optimize LLM prompting strategies and cross‑modal alignment methods for improved reasoning over vision data.
Collaborate with product and engineering teams to integrate vision models into production systems with real‑time performance constraints.
Contribute to research publications, patents, and internal IP assets in the area of vision and multimodal AI.
Provide technical mentorship and leadership to junior researchers and engineers.

Required Qualifications:

Ph.D. in Computer Science, Computer Vision, Machine Learning, or a related discipline; or Master’s degree with 2+ years of experience leading applied research or product‑focused CV/ML projects.
Expertise in modern computer vision architectures (e.g., ViT, SAM, CLIP, BLIP, DETR, or similar).
Experience with Vision‑Language Models (VLMs) and multimodal AI systems.
Strong background in real‑time video analysis, including event detection, motion analysis, and temporal reasoning.
Experience with transformer‑based architectures, multimodal embeddings, and LLM‑vision integrations.
Proficiency in Python, deep learning libraries such as PyTorch or TensorFlow, and OpenCV.
Experience with cloud platforms (AWS, Azure) and deployment frameworks (ONNX, TensorRT) is a plus.

Strong problem‑solving skills, with a track record of end‑to‑end ownership of applied ML/CV projects.
Excellent communication and collaboration skills, with the ability to work in cross‑functional teams.

Salary Range: $160,000-$210,000 per year

The salary range provided reflects the estimated compensation for this role based on the expected qualifications and experience level. The final offer may vary depending on factors such as skills, experience, and alignment with role requirements.

Core Values & Expectations:

Impact

You take full ownership and accountability for your work, consistently seeing projects through from inception to completion with a strong bias for action. Proactively identifying challenges, you drive solutions rather than waiting for direction, and hold yourself and others to the highest standards for delivering results. With strategic thinking and a problem‑solving mindset, you make informed decisions leveraging data…

Skills & Requirements

Technical Skills

Python, PyTorch, TensorFlow, OpenCV, AWS, Azure, ONNX, TensorRT, AI, Machine learning, Computer vision, Video intelligence, Knowledge graphs, Causal reasoning

Salary

$160,000 - $210,000 per year

Employment Type

Full-time

Level

Senior

Posted

4/13/2026
