Multimodal AI Systems Architect (AI Engineering)

Hyphen Connect Limited

San Francisco, US

Job Description

We are seeking a talented Multimodal AI Systems Architect to develop and optimize AI systems that integrate vision and audio models. This role focuses on improving our voice-to-voice interactions and multimodal retrieval capabilities, with an emphasis on efficient and innovative system design.

Responsibilities:

Integrate vision encoders and audio-native models into core agent reasoning loops.
Optimize streaming latency for voice-to-voice AI interactions.
Architect multimodal RAG systems capable of retrieving insights from videos and PDFs.

Qualifications:

Experience with Whisper, CLIP, and multimodal LLM integration.
Knowledge of streaming architectures and WebRTC.
Expertise in cross-modal alignment.

Skills & Requirements

Technical Skills

Whisper
CLIP
Multimodal LLM integration
Streaming architectures
WebRTC
Cross-modal alignment

Level

Mid-Level

Posted

4/24/2026
