ML Research Engineer

European Tech Recruit
Seattle, Washington, US
On-site

Job Description

Recruitment Consultant

Bliss Verna

Contact Details

bv@eu-recruit.com +44 (0) 3333 078 366

We’re partnering with a venture-backed AI startup building next-generation visual conversational systems – enabling users to interact with AI through real-time video experiences that feel genuinely human. Their work sits at the intersection of multimodal ML, high-performance infrastructure, and real-time systems. They’re now hiring a Machine Learning Research Engineer to help bring cutting-edge models from research into highly optimized production environments.

This is a hands-on role focused on GPU performance, model acceleration, and scaling multimodal systems in real-world deployments.

Key Responsibilities:

Collaborate with research teams to productionize experimental models and transition them into reliable, scalable systems.

Own model performance optimization: profile inference to find latency and throughput bottlenecks, then apply techniques such as quantization, pruning, and distillation.

Debug and optimize GPU workloads, including CUDA-level performance issues.

Apply acceleration frameworks (e.g., TensorRT, ONNX, vLLM) to improve multimodal model efficiency across video, speech, and LLM systems.

Design and implement high-throughput data pipelines for petabyte-scale video and multimedia datasets.

Develop evaluation frameworks to measure model quality and support continuous iteration.

Work closely with infrastructure teams to build scalable media processing and training workflows.

Key Qualifications:

2+ years of full-time experience in ML engineering, ideally in production ML environments.

Proven experience operationalizing research models into production systems with a focus on inference optimization (latency and throughput improvements).

Strong proficiency in PyTorch for training and deploying large-scale models.

Experience running models across large datasets for feature extraction or batch inference workloads.

Hands-on experience working with GPUs and debugging CUDA-related performance issues.

Experience with video, audio, or multimodal models preferred.

Background in a VC-backed startup or high-performing established company is advantageous.

Willingness to work on-site 5 days per week in Seattle.

Strong hands-on engineering focus (not a management-leaning or director-level profile).

Candidates whose ML experience is limited to fine-tuning models without deeper systems or performance work are unlikely to be a fit.

Industry

AI & Machine Learning

Contract Type

Permanent

Location

United States

City

Seattle

Work Model

On-Site

Skills & Requirements

Technical Skills

PyTorch, CUDA, TensorRT, ONNX, vLLM, Video, Audio, Multimodal models, GPU performance, Model acceleration, Scaling multimodal systems, High-throughput data pipelines, Evaluation frameworks, Media processing, Training workflows, Teamwork, Innovation, Growth, Finance, Healthcare

Employment Type

Full-time

Level

Mid

Posted

4/22/2026
