ML Research Engineer

European Tech Recruit

Seattle, Washington, US

On-site

Job Description

Recruitment Consultant

Bliss Verna

Contact Details

bv@eu-recruit.com

+44 (0) 3333 078 366

We’re partnering with a venture-backed AI startup building next-generation visual conversational systems – enabling users to interact with AI through real-time video experiences that feel genuinely human. Their work sits at the intersection of multimodal ML, high-performance infrastructure, and real-time systems. They’re now hiring a Machine Learning Research Engineer to help bring cutting-edge models from research into highly optimized production environments.

This is a hands-on role focused on GPU performance, model acceleration, and scaling multimodal systems in real-world deployments.

Key Responsibilities:

• Collaborate with research teams to productionize experimental models and transition them into reliable, scalable systems.

• Own model performance optimization: profile inference and improve latency and throughput using techniques such as quantization, pruning, and distillation.

• Debug and optimize GPU workloads, including CUDA-level performance issues.

• Apply acceleration frameworks (e.g., TensorRT, ONNX, vLLM) to improve multimodal model efficiency across video, speech, and LLM systems.

• Design and implement high-throughput data pipelines for large-scale (petabyte-scale) video and multimedia datasets.

• Develop evaluation frameworks to measure model quality and support continuous iteration.

• Work closely with infrastructure teams to build scalable media processing and training workflows.

Key Qualifications:

• 2+ years of full-time experience in ML engineering, ideally in production ML environments.

• Proven experience operationalizing research models into production systems with a focus on inference optimization (latency and throughput improvements).

• Strong proficiency in PyTorch for training and deploying large-scale models.

• Experience running models across large datasets for feature extraction or batch inference workloads.

• Hands-on experience working with GPUs and debugging CUDA-related performance issues.

• Experience with video, audio, or multimodal models preferred.

• Background in a VC-backed startup or high-performing established company is advantageous.

• Willingness to work onsite 5 days per week in Seattle.

• Strong hands-on engineering focus (not a management-leaning or director-level profile).

• Candidates whose ML experience is limited to fine-tuning models without deeper systems or performance work are unlikely to be a fit.

Industry

AI & Machine Learning

Contract Type

Permanent

Location

United States

City

Seattle

Work Model

On-Site

Skills & Requirements

Technical Skills

PyTorch, CUDA, TensorRT, ONNX, vLLM, Video, Audio, Multimodal models, GPU performance, Model acceleration, Scaling multimodal systems, High-throughput data pipelines, Evaluation frameworks, Media processing, Training workflows, Teamwork, Innovation, Growth, Finance, Healthcare

Employment Type

Full-time

Level

Mid

Posted

4/22/2026
