AI/Backend Engineer | San Francisco | $250k + equity

Harrison Clarke
San Francisco, US
On-site

Job Description

Harrison Clarke are partnered with an early-stage startup building ground truth infrastructure for AI agents - creating the data, evaluation, and runtime systems that allow LLM-powered agents to behave reliably in real-world environments.

As an AI / Infrastructure Engineer focused on LLM systems, you will help design and operate the production backbone for deploying and scaling large language models. This includes building low-latency inference systems, GPU-optimised serving infrastructure, and the evaluation pipelines that ensure model outputs remain accurate, consistent, and grounded.

Key Responsibilities:

  • Design and operate infrastructure for deploying LLMs (e.g., GPT-style, open-weight, fine-tuned models)
  • Build and optimise high-throughput, low-latency inference pipelines
  • Implement scalable LLM serving systems (batching, caching, streaming, request routing)
  • Manage GPU-based infrastructure with a focus on cost and performance efficiency
  • Deploy and maintain model serving stacks (e.g., vLLM, TensorRT-LLM, TGI, Triton, or equivalents)
  • Build systems for model routing, fallback logic, and multi-model orchestration
  • Implement observability for LLM systems (latency, throughput, cost, failure modes, quality signals)
  • Design evaluation infrastructure for production LLM behaviour (A/B testing, regression testing, drift detection)
  • Collaborate with ML and product teams to productionise RAG systems and fine-tuned models

Qualifications:

  • 3+ years in infrastructure engineering, MLOps, or backend systems roles
  • Proven experience deploying ML or LLM systems in production environments
  • Strong proficiency in Python and/or Go
  • Strong understanding of distributed systems and scalable backend architecture
  • Hands-on experience with Docker, Kubernetes and CI/CD pipelines
  • Familiarity with model serving frameworks (e.g., vLLM, Triton, TGI)
  • Experience building high-performance APIs for production systems
  • Strong debugging skills across infrastructure and application layers
  • Must have the legal right to work in the US and must not require visa sponsorship

If this role is of interest, please apply below or reach out to me at reece@harrisonclarke.com.

Skills & Requirements

Technical Skills

Python, Go, Docker, Kubernetes, CI/CD pipelines, LLM systems, vLLM, Triton, TGI, RAG systems, Fine-tuned models

Salary

$250,000+ per year

Employment Type

Full-time

Level

Senior

Posted

5/1/2026
