Staff AI Engineer

Leadingnation
Hong Kong, HK
On-site

Job Description

Role Overview

We're hiring a Staff AI Engineer to own LLM orchestration, RAG, and agent infrastructure at 4B+ messages/year scale.

Our platform processes over 4 billion messages per year across 100+ countries. Your mission is to build the robust, scalable, and intelligent systems that turn conversation data into real-time, intelligent customer experiences.

In this role, you will lead the architecture, deployment, and optimization of our LLM‑driven services—including multi‑provider inference orchestration, RAG pipelines, multi‑agent workflows, and voice AI. This is a senior IC role with significant technical influence across the AI stack.

We need a "builder" who can bridge the gap between complex AI capabilities and massive‑scale production environments, ensuring our AI is fast, reliable, and cost‑effective.

What You Will Own

  • Core LLM Infrastructure: Architect and lead our AI production stack, including multi‑provider LLM gateway optimization, token budget management, and low‑latency inference routing across OpenAI, Gemini, and other providers
  • Agentic AI & RAG: Design and implement scalable RAG (Retrieval‑Augmented Generation) systems, multi‑step AI agent workflows, and tool‑calling infrastructure (MCP), ensuring high accuracy and reliability in customer interactions
  • Voice & Multimodal AI: Lead the evolution of our voice AI layer (WebRTC/realtime) and cross‑channel agent coordination across text, voice, and connected messaging platforms
  • AI Production Lifecycles: Own the "Engineering‑to‑AI" loop: building automated pipelines for data collection, cleaning, fine‑tuning orchestration, and model versioning
  • Performance & Cost Optimization: Continuously optimize API costs, token budgets, latency, and caching strategies to ensure our 4‑billion‑message scale remains sustainable and performant
  • Evaluation & Benchmarking: Build the infrastructure for systematic AI quality assessment, identifying failure modes and ensuring model improvements are grounded in real‑world production metrics
  • Technical Roadmap: Drive technology decisions in close collaboration with engineering leadership, selecting frameworks and architectural patterns that will define our AI future

What We Are Looking For

  • Systems Expert: 5+ years of professional experience in backend or infrastructure engineering. Mastery of at least one high‑performance language (Go, Rust, or C++) and deep proficiency in Python
  • AI Deployment Mastery: Proven track record of taking LLMs/NLP models from experiments to high‑traffic production. You understand multi‑provider orchestration, prompt engineering at scale, and model drift management
  • Data Pipeline Experience: Strong experience building data pipelines for AI workloads, including document processing, embedding generation, and vector search
  • Product‑Minded Engineer: You don't just build for the sake of tech; you understand how AI performance impacts customer outcomes and business value
  • Autonomous Builder: You thrive in environments with high ambiguity and can design, code, and deploy complex systems independently
  • Experience with vector databases (e.g., Qdrant, Milvus, Pinecone) and RAG architecture patterns
  • Familiarity with agentic frameworks, tool‑calling protocols (MCP, function calling), or multi‑agent orchestration
  • Experience with real‑time voice/audio AI pipelines (WebRTC, LiveKit, or similar)
  • Infrastructure‑as‑Code experience with GCP/AWS, Docker, and Kubernetes

Benefits

You’ll own AI quality across a platform that serves 16,000+ businesses in 190+ countries.

The data pipeline and production infrastructure are in place — your job is to push the frontier: better models, smarter agents, faster inference, and measurable business impact.

You’ll have direct access to the founding team and the autonomy to shape our AI roadmap. This is a rare IC opportunity to own AI end‑to‑end at production scale, with real data, real customer impact, and a direct line to product decisions.

Skills & Requirements

Technical Skills

LLM orchestration, RAG, Agent infrastructure, LLM-driven services, Multi-provider inference orchestration, Token budget management, Low-latency inference routing, RAG pipelines, Multi-step AI agent workflows, Tool-calling infrastructure, Voice AI, Cross-channel agent coordination, Data collection, Cleaning, Fine-tuning orchestration, Model versioning, API costs, Token budgets, Latency, Caching strategies, AI quality assessment, Failure modes, Model improvements, Real-world production metrics, Technical roadmap, Frameworks, Architectural patterns, Vector databases, RAG architecture patterns, Agentic frameworks, Tool-calling protocols, Multi-agent orchestration, Real-time voice/audio AI pipelines, WebRTC, LiveKit, Infrastructure-as-code, GCP, AWS, Docker, Kubernetes, Builder, Autonomous, Product-minded, Understand how AI performance impacts customer outcomes and business value, Finance, Healthcare

Employment Type

Full-time

Level

Senior

Posted

4/14/2026
