Staff AI Engineer

Leadingnation

Hong Kong, HK

On-site

Job Description

Role Overview

We're hiring a Staff AI Engineer to own LLM orchestration, RAG, and agent infrastructure at 4B+ messages/year scale.

Our platform processes over 4 billion messages per year across 100+ countries. Your mission is to build the robust, scalable, and intelligent systems that turn conversation data into real-time, intelligent customer experiences.

In this role, you will lead the architecture, deployment, and optimization of our LLM‑driven services—including multi‑provider inference orchestration, RAG pipelines, multi‑agent workflows, and voice AI. This is a senior IC role with significant technical influence across the AI stack.

We need a "builder" who can bridge the gap between complex AI capabilities and massive‑scale production environments, ensuring our AI is fast, reliable, and cost‑effective.

What You Will Own

• Core LLM Infrastructure: Architect and lead our AI production stack, including multi‑provider LLM gateway optimization, token budget management, and low‑latency inference routing across OpenAI, Gemini, and other providers
• Agentic AI & RAG: Design and implement scalable RAG (Retrieval‑Augmented Generation) systems, multi‑step AI agent workflows, and tool‑calling infrastructure (MCP), ensuring high accuracy and reliability in customer interactions
• Voice & Multimodal AI: Lead the evolution of our voice AI layer (WebRTC/realtime) and cross‑channel agent coordination across text, voice, and connected messaging platforms
• AI Production Lifecycles: Own the "Engineering‑to‑AI" loop: building automated pipelines for data collection, cleaning, fine‑tuning orchestration, and model versioning
• Performance & Cost Optimization: Continuously optimize API costs, token budgets, latency, and caching strategies to ensure our 4‑billion‑message scale remains sustainable and performant
• Evaluation & Benchmarking: Build the infrastructure for systematic AI quality assessment, identifying failure modes and ensuring model improvements are grounded in real‑world production metrics
• Technical Roadmap: Drive technology decisions in close collaboration with engineering leadership, selecting frameworks and architectural patterns that will define our AI future

What We Are Looking For

• Systems Expert: 5+ years of professional experience in backend or infrastructure engineering. Mastery of at least one high‑performance language (Go, Rust, or C++) and deep proficiency in Python
• AI Deployment Mastery: Proven track record of taking LLMs/NLP models from experiments to high‑traffic production. You understand multi‑provider orchestration, prompt engineering at scale, and model drift management
• Data Pipeline Experience: Strong experience building data pipelines for AI workloads, including document processing, embedding generation, and vector search
• Product‑Minded Engineer: You don't just build for the sake of tech; you understand how AI performance impacts customer outcomes and business value
• Autonomous Builder: You thrive in environments with high ambiguity and can design, code, and deploy complex systems independently
• Experience with vector databases (e.g., Qdrant, Milvus, Pinecone) and RAG architecture patterns
• Familiarity with agentic frameworks, tool‑calling protocols (MCP, function calling), or multi‑agent orchestration
• Experience with real‑time voice/audio AI pipelines (WebRTC, LiveKit, or similar)
• Infrastructure‑as‑Code experience with GCP/AWS, Docker, and Kubernetes

Benefits

You’ll own AI quality across a platform that serves 16,000+ businesses in 190+ countries.

The data pipeline and production infrastructure are in place — your job is to push the frontier: better models, smarter agents, faster inference, and measurable business impact.

You’ll have direct access to the founding team and the autonomy to shape our AI roadmap. This is a rare IC opportunity to own AI end‑to‑end at production scale, with real data, real customer impact, and a direct line to product decisions.

Skills & Requirements

Technical Skills

LLM orchestration, RAG, Agent infrastructure, LLM-driven services, Multi-provider inference orchestration, Token budget management, Low-latency inference routing, RAG pipelines, Multi-step AI agent workflows, Tool-calling infrastructure, Voice AI, Cross-channel agent coordination, Data collection, Cleaning, Fine-tuning orchestration, Model versioning, API costs, Token budgets, Latency, Caching strategies, AI quality assessment, Failure modes, Model improvements, Real-world production metrics, Technical roadmap, Frameworks, Architectural patterns, Vector databases, RAG architecture patterns, Agentic frameworks, Tool-calling protocols, Multi-agent orchestration, Real-time voice/audio AI pipelines, WebRTC, LiveKit, Infrastructure-as-code, GCP, AWS, Docker, Kubernetes, Builder, Autonomous, Product-minded, Understand how AI performance impacts customer outcomes and business value, Finance, Healthcare

Employment Type

FULL TIME

Level

Senior

Posted

4/14/2026
