ML/AI Engineer (LLM Production)

Gravitas Recruitment Group

Washington, US

Remote

Job Description

Key Responsibilities

Core Technical Areas

LLM Fine-Tuning & RAG: Fine-tune open-weight language models on domain-specific data using techniques such as supervised fine-tuning (SFT) and quantization. Build and optimize Retrieval-Augmented Generation (RAG) pipelines that integrate vector databases and policy document ingestion.

Inference Optimization: Serve models efficiently using production inference engines (e.g., vLLM, SGLang). Apply quantization and batching strategies to meet strict latency and throughput SLAs.

MLOps & Deployment: Manage model deployment pipelines across DEV, UAT, PRE-PRD, and PRD environments on enterprise cloud infrastructure (IBM watsonx.ai / OpenShift).

AI-Assisted Development

Use modern LLMs to accelerate development: Python code generation, prompt engineering, and pipeline optimization.

Engage in prompt engineering to refine how systems interact with complex, multilingual datasets.

Research & Prototyping

Evaluate emerging open-source models, inference frameworks, and AI libraries for production feasibility.

Produce written validation reports and contribute to technical design and test documentation.

Talent Cultivation & Mentorship (What You Will Learn)

Broad Exposure: You will understand how LLM fine-tuning, RAG, inference optimization, and deployment pipelines interact in a real-world production system.

Technical Guidance: Work directly with senior engineers to learn how to move AI models from notebook experiments to production-ready, enterprise-grade code.

Impactful Work: Your contributions will directly power a live AI system handling real financial data at scale.

Requirements

Technical Requirements:

Degree in Computer Science, Data Science, AI, or a related field.

Hands-on experience with LLM fine-tuning, RAG pipelines, or model serving.

Strong proficiency in Python.

Solid understanding of machine learning fundamentals and deep learning frameworks (PyTorch, TensorFlow).

Familiarity with relevant libraries such as Hugging Face Transformers (e.g., with Qwen or DeepSeek models), LangChain, LlamaIndex, or vLLM.

Ability to read and write technical documentation in English.

Proficiency with cutting-edge AI tools (e.g., Claude Code, GPT-codex) to accelerate development cycles and conduct rapid feasibility studies (PoCs).

Nice-to-Haves:

Experience with LoRA, QLoRA, Unsloth, DPO, or RLHF fine-tuning techniques.

Familiarity with quantization (INT4, INT8) and production inference optimization.

Experience with vector databases (e.g., Milvus, pgvector).

Exposure to IBM watsonx.ai, OpenShift, or Kubernetes.

Experience with multilingual NLP, particularly CJK (Chinese, Japanese, Korean) datasets.

Prior hands-on experience with AI projects in a financial services or regulated industry context.

What We Offer

Flexible working hours and work-from-home policy.

Subsidized access to premium AI development tools to empower your workflow.

On-job training and technical guidance.

Opportunity to work on a high-impact, production LLM system in the financial services sector.

Exposure to a cutting-edge open-weight model stack and enterprise-grade deployment practices.

Skills & Requirements

Technical Skills

Python, LLM fine-tuning, RAG pipelines, Inference optimization, MLOps, Deployment, Quantization, Batching, Vector databases, Hugging Face Transformers, LangChain, LlamaIndex, vLLM, Claude Code, GPT-codex, LoRA, QLoRA, Unsloth, DPO, RLHF, IBM watsonx.ai, OpenShift, Kubernetes, Multilingual NLP, CJK datasets, Financial services, Regulated industry context, Leadership, Communication, Problem solving, Technical guidance, Collaboration, Team leadership, Human interaction, Project management, Technical documentation, Prompt engineering, Finance, Healthcare, Technology, Machine learning, Artificial intelligence, Deep learning, Natural language processing, Cloud infrastructure, Enterprise software, Open source software

Employment Type

FULL TIME

Level

Senior

Posted

4/22/2026
