ML/AI Engineer (LLM Production)

Gravitas Recruitment Group
Washington, US
Remote

Job Description

Key Responsibilities

Core Technical Areas:

  • LLM Fine-Tuning & RAG: Fine-tune open-weight language models on domain-specific data using techniques such as supervised fine-tuning (SFT) and quantization. Build and optimize Retrieval-Augmented Generation (RAG) pipelines integrating vector databases and policy document ingestion.
  • Inference Optimization: Serve models efficiently using production inference engines (e.g., vLLM, SGLang). Apply quantization and batching strategies to meet strict latency and throughput SLAs.
  • MLOps & Deployment: Manage model deployment pipelines across DEV, UAT, PRE-PRD, and PRD environments on enterprise cloud infrastructure (IBM watsonx.ai / OpenShift).

AI-Assisted Development:

  • Use modern LLMs to accelerate development: Python code generation, prompt engineering, and pipeline optimization.
  • Apply prompt engineering to refine how systems interact with complex, multilingual datasets.

Research & Prototyping:

  • Evaluate emerging open-source models, inference frameworks, and AI libraries for production feasibility.
  • Produce written validation reports and contribute to technical design and test documentation.

Talent Cultivation & Mentorship (What You Will Learn)

  • Broad Exposure: You will understand how LLM fine-tuning, RAG, inference optimization, and deployment pipelines interact in a real-world production system.
  • Technical Guidance: Work directly with senior engineers to learn how to move AI models from notebook experiments to production-ready, enterprise-grade code.
  • Impactful Work: Your contributions will directly power a live AI system handling real financial data at scale.

Requirements

Technical Requirements:

  • Degree in Computer Science, Data Science, AI, or a related field.
  • Hands-on experience with LLM fine-tuning, RAG pipelines, or model serving.
  • Strong proficiency in Python.
  • Solid understanding of machine learning fundamentals and deep learning frameworks (PyTorch, TensorFlow).
  • Familiarity with relevant libraries such as Hugging Face Transformers (e.g., with Qwen or DeepSeek models), LangChain, LlamaIndex, or vLLM.
  • Ability to read and write technical documentation in English.
  • Proficiency in using cutting-edge AI tools (e.g., Claude Code, GPT-codex) to accelerate development cycles and conduct rapid feasibility studies (PoCs).

Nice-to-Haves:

  • Experience with LoRA, QLoRA, Unsloth, DPO, or RLHF fine-tuning techniques.
  • Familiarity with quantization (INT4, INT8) and production inference optimization.
  • Experience with vector databases (e.g., Milvus, pgvector).
  • Exposure to IBM watsonx.ai, OpenShift, or Kubernetes.
  • Experience with multilingual NLP, particularly CJK (Chinese, Japanese, Korean) datasets.
  • Prior hands-on experience with AI projects in a financial services or regulated industry context.

What We Offer

  • Flexible working hours and work-from-home policy.
  • Subsidized access to premium AI development tools to empower your workflow.
  • On-job training and technical guidance.
  • Opportunity to work on a high-impact, production LLM system in the financial services sector.
  • Exposure to a cutting-edge open-weight model stack and enterprise-grade deployment practices.

Skills & Requirements

Technical Skills

Python, LLM fine-tuning, RAG pipelines, Inference optimization, MLOps, Deployment, Quantization, Batching, Vector databases, Hugging Face Transformers, LangChain, LlamaIndex, vLLM, Claude Code, GPT-codex, LoRA, QLoRA, Unsloth, DPO, RLHF, IBM watsonx.ai, OpenShift, Kubernetes, Multilingual NLP, CJK datasets, Financial services, Regulated industry context, Leadership, Communication, Problem solving, Technical guidance, Collaboration, Team leadership, Human interaction, Project management, Technical documentation, Prompt engineering, Finance, Healthcare, Technology, Machine learning, Artificial intelligence, Deep learning, Natural language processing, Cloud infrastructure, Enterprise software, Open source software

Employment Type

FULL TIME

Level

senior

Posted

4/22/2026
