Machine Learning Engineer (Junior - Expert)

Trulyyy
Singapore, SG
On-site

Job Description

Our Client

  • Precision-First AI Lab
  • Heavy-Duty Solver Engine

What You’ll Do

  • Optimize inference frameworks (e.g., vLLM, SGLang), including batching, KV cache, and scheduling
  • Implement model quantization and compression (FP8 / INT8 / INT4) for production
  • Profile and eliminate bottlenecks across the inference stack
  • Productionize advanced techniques (e.g., speculative decoding, attention optimizations)
  • Improve GPU utilization, system efficiency, and serving cost

What We’re Looking For

  • Strong Python; solid systems skills (C/C++ is a plus)
  • Experience with LLM inference frameworks and source-level optimization
  • Good understanding of Transformer inference (prefill, decode, KV cache)
  • Familiarity with GPU architecture and performance profiling
  • Knowledge of quantization techniques
  • 3+ years in inference optimization / HPC / related fields

Nice to Have

  • CUDA / Triton and GPU kernel optimization
  • Distributed inference (TP/PP) or MoE experience
  • Advanced quantization (GPTQ, AWQ, SmoothQuant)
  • Contributions to open-source frameworks or relevant publications

Skills & Requirements

Technical Skills

Python, C/C++, LLM inference frameworks, Transformer inference, GPU architecture, Quantization techniques, AI, Machine learning

Employment Type

Full-time

Level

junior

Posted

4/23/2026
