Machine Learning Engineer (Junior - Expert)

Trulyyy

Singapore, SG

On-site

Job Description

Our Client

What You’ll Do

Optimize inference frameworks (e.g., vLLM, SGLang), including batching, KV cache, and scheduling
Implement model quantization and compression (FP8 / INT8 / INT4) for production
Profile and eliminate bottlenecks across the inference stack
Productionize advanced techniques (e.g., speculative decoding, attention optimizations)
Improve GPU utilization, system efficiency, and serving cost

What We’re Looking For

Nice to Have

Python, C/C++, LLM inference frameworks, Transformer inference, GPU architecture, Quantization techniques, AI, Machine learning

FULL TIME

junior

4/23/2026
