On behalf of Huawei, a world-renowned information and communications technology (ICT) company, we are seeking passionate and talented individuals to join our team as AI Training/Inference Acceleration Algorithm Engineers.
Job Responsibilities:
- Lead the research and development of AI training and inference acceleration algorithms for the agentic AI and multimodal domains, optimized for next-generation AI compute architectures to maximize compute efficiency and utilization.
- Drive the end-to-end implementation of acceleration algorithms within proprietary AI frameworks and acceleration libraries, ensuring seamless integration and iterating continuously on real-world performance metrics.
- Provide technical insight and foresight in the training/inference domain, identifying emerging trends such as long-sequence modeling and sparsity, and pre-planning cutting-edge algorithms to sustain the competitive advantage of AI computing platforms.
Job Requirements:
- Master’s or PhD degree in Computer Science, Artificial Intelligence, Electronic Engineering, or a related technical discipline.
- Solid technical foundation in mainstream Large Language Model (LLM) architectures and Mixture of Experts (MoE) models, with proven experience in large-scale model training, fine-tuning, or inference deployment.
- Strong hands-on experience with AI acceleration technologies, including zero-redundancy optimizer techniques, distributed parallelism strategies (tensor, pipeline, sequence, virtual-pipeline, and data parallelism, i.e., TP/PP/SP/VP/DP), communication compression, and memory optimization.
- Essential expertise in high-performance attention kernels and KV-cache compression, alongside proficiency in model weight/activation quantization and sparsity-aware acceleration.
- High proficiency in Python and deep familiarity with leading deep learning frameworks and high-performance acceleration libraries; expertise in hardware-level kernel optimization or native operator tuning is highly preferred.
- Deep understanding of hardware-software co-design, with knowledge of hardware characteristics (memory bandwidth, compute cycles, and interconnects) and a track record of optimizing performance for models with 100B+ parameters.
- Proven ability to perform complex AI model tuning and optimization in distributed or heterogeneous computing environments, translating hardware-specific features into significant algorithmic performance gains.