On behalf of Huawei, a world-renowned information and communications technology (ICT) company, we are seeking passionate and talented individuals to join our team as AI Training/Inference Acceleration Algorithm Engineers.
Job Responsibilities:
- Lead the research and development of AI training and inference acceleration algorithms for the agentic AI and multimodal domains, optimized for next-generation AI compute architectures to maximize compute efficiency and utilization.
- Drive the end-to-end implementation of acceleration algorithms within proprietary AI frameworks and acceleration libraries, ensuring seamless integration and iterating continuously on real-world performance metrics.
- Provide technical insight and foresight in the training/inference domain, identifying emerging trends such as long-sequence modeling and sparsity, and pre-planning cutting-edge algorithms to sustain the competitive advantage of AI computing platforms.
Job Requirements:
- Master’s or PhD degree in Computer Science, Artificial Intelligence, Electronic Engineering, or a related technical discipline.
- Solid technical foundation in mainstream Large Language Model (LLM) architectures and Mixture of Experts (MoE) models, with proven experience in large-scale model training, fine-tuning, or inference deployment.
- Strong hands-on experience with AI acceleration technologies, including zero-redundancy optimizer techniques, distributed parallelism strategies (tensor, pipeline, sequence, virtual-pipeline, and data parallelism, i.e., TP/PP/SP/VP/DP), communication compression, and memory optimization.
- Essential expertise in high-performance attention kernels and KV-cache compression, alongside proficiency in model weight/activation quantization and sparsity-aware acceleration.
- High proficiency in Python and deep familiarity with leading deep learning frameworks and high-performance acceleration libraries; expertise in hardware-level kernel optimization or native operator tuning is highly preferred.
- Deep understanding of hardware-software co-design, with knowledge of hardware characteristics (memory bandwidth, compute cycles, and interconnects) and a track record of optimizing performance for models with 100B+ parameters.
- Proven ability to perform complex AI model tuning and optimization in distributed or heterogeneous computing environments, translating hardware-specific features into significant algorithmic performance gains.