Role: AI/ML Performance Engineer
Location: Bellevue, WA (Hybrid)
Employment Type: Contract
About the Role
We are looking for an experienced AI/ML Performance Engineer to design and execute high-intensity stress workloads for next-generation AI platforms. This role focuses on identifying performance bottlenecks, improving system stability, and enabling scalable, production-ready AI infrastructure.
Key Responsibilities
- Design and implement high-intensity stress workloads using PyTorch and Triton
- Analyze system performance to identify bottlenecks, stability issues, and performance cliffs
- Develop workloads targeting large GEMMs, attention mechanisms, Mixture-of-Experts (MoE)-style architectures, mixed precision, and long-running executions
- Build custom Triton kernels to stress hardware execution units, memory hierarchies, and synchronization paths
- Create test harnesses that scale across problem size, number of devices, and runtime duration
- Integrate workloads with profiling, monitoring, and failure triage tools
- Collaborate with platform, firmware, and SDK teams
- Provide documentation and reproducible scripts for lab and CI environments
Required Skills
- Strong experience in performance testing and analysis (test result analysis, server statistics, bottleneck identification, tuning, and recommendations)
- Proficiency in Python
- Scripting experience using Shell or PowerShell
- Experience with PyTorch and/or Triton
Nice to Have
- Experience with AI hardware platforms or simulators
- Exposure to distributed systems and multi-device workloads