Sr. System Development Engineer, Cloud AI/ML/storage server teams

Amazon Data Services, Inc.
Denver, US
On-site

Why this role

Pace: Fast Paced
Collaboration: High
Autonomy: Medium
Decision Impact: Team
Role Level: Team Lead

What success looks like

  • Built and owned automation infrastructure
  • Designed predictive failure detection systems
  • Developed monitoring tools and dashboards

Typical background

  • Experience in server platforms
  • Knowledge of telemetry, sensor data, and log correlation

Skills & requirements

Required

Automation, Diagnostics, Fleet Health, Predictive Infrastructure, Monitoring Tools

Preferred

AWS Server Solutions

Stack & domain

Linux, Arm, x86, Python, Ruby, Java, C/C++, device drivers, storage subsystems, networking, PCIe, power, NIC, NVMe, GPU, OS internals

About the role

DESCRIPTION

Application deadline: May 11, 2026

We are seeking an experienced Systems Development Engineer to lead the development of automation software, diagnostic tooling, and fleet health infrastructure for our server platforms. You will work across multiple teams and organizations to build scalable, reliable systems that keep our storage and accelerated (AI/ML) compute fleet healthy — with a vision toward zero-touch operations where automation detects, diagnoses, and resolves issues without human intervention.

You will be a technical leader solving complex architectural problems that…
