Embark on an exciting career journey at DigitalOcean, where you will join a dynamic community of exceptional talent dedicated to creating the simplest scalable cloud solutions. If you thrive on challenges, think outside the box, and enjoy a fast-paced environment, you will fit right in. We emphasize teamwork, learning, fun, and making a significant impact for innovators around the globe.
We are looking for individuals passionate about optimizing and troubleshooting data center hardware at a massive scale, ensuring our customers can focus on their projects without hardware worries!
Reporting to the Manager of Infra Machines Design, you will take the lead on sustaining engineering efforts for our hardware infrastructure. The ideal candidate will embrace the challenge of scaling DigitalOcean's data center footprint and cloud infrastructure capacity while exploring and implementing new technologies to enhance our offerings.
Key Responsibilities:
- Serve as the technical lead for the Sustaining Engineering team within the Infra::Machines::Design Organization.
- Support server hardware, cabling, and networking hardware throughout its operational lifecycle.
- Monitor issues through the #machines channel and MACHINES JIRA project, driving them to resolution.
- Participate in a 24/7 on-call rotation with team members.
- Act as Tier 2 escalation point for Datacenter Operations (DCOPS) and Cloud Operations (CloudOps) concerning hardware and firmware components.
- Develop and uphold standards and practices for DigitalOcean hardware operations.
- Collaborate closely with various teams, including Qualification, Firmware, Fleet Lifecycle Engineering, Foresight, and Infrastructure Services, to address issues with tooling, firmware packages, hardware components, and operational concerns.
- Work with DCOPS teams to create, implement, and support hardware-related runbooks.
- Engage with vendor support teams on hardware and firmware issues and lead problem resolution efforts.
- Identify gaps in tooling and operational processes, collaborating with peers to address them.
- Assist in creating tooling and associated runbooks to cover gaps in operational capabilities related to hardware and firmware.
- Coordinate with Ops teams on monitoring thresholds, failure modes, and alerting.
- Help troubleshoot failure causes and work proactively to prevent future incidents.
- Enhance the quality of our cloud infrastructure by identifying and adopting industry best practices.
What We Are Looking For:
- A technical degree (e.g., BS in Computer Science or Engineering) or equivalent practical experience.
- Hands-on experience managing cloud infrastructure at mid-tier scale or larger.
- A deep understanding of server hardware, firmware, and infrastructure.
- Proficiency in troubleshooting techniques, Python, and Bash. Bonus points for experience with JTAG debugging, firmware troubleshooting, or wire sniffing.
- Excellent communication skills and the ability to collaborate effectively with key stakeholders.
- A relentless passion for continuous improvement.
Compensation:
The salary range for this position is $107,000 to $134,000.
Why You Will Enjoy Working at DigitalOcean:
- Purpose-Driven Innovation: Join a forward-thinking tech company on the rise, focused on simplifying cloud and AI, enabling builders to create transformative software.
- Career Development: At DigitalOcean, you will have the opportunity to work with some of the smartest minds in the industry, challenging you to grow with continuous support and resources for your professional development.
- Employee Well-Being: We offer a competitive benefits package to support your well-being, including an Employee Assistance Program, local meetups, and flexible time off policies.
- Performance Rewards: In addition to your salary, you may be eligible for performance-based bonuses and equity compensation, including stock grants upon hire and participation in our Employee Stock Purchase Program.
- Commitment to Diversity: DigitalOcean is an equal opportunity employer committed to creating a diverse environment and does not discriminate based on any characteristics.
Application Limit: You may apply to a maximum of 3 positions within any 180-day period to promote optimal role-candidate matching.