Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.
We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.
We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.
If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.
SENIOR ENGINEERING MANAGER, COMPUTE
San Francisco, Sunnyvale (On-site)
About This Role
At Crusoe, we are on a mission to align the future of computing with the future of the climate. As a Senior Engineering Manager on the Compute Team, you will lead the engineers responsible for our vertically integrated AI cloud. This team sits at the intersection of high-performance hardware and cloud-native software, ensuring that our GPU clusters—powered by stranded and renewable energy—deliver world-class performance and reliability for the world’s most demanding AI and HPC workloads.
You will manage a high-caliber team of systems and software engineers focused on virtualization, bare-metal provisioning, kernel-level optimization, VM-as-a-Service, cloud hypervisor development, and open-source contributions. As we rapidly scale, your team's code will directly impact performance-per-dollar for the world's leading AI enterprises. You will be building a leaner, faster, and more specialized cloud from the ground up, and your leadership will directly influence how Fortune 500 companies and leading AI researchers access sustainable, hyperscale compute power.
What You’ll Be Working On:
- Team Leadership & Growth: Hire, mentor, and scale a world-class team of engineers. You will define performance expectations, foster a culture of technical excellence, and build career growth paths for your direct reports.
- Compute Infrastructure Strategy: Lead the development and optimization of Crusoe’s compute stack, from bare-metal orchestration to hypervisor tuning (KVM/QEMU) and kernel subsystems (NUMA, memory management, scheduling).
- High-Performance AI Optimization: Collaborate with hardware and networking teams to optimize performance for massive GPU/TPU clusters, SmartNICs, and high-speed interconnects.
- Operational Excellence: Oversee the reliability and scalability of our compute services. You will guide the team through complex distributed systems challenges and ensure high availability across our global data center footprint.
- Cross-Functional Roadmap: Partner with Product, Infrastructure, and Site Reliability Engineering (SRE) to define and execute a roadmap that balances rapid innovation with the stability of a "gold standard" cloud provider.
What You’ll Bring to the Team:
- Leadership Experience: 5+ years of experience in engineering management, specifically leading teams that build distributed systems, cloud infrastructure, or high-performance computing platforms.
- Technical Depth: A strong background in systems programming (Go, C/C++, or Rust) and a deep understanding of Linux internals and virtualization technologies.
- Execution at Scale: Proven ability to lead teams through ambiguity and deliver mission-critical software in a fast-paced, high-growth environment.
- Strategic Mindset: You can bridge the gap between low-level technical trade-offs and high-level business goals, clearly communicating complex concepts to stakeholders.
- Passion for Sustainability: A genuine interest in Crusoe’s mission to reduce the environmental impact of the AI revolution.
Bonus Points:
- Experience at a major Cloud Service Provider (CSP) or in a high-scale AI infrastructure company.
- Familiarity with GPU-based workloads, InfiniBand, or RoCE networking.
- Contributions to open-source projects in the Linux kernel or virtualization space.
PRINCIPAL SYSTEMS SOFTWARE ENGINEER
San Francisco, Sunnyvale (On-site)
About This Role
As the Principal Systems Software Engineer, you will serve as the visionary lead for Crusoe's next-generation AI infrastructure. This is a role for an industry-recognized expert who has already "seen the movie" at hyperscale and is ready to redefine the I/O path for the age of generative AI. You aren't just building a cloud; you are designing the fabric that unifies Bare-Metal-as-a-Service (BMaaS), Intelligent IaaS, and Elastic CaaS into a single, high-performance compute pool.
In this position, you will bridge the gap between silicon and software, advising executive leadership on critical hardware/software co-design pivots while remaining hands-on enough to lead elite R&D teams in shipping production-grade kernel and orchestration code. We are looking for a master of the I/O path who can push massive-scale training workloads to the theoretical limits of hardware. This is a full-time position.
What You’ll Be Working On:
- Unifying Infrastructure Pillars:
  - Bare-Metal-as-a-Service (BMaaS): Architect systems that deliver raw GPU throughput via ultra-low-latency InfiniBand/RDMA fabrics for massive-scale training.
  - Intelligent IaaS: Design highly optimized, thin virtualization layers using KVM or custom micro-VMs to provide enterprise-grade isolation without the "virtualization tax."
  - Elastic CaaS: Build a high-performance container substrate (utilizing Kubernetes or Slurm) that allows AI workloads to burst and scale across heterogeneous GPU nodes.
- Mastering the I/O Path: Lead the architectural design of our internal cloud fabric, drawing on experience from top-tier hyperscalers to drive the technical roadmap for SR-IOV, RDMA, and virtualized GPU scheduling.
- Advanced R&D Leadership: Lead elite workstreams to prototype and productionize novel methods for managing memory, networking, and compute that don't yet exist in standard cloud distributions.
- Technical Strategy & Documentation: Draft white papers and RFCs that define the next two years of Crusoe’s compute and networking stack.
- High-Level Debugging: Work alongside Staff and Senior engineers to resolve complex race conditions in the I/O path and optimize kernel-level memory pinning for GPU clusters.
- Industry Influence: Represent Crusoe in open-source communities and industry forums to influence the global direction of cloud-native AI infrastructure.
What You’ll Bring to the Team:
- Hyperscale Provenance: 12+ years of experience designing and shipping core infrastructure at a major hyperscaler (e.g., OCI, AWS, Azure, GCP) or a specialized HPC cloud.
- Deep Systems Authority: Authoritative knowledge of the Linux kernel, virtualization internals (KVM, QEMU, Firecracker), and high-performance networking (RoCE v2, InfiniBand).
- Hardware-Software Co-Design: Proven ability to design software that maximizes the performance of NVIDIA/AMD GPUs and high-speed NICs.
- R&D Leadership: Experience leading cross-functional teams through high-ambiguity projects and delivering production-ready, mission-critical systems.
- Industry Contributions: A portfolio of significant contributions to the field, which may include patents, major open-source contributions, or published research in distributed systems.
- Communication Mastery: The rare ability to explain the nuances of memory, networking, and compute trade-offs to audiences ranging from kernel engineers to executive leadership.