The Company
STACK INFRASTRUCTURE (STACK) provides digital infrastructure to scale the world’s most innovative companies. We are an award-winning industry leader in building, owning, and operating highly efficient, cost-effective wholesale, colocation, and cloud data centers. Each of our national facilities meets or exceeds the highest industry standards in all operational categories of availability, security, connectivity, and physical resilience.
STACK offers the scale and geographic reach that rapidly growing hyperscale and enterprise companies need. The world runs on data. Data runs on STACK.
The Position
The Cloud Infrastructure & Automation Engineer owns the cloud platform, DevOps pipelines, automation runtime environments, and operational infrastructure that power all of STACK’s AI, automation, and data initiatives. This is a hands-on leadership role, responsible for ensuring that every intelligent agent, automation workflow, RAG platform, and data pipeline moves from prototype to production rapidly, runs reliably, and scales cost-effectively. The scope spans Azure infrastructure provisioning using Terraform and Bicep, CI/CD pipeline engineering with Azure DevOps and GitHub Actions, container orchestration on AKS and Azure Container Apps, model serving and vector search infrastructure, automation runtime hosting, security hardening, and FinOps cost management. The role also owns the deployment infrastructure for agentic and hybrid model workloads, including LLM/SLM serving endpoints, embedding compute, GPU/inference scaling, and multi-model routing. The ideal candidate is equally comfortable writing Terraform modules and reviewing architecture diagrams, with a relentless focus on deployment velocity, reliability, cost optimization, and security.
Azure Infrastructure & Platform Engineering
- Design, deploy, and manage Azure infrastructure across dual EA subscriptions (Dev/Non-Prod and Production) including Databricks workspaces, AI Search clusters, Cosmos DB instances, ADLS Gen2, Azure OpenAI Service endpoints, and Azure Functions.
- Implement Infrastructure-as-Code using Terraform, Bicep, or ARM templates with modular, version-controlled patterns enabling new workloads to deploy within hours.
- Configure Azure networking (VNets, Private Endpoints, NSGs, Private DNS) for secure, globally distributed platform environments across AMER, EMEA, and APAC.
- Build container-based deployment patterns (Azure Container Apps, AKS) for API serving, agent hosting, model inference, and automation execution.
- Provision and manage LLM/SLM serving infrastructure: Azure OpenAI deployments, model endpoints, token-based scaling, and multi-region failover.
CI/CD, MLOps & Automation Runtime
- Design end-to-end CI/CD pipelines (Azure DevOps, GitHub Actions) for application deployment, model promotion, data pipeline orchestration, and automated testing with blue/green and canary patterns.
- Build MLOps pipelines for model registration, versioning, A/B testing, canary deployment, and automated rollback of LLM endpoints and RAG configurations.
- Deploy and manage automation runtime infrastructure: Azure Logic Apps, Power Automate, Azure Functions, Durable Functions, and event-driven triggers for intelligent workflows.
- Maintain agent hosting environments (Chainlit, FastAPI, Teams bots) for the HR PM Agent and future agentic solutions, with auto-scaling and health monitoring.
- Create reusable deployment accelerators (Terraform modules, Helm charts, pipeline templates) to reduce time-to-production for each successive initiative.
FinOps, Security & Compliance
- Drive Azure cost optimization: commitment-tier analysis, right-sizing, automated shutdown policies, and token consumption tracking across LLM endpoints.
- Implement RBAC, managed identities, Key Vault integration, and least-privilege access across all platform components.
- Ensure SOX compliance, data residency, and governance using Microsoft Purview, Defender XDR, and Azure Policy.
- Manage secrets, certificates, API key rotation, and Entra ID integration for platform authentication across global regions.
- Produce monthly infrastructure cost and performance reports with spend trends, cost-per-query, and optimization metrics.
The Details
- Location: Denver, CO or Dallas, TX
- Travel:
- Benefits: Healthcare, Dental Care, Vision Insurance, Life Insurance, Paid Time Off, and Paid Leave Programs
- Must be eligible to work in the United States
- Must pass comprehensive background and drug screening
Must-have Qualifications
- 7+ years of cloud infrastructure/DevOps experience with at least 2 years supporting AI/ML, automation, or data platform workloads at scale.
- Expert-level Azure skills: Databricks, Cosmos DB, Azure Functions, Logic Apps, ADLS Gen2, Azure AI Search, Azure OpenAI Service, Container Apps/AKS, and Azure Monitor.
- Strong IaC proficiency: Terraform (modules, state, workspaces), Bicep, or ARM templates with environment-templated patterns.
- Hands-on