Senior DevOps Data Engineer

EverOps
San Francisco, US
On-site

Job Description

What is EverOps?

Some of the world's most advanced and innovative global enterprise software and tech companies struggle to find engineering partners capable of performing highly complex deliveries and services to their rigorous standards. These teams need a partner that can co-own problems from within their own development environment. Enter EverOps, the premier Embedded Service Provider. We partner directly with our customers' engineering and operations teams to help them assess and address a variety of delivery- and service-related issues in the DevOps space.

The Challenge

EverOps is looking for a Senior DevOps Data Engineer with deep expertise in data platform architecture, disaster recovery design, and infrastructure-level data operations. This role is not about data analytics or content; it's about building and operating the infrastructure that makes data systems reliable, resilient, and scalable. You'll own the architectural decisions around data platform availability, cutover workflows, replication topologies, and backup/restore strategies across enterprise cloud environments.

The Mission

As a DevOps Data Engineer at EverOps, you will join our U.S.-based Virtual Operating Center (your home office), working with a team of dynamic engineers to architect and operate data infrastructure across multiple customers' production cloud environments. You'll bring a data architect's lens to DevOps: designing DR strategies, planning database migrations and cutovers, and ensuring data platform resilience at scale. Our existing engineers have a deep understanding of our customer environments and are eager to empower, ramp up, and mentor each new hire for success.

What You'll Do

Design, implement, and validate disaster recovery architectures for relational, NoSQL, and managed data services across AWS, Azure, or GCP

Plan and execute database migration cutovers including blue-green database swaps, read-replica promotion, and zero-downtime schema migration workflows

Architect replication topologies (cross-region, cross-account, active-passive, active-active) and validate RPO/RTO targets through runbook-driven DR drills

Build and maintain Infrastructure as Code for data platform provisioning (RDS, Aurora, DynamoDB, ElastiCache, Redshift, managed Kafka/MSK, etc.) using Terraform, Atlantis, and/or CloudFormation

Design backup, snapshot, and point-in-time recovery strategies with automated validation and alerting

Develop automation tooling for data platform operations: failover orchestration, health checks, capacity scaling, and credential rotation

Implement observability for data infrastructure: replication lag monitoring, connection pool health, query performance baselines, and storage growth forecasting

Support production workload migrations including data tier cutovers with rollback plans and data integrity verification

Contribute to multi-tenant Kubernetes platform operations where data services intersect (e.g., External Secrets Operator for DB credentials, sidecar patterns for connection pooling)

Participate in regular customer and internal EverOps scrums, providing data architecture guidance and operational status

Document runbooks, architecture decision records (ADRs), and operational playbooks for data platform operations
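The RPO/RTO validation described in the responsibilities above often comes down to comparing drill timestamps against targets. A minimal sketch in Python (the names `DrDrillResult` and `validate_drill` are illustrative, not EverOps or customer tooling):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DrDrillResult:
    """Timestamps captured during a runbook-driven DR drill (hypothetical shape)."""
    last_replicated_at: datetime   # newest transaction present on the standby
    failure_declared_at: datetime  # when the drill declared the primary failed
    service_restored_at: datetime  # when the standby began serving traffic

def validate_drill(result: DrDrillResult,
                   rpo: timedelta,
                   rto: timedelta) -> dict:
    """Check one drill against RPO (max tolerated data loss) and
    RTO (max tolerated downtime)."""
    data_loss = result.failure_declared_at - result.last_replicated_at
    downtime = result.service_restored_at - result.failure_declared_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }
```

A drill report generated from such a check is what turns "we have DR" into an evidenced RPO/RTO claim.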

You Have

5+ years of professional experience as a DevOps Engineer, Data Platform Engineer, Database Reliability Engineer, or Site Reliability Engineer with a data infrastructure focus

Deep hands-on experience designing and operating disaster recovery architectures for production databases (failover, replication, backup/restore, cross-region DR)

Production experience planning and executing database cutover workflows: blue-green database swaps, read-replica promotions, DMS-based migrations, and zero-downtime schema changes

Strong experience with AWS managed data services: RDS/Aurora (Multi-AZ, Global Database, cross-region replicas), DynamoDB (Global Tables, PITR, on-demand backup), ElastiCache, Redshift, and/or MSK

Hands-on experience with Infrastructure as Code (Terraform + Atlantis and/or CloudFormation) for data platform provisioning and lifecycle management

Hands-on experience and deep understanding of Linux

Strong professional experience with at least one of: Python, Golang, Bash, or Rust for automation and tooling

Production experience with Amazon EKS including understanding of how data workloads intersect with Kubernetes (StatefulSets, PVCs, External Secrets Operator, connection pooling)

Experience with HashiCorp Vault for secrets management, particularly database credential rotation and dynamic secrets

Understanding of GitOps workflows, repository structures, and governance patterns

Experience with CI/CD tools like Jenkins, GitHub Actions, ArgoCD, etc.

Experience with monitoring tools such as Datadog, Splunk, ELK, or Prometheus/Grafana, specifically for data infrastructure observability (replication lag, connection health)
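The replication-lag observability called for above typically reduces to threshold checks over a sliding window of lag samples. A minimal Python sketch; the function names and the default thresholds are illustrative assumptions, and real values would be derived from the workload's RPO target:

```python
def classify_replication_lag(lag_seconds: float,
                             warn_at: float = 30.0,
                             crit_at: float = 300.0) -> str:
    """Map a replica's lag behind the primary (in seconds) to an alert level.

    Defaults (30 s warning, 300 s critical) are placeholders, not
    recommendations.
    """
    if lag_seconds >= crit_at:
        return "critical"
    if lag_seconds >= warn_at:
        return "warning"
    return "ok"

def worst_level(samples: list[float]) -> str:
    """Roll a window of lag samples up to the worst observed level,
    which is what a monitor would page on."""
    order = {"ok": 0, "warning": 1, "critical": 2}
    return max((classify_replication_lag(s) for s in samples),
               key=order.__getitem__)
```

In practice the samples would come from a metric such as a cloud provider's replica-lag gauge, and the alert level would feed whatever paging tool the customer runs.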

Skills & Requirements

Technical Skills

AWS, Azure, GCP, Terraform, Atlantis, CloudFormation, HashiCorp Vault, Jenkins, GitHub Actions, ArgoCD, Datadog, Splunk, ELK, Prometheus, Grafana, DevOps, Data Platform Architecture, Disaster Recovery Design, Infrastructure-Level Data Operations

Employment Type

Full-time

Level

Senior

Posted

4/16/2026
