Capacity Engineer with Data Engineering

Programmers.io

Washington, US

Remote

Why this role

Pace: Fast-paced
Collaboration: High
Autonomy: Medium
Decision Impact: Team
Role Level: Individual Contributor

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

Developed robust data pipelines
Analyzed system metrics
Optimized resource allocation

Typical background

Data Engineering, Site Reliability Engineering

Transferable backgrounds

Coming from Data Analyst
Coming from Data Scientist

Skills & requirements

Required

SQL, Python, Spark, Airflow, Prometheus, Grafana, Datadog, Terraform, Ansible, Harness

Preferred

Chaos Engineering, Infrastructure as Code

Stack & domain

SQL, Python, Spark, Airflow, Prometheus, Grafana, Datadog, Terraform, Ansible, Harness, Capacity Planning, Forecasting, Scaling, ETL/ELT, Big Data, IaC, Infrastructure as Code, Chaos Engineering, Resilience Testing, Data Engineering, Site Reliability Engineering

About the role

Original posting from Programmers.io via LinkedIn

Capacity Engineer with Data Engineering
Location: 100% remote
Full-time/Permanent
Job description:
Please search in the order below:
Tools & Techniques
Data Engineering Stack: SQL, Python, Spark, Airflow for data processing and orchestration. 3-4 years of experience (slightly less is also acceptable).
Monitoring & Observability: Prometheus, Grafana, Datadog.
Chaos Engineering: Test system resilience under stress.
Infrastructure as Code: Terraform, Ansible, Harness.
Number of positions: 1
We are looking for a Data Engineer with strong Site Reliability Engineering (SRE) expertise in capacity planning. This role ensures our infrastructure scales efficiently to meet user demand, balancing performance with cost. The engineer will forecast growth, analyze usage trends, and automate resource provisioning to prevent outages, over-provisioning, and under-provisioning. The role also requires building robust data pipelines and analytical models to support forecasting and decision-making.
Key Responsibilities
· Data Pipeline Development: Design and maintain ETL/ELT pipelines to collect, transform, and store infrastructure usage data.
· Data Modeling: Build models to analyze system metrics and predict future resource needs.
· Demand Forecasting: Analyze historical usage patterns to predict CPU, memory, and storage requirements.
· Load Testing & Scaling: Simulate traffic spikes to identify bottlenecks and ensure systems scale linearly.
· Cost Efficiency: Optimize resource allocation to avoid unnecessary costs while maintaining service availability.
· Automation: Use Infrastructure as Code (IaC) tools like Terraform to automate scaling and provisioning.
· Architecture Review: Collaborate with software teams to flag single points of failure and ensure resilient service design.
Tools & Techniques
· Monitoring & Observability: Prometheus, Grafana, Datadog.
· Chaos Engineering: Test system resilience under stress.
· Infrastructure as Code: Terraform, Ansible, Harness.
· Data Engineering Stack: SQL, Python, Spark, Airflow for data processing and orchestration.
Qualifications
· Strong background in data engineering and SRE practices.
· Hands-on experience with capacity planning, forecasting, and scaling.
· Proficiency in IaC tools (Terraform, Ansible, Harness).
· Experience with data pipelines, ETL/ELT frameworks, and big data tools.
· Familiarity with monitoring/observability platforms (Prometheus, Grafana, Datadog).
· Knowledge of chaos engineering and resilience testing.
· Excellent collaboration and communication skills.

Source: Programmers.io careers (LinkedIn)