AI Data Infrastructure Engineer

VirtualVocations

Los Angeles, US

Remote

Why this role

Pace

Steady

Collaboration

Medium

Autonomy

Medium

Decision Impact

Team

Role Level

Individual Contributor

Derived from job-description analysis by Serendipath's career intelligence engine.

What success looks like

successful deployment of large-scale data pipelines
high data quality and integrity

Typical background

6+ years of data engineering experience
degree in Computer Science or related field

Transferable backgrounds

Coming from data engineering
Coming from AI infrastructure

Skills & requirements

Required

Large-scale Data Systems
AI Training And Evaluation Pipelines
Data Cleaning
Petabyte-scale Storage

Preferred

Data Visualization
Cloud Computing

Stack & domain

Python
JVM Or Systems Language
Spark
Ray
Beam
Petabyte-scale Storage And Pipeline Systems
AI
Data Infrastructure
Data Engineering
Machine Learning
Data Processing Frameworks
Petabyte-scale Storage

About the role

Original posting from VirtualVocations

AI Data Infrastructure Engineer is a full-time remote position requiring six or more years of experience, focused on building and operating large-scale data systems for AI training and evaluation pipelines.

Key Responsibilities

Design and operate large-scale data pipelines supporting AI training and evaluation workflows
Build ingestion systems for various data modalities including text, image, and audio
Implement data cleaning and quality assurance processes at petabyte scale

Required Qualifications

Bachelor's or Master's degree in Computer Science or a related field
Six or more years of data engineering experience, particularly with ML or AI workloads
Strong proficiency in Python and at least one JVM or systems language
Deep experience with modern data processing frameworks such as Spark, Ray, or Beam
Hands-on experience with petabyte-scale storage and pipeline systems

Source: VirtualVocations careers