Dice is the leading career destination for tech experts at every stage of their careers. Our client, Value Technology Inc, is seeking the following. Apply via Dice today!
Job Title: Data Engineer – Data Lake Migration
Location: Dallas, TX (Onsite)
Experience: 8-10 years
Role Summary
Value Technology is seeking a Data Engineer to join a high-impact datastore migration initiative focused on moving data from on-premises Data Lakes to an AWS-based Lakehouse architecture. The role covers end-to-end data pipeline migration, transformation of legacy consumption patterns, and ensuring data quality and integrity across modern data platforms.
Key Responsibilities
Data Migration & Pipeline Engineering
- Refactor and migrate data pipelines, extraction logic, and job scheduling from legacy systems to a modern Lakehouse architecture.
- Execute large-scale data transfers, ensuring accuracy, completeness, and consistency.
- Work with file formats such as JSON, Avro, and Parquet for efficient data processing.
Consumption Pattern Migration
- Convert and optimize legacy SQL and Apache Spark-based workloads for modern platforms.
- Migrate and adapt datasets to Snowflake and Apache Iceberg environments.
- Analyze existing data usage patterns to design optimized data delivery solutions.
Data Quality & Reconciliation
- Perform data validation and reconciliation to ensure migrated data matches production standards.
- Build and utilize reconciliation frameworks to validate data correctness and completeness.
Stakeholder Collaboration
- Act as a technical liaison between engineering teams and business stakeholders.
- Drive data hand-off and sign-off processes, ensuring alignment with business requirements.
- Provide regular updates and participate in stakeholder discussions.
Platform & Integration
- Collaborate with internal data platform teams to adopt new tools, workflows, and frameworks.
- Work with distributed systems and data storage frameworks such as Hadoop (HDFS/Hive).
Required Technical Skills
Programming & Data Processing
- Strong proficiency in Python or Java
- Hands-on experience with Apache Spark
- Strong expertise in ANSI SQL
Data Platforms & Tools
- Experience with:
  - Snowflake
  - Apache Iceberg
  - Hadoop (HDFS, Hive)
  - Kafka (data streaming)
  - Sybase IQ
Data Formats & Integration
- Knowledge of JSON, Avro, Parquet
- Experience with data ingestion mechanisms (e.g., FTP)
DevOps & Methodology
- Strong understanding of SDLC and CI/CD practices
- Experience with Kubernetes (K8s) deployments
Core Data Engineering Competencies
- Temporal Data Modeling: Handling historical data and slowly changing dimensions (e.g., SCD Type 2)
- Schema Management: Schema evolution and enforcement (especially with Iceberg)
- Performance Optimization: Partitioning, clustering, and query tuning
- Data Architecture:
  - Normalization vs. Denormalization
  - Natural vs. Surrogate Keys
Additional Skills
- Strong troubleshooting and debugging skills (SQL & pipelines)
- Ability to quickly learn new tools, frameworks, and workflows
- Experience working with large-scale, distributed data systems