Sentara is hiring a Data Engineer!
This is a fully remote position.
Overview
We are looking for a Data Engineer to support a modern data platform built on Databricks. This role will focus on building scalable data pipelines, supporting a metadata-driven ingestion framework, and ensuring data quality and governance are embedded into the platform.
Key Responsibilities
- Develop and maintain data pipelines using PySpark and Databricks
- Work within a metadata-driven ingestion framework to onboard new datasets
- Implement data quality checks and validation rules within pipelines
- Support ingestion from file-based sources and ingestion tools (e.g., Fivetran)
- Handle schema changes, incremental loads, and file processing patterns
- Contribute to data governance practices including tagging, metadata, and lineage
- Troubleshoot and resolve pipeline failures and performance issues
- Collaborate with architects and stakeholders on data onboarding and requirements
- Follow and contribute to coding standards, reusable components, and best practices
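To give a flavor of the "data quality checks and validation rules" work above, here is a minimal sketch of a metadata-driven validation step. The column names and rule types are invented for illustration; they are not Sentara's actual framework.

```python
# Hypothetical rule metadata: each column maps to validation rules.
# These names and rule keys are illustrative only.
RULES = {
    "patient_id": {"required": True, "type": int},
    "visit_date": {"required": True, "type": str},
    "charge_amount": {"required": False, "type": float, "min": 0.0},
}

def validate_row(row: dict) -> list:
    """Return a list of rule violations for a single ingested row."""
    errors = []
    for column, rule in RULES.items():
        value = row.get(column)
        if value is None:
            # Missing value: only an error when the rule marks it required.
            if rule.get("required"):
                errors.append(f"{column}: missing required value")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{column}: expected {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            errors.append(f"{column}: below minimum {rule['min']}")
    return errors

good = {"patient_id": 1, "visit_date": "2024-01-01", "charge_amount": 12.5}
bad = {"visit_date": "2024-01-01", "charge_amount": -3.0}
```

Because the rules live in metadata rather than pipeline code, onboarding a new dataset means adding a rules entry, not writing a new pipeline.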
Education
- Bachelor’s Degree, or equivalent experience in lieu of a degree
- 3+ years of relevant experience with a degree
- 5+ years of relevant experience without a degree
Certification/Licensure
- No specific certification or licensure requirements
Experience
- 3 to 5 years of relevant experience required
- Hands-on experience with PySpark and Databricks
- Strong SQL skills
- Experience building ETL/ELT data pipelines
- Understanding of Delta Lake concepts (merge, schema evolution, partitions)
- Familiarity with cloud platforms (Azure preferred)
- Basic experience with Git and version control
- Exposure to data catalog or governance tools (e.g., DataHub)
- Experience with Fivetran or similar ingestion tools
- Understanding of data quality and validation concepts
- Experience working with metadata-driven frameworks
- Strong problem-solving and debugging skills
- Ability to work in a structured, framework-driven environment
- Focus on data quality, not just pipeline execution
- Willingness to learn and adapt in a fast-evolving data ecosystem
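As a hedged illustration of the "incremental loads" pattern referenced above, the sketch below shows a watermark-based incremental step in plain Python. Field names and the string-timestamp format are assumptions for the example, not part of the actual platform.

```python
def incremental_load(records: list, last_watermark: str):
    """Keep only records newer than the stored watermark.

    Returns the new records plus the advanced watermark, so the next
    run can pick up where this one left off.
    """
    new_rows = [r for r in records if r["updated_at"] > last_watermark]
    # If nothing new arrived, the watermark stays where it was.
    new_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return new_rows, new_watermark

batch = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-01"},
]
rows, wm = incremental_load(batch, "2024-01-15")
```

In a Databricks pipeline the same idea is typically expressed with a Delta Lake `MERGE` keyed on the watermark column, but the control flow is the one shown here.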