Job title: Lead Data Engineer (Databricks)
Duration: Contract to hire
Job Location: Remote
Job Description:
- We are seeking a skilled Lead Data Engineer (Databricks) to contribute to transformative enterprise data platform projects focused on developing data pipelines and logic engines that manage ingestion, staging, and multi-tier data product modeling.
- Additionally, this includes, but is not limited to, data enrichment across various OEM-specific data warehouse and data lakehouse platform implementations for consumption by analytics clients.
- This role requires full life cycle design, build, deployment, and optimization of data products for multiple large enterprise, industry-vertical-specific implementations by processing datasets through a defined series of logically conformed layers, models, and views.
Role & Responsibilities:
- Collaborate in defining the overall architecture of the solution. This includes knowledge of modern Enterprise Data Warehouse and Data Lakehouse architectures that implement Medallion or Lambda architectures.
- Design, develop, test, and deploy processing modules to implement data-driven rules using SQL, stored procedures, and PySpark.
- Understand and own data product engineering deliverables relative to a CI/CD pipeline and standard DevOps practices and principles.
- Build and optimize data pipelines on platforms like Databricks, SQL Server, or Azure Data Fabric.
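The Medallion layering named in the responsibilities above is the pattern of refining raw data through successive conformed layers (commonly bronze → silver → gold). A minimal, Spark-free sketch of that flow in plain Python follows; the record fields, validation rule, and aggregation are hypothetical stand-ins, and a real Databricks pipeline would express these steps as PySpark DataFrame transformations.

```python
# Hypothetical medallion-style layering: bronze (raw) -> silver (cleansed) -> gold (product).
# Plain Python for self-containment; a real pipeline would use PySpark DataFrames.

bronze = [  # raw, as-ingested records; may contain malformed rows
    {"order_id": 1, "region": "east", "amount": "120.50"},
    {"order_id": 2, "region": "west", "amount": "bad-value"},
    {"order_id": 3, "region": "east", "amount": "79.25"},
]

def to_silver(rows):
    """Cleanse and conform: cast types, drop rows that fail validation."""
    out = []
    for r in rows:
        try:
            out.append({**r, "amount": float(r["amount"])})
        except ValueError:
            continue  # a real pipeline would quarantine rejected rows
    return out

def to_gold(rows):
    """Aggregate into a consumption-ready data product: revenue per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer stays independently queryable, which is what lets analytics clients consume silver or gold views without touching raw ingests.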
Hard Skills - Must have:
- Current knowledge of, and experience using, modern data tools such as Databricks, Fivetran, Data Fabric, and others; core experience with data architecture, data integration, data warehousing, and ETL/ELT processes.
- Applied experience with developing and deploying custom .whl packages and/or in-session notebook scripts for custom execution across parallel executor and worker nodes.
- Applied experience in SQL, stored procedures, and PySpark, based on area of data platform specialization.
- Strong knowledge of cloud and hybrid relational database systems, such as MS SQL Server, PostgreSQL, Oracle, Azure SQL, AWS RDS, Aurora, or a comparable engine.
- Strong experience with batch and streaming data processing techniques and file compaction strategies.
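The file compaction strategies in the last bullet typically mean rewriting many small files into fewer, near-target-size files to reduce metadata and scan overhead. A minimal greedy planning sketch in plain Python is below; the file sizes and 128 MB target are hypothetical, and on Databricks/Delta Lake this is normally handled natively (e.g., by OPTIMIZE or auto-compaction) rather than hand-rolled.

```python
# Hypothetical compaction planner: greedily group small files into batches
# that each rewrite to roughly one target-sized file (sizes in MB).

def plan_compaction(file_sizes_mb, target_mb=128):
    batches, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb, reverse=True):  # largest first
        if current and current_size + size > target_mb:
            batches.append(current)  # close the batch once it would overflow
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches

small_files = [8, 64, 32, 8, 96, 16, 4]
plan = plan_compaction(small_files, target_mb=128)
print(plan)
```

The same greedy idea underlies bin-packing style compaction jobs: every batch becomes one rewrite task, so fewer, fuller batches mean fewer output files.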
Hard Skills - Nice to have/It's a plus:
- Automation experience with CI/CD pipelines to support deployment and integration workflows, including trunk-based development, using automation services such as Azure DevOps, Jenkins, or Octopus.
- Advanced proficiency in PySpark for complex data processing tasks.
- Advanced proficiency in Spark workflow optimization and orchestration using tools such as Asset Bundles or DAG (Directed Acyclic Graph) orchestration.
Soft Skills / Business Specific Skills:
- Ability to identify, troubleshoot, and resolve complex data issues effectively.
- Strong teamwork, communication skills and intellectual curiosity to work collaboratively and effectively with cross-functional teams.
- Commitment to delivering high-quality, accurate, and reliable data product solutions.
- Willingness to embrace new tools, technologies, and methodologies.
- Innovative thinker with a proactive approach to overcoming challenges.