Databricks Engineer – Boston, MA – Contract
Core Responsibilities
- Data Pipeline Development: Building and optimizing ETL/ELT pipelines with Apache Spark on Databricks (a minimal sketch follows this list).
- Data Lakehouse Management: Designing and maintaining scalable data lakehouse architectures.
- Integration: Connecting Databricks with cloud services (Azure, AWS, GCP) and external data sources.
- Performance Tuning: Optimizing Spark jobs for speed and cost efficiency.
- Collaboration: Working with data scientists, analysts, and business stakeholders to deliver usable datasets.
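To illustrate the kind of pipeline work this role involves, here is a minimal PySpark sketch of an ETL job that lands raw CSV data in a Delta Lake table. All table names, columns, and storage paths are hypothetical; on Databricks the `spark` session is provided by the platform, and running locally assumes the delta-spark package is configured.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# On Databricks a SparkSession named `spark` already exists;
# this builder is only needed when running outside the platform.
spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: raw CSV landed in cloud storage (path and columns are hypothetical).
raw = spark.read.option("header", True).csv("/mnt/raw/orders/")

# Transform: enforce types, drop rows missing the key, stamp ingestion time.
clean = (
    raw.withColumn("order_total", F.col("order_total").cast("double"))
       .filter(F.col("order_id").isNotNull())
       .withColumn("ingested_at", F.current_timestamp())
)

# Load: persist as a managed Delta table for downstream consumers.
clean.write.format("delta").mode("overwrite").saveAsTable("silver.orders")
```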
Key Skills
- Apache Spark (PySpark, Scala, or SQL)
- Databricks Platform (clusters, notebooks, Delta Lake; see the upsert sketch after this list)
- Cloud Services (Azure Data Factory, AWS Glue, GCP BigQuery)
- Data Modeling (star schema, snowflake schema, lakehouse concepts)
- Version Control & CI/CD (Git, automated build and deployment pipelines)
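As a concrete example of the Delta Lake skill, the sketch below upserts incremental changes into the table from the earlier example using Delta's MERGE API. The staging table name and join key are assumptions, and delta-spark must be on the classpath outside Databricks.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-upsert").getOrCreate()

# Incremental changes staged by an upstream job (table name is hypothetical).
updates = spark.read.table("bronze.order_updates")

# MERGE: update rows whose key already exists in the target, insert the rest.
target = DeltaTable.forName(spark, "silver.orders")
(
    target.alias("t")
          .merge(updates.alias("u"), "t.order_id = u.order_id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute()
)
```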