Job Summary
We are seeking a Senior Data Engineer with strong hands-on experience in Databricks, PySpark, and Spark-based big data processing to design, develop, and maintain scalable data pipelines and platforms. The role involves working with large-scale structured and unstructured datasets, building robust ETL pipelines, and supporting data science and analytics initiatives. The candidate will also mentor junior engineers and collaborate with cross-functional teams to deliver data-driven insights.
Key Responsibilities
Data Engineering & Pipeline Development
- Design and develop scalable data pipelines using Databricks, PySpark, and Spark SQL.
- Build pipelines to ingest, clean, transform, and aggregate data from multiple heterogeneous sources.
- Implement and maintain Delta Lake, Delta Live Tables, and Databricks notebooks for efficient data processing.
- Develop high-performance ETL/ELT workflows for large-scale datasets (a minimal pipeline sketch follows this list).
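To give a concrete flavor of the pipeline work, here is a minimal PySpark sketch: ingest raw data, clean and aggregate it, and persist it as a Delta table. All paths, column names, and the aggregation logic are hypothetical, and a Delta-enabled Spark session is assumed.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Ingest raw JSON; the source path and columns are placeholders.
raw = spark.read.json("s3://example-bucket/raw/orders/")

cleaned = (
    raw.dropDuplicates(["order_id"])                     # de-duplicate on the business key
       .filter(F.col("order_total") > 0)                 # drop invalid records
       .withColumn("order_date", F.to_date("order_ts"))  # normalize timestamp to date
)

daily = cleaned.groupBy("order_date").agg(
    F.count("order_id").alias("order_count"),
    F.sum("order_total").alias("revenue"),
)

# Delta Lake provides ACID writes and time travel on top of cloud storage.
daily.write.format("delta").mode("overwrite").save("s3://example-bucket/gold/daily_orders/")
```

In practice, the same logic would typically be packaged as Delta Live Tables or scheduled Databricks jobs, with configuration, tests, and monitoring around it.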
Big Data Processing
- Work with big data technologies such as Hadoop, Spark, Kafka, Hive, and HDFS, as well as cloud platforms.
- Implement high-volume stream processing solutions using Apache Kafka and Spark Structured Streaming (see the sketch after this list).
- Ensure efficient data storage and retrieval for analytics, machine learning, and reporting.
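As an illustration of the streaming side of the role, the sketch below consumes a Kafka topic with Spark Structured Streaming, applies a watermarked windowed aggregation, and writes the result to Delta. The broker address, topic name, schema, and paths are all assumptions, not a prescribed design.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Hypothetical event schema; broker and topic are placeholders.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "events")
         .load()
         .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
         .select("e.*")
)

# Watermark bounds the state kept for late-arriving data.
per_minute = (
    events.withWatermark("event_ts", "10 minutes")
          .groupBy(F.window("event_ts", "1 minute"))
          .agg(F.sum("amount").alias("total"))
)

query = (
    per_minute.writeStream.format("delta")
              .option("checkpointLocation", "s3://example-bucket/chk/events/")
              .outputMode("append")
              .start("s3://example-bucket/silver/events_per_minute/")
)
```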
Data Quality & Governance
- Define and implement data validation, quality checks, and normalization procedures (a sketch follows this list).
- Develop data policies, retention models, and anonymization frameworks.
- Maintain governance standards for secure and reliable data access.
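A bare-bones example of the kind of validation this involves, as a PySpark sketch. The table name, columns, and checks are illustrative; a production pipeline would more likely use a framework such as Great Expectations or Delta Live Tables expectations.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

# Hypothetical input table; thresholds and rules are illustrative.
df = spark.read.table("silver.customers")

checks = {
    "no_null_keys": df.filter(F.col("customer_id").isNull()).count() == 0,
    "unique_keys":  df.count() == df.select("customer_id").distinct().count(),
    # Format check applies to non-null emails only; null emails pass silently here.
    "valid_email":  df.filter(~F.col("email").rlike(r"^[^@]+@[^@]+$")).count() == 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # A real pipeline might quarantine bad rows or alert an on-call engineer instead.
    raise ValueError(f"Data quality checks failed: {failed}")
```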
Data Modeling & Analytics Support
- Work closely with Data Science and Business Intelligence teams to design data models.
- Prepare datasets for analytics, BI reporting, machine learning, and advanced insights generation.
- Support visualization platforms such as Tableau, Power BI, Spotfire, or Oracle Analytics Cloud (OAC).
Collaboration & Leadership
- Engage with business teams to gather requirements and design data solutions.
- Lead and mentor junior data engineers and provide guidance on best practices.
- Collaborate across projects to provide data engineering expertise and strategic insights.
Required Skills & Experience
- 7+ years of overall IT experience.
- 5+ years of experience in Data Engineering or ETL development.
- Strong hands-on experience with Databricks and PySpark.
- Expertise in Apache Spark, Spark SQL, and big data frameworks.
- Experience with Delta Lake, Unity Catalog, and Databricks notebooks.
- Strong SQL skills with the ability to write intermediate-to-advanced queries (see the example after this list).
- Experience building data ingestion pipelines and large-scale data architectures.
- Familiarity with Agile methodologies (Scrum, Kanban, SAFe).
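For a sense of the expected SQL level, the following Spark SQL query (run through PySpark) uses a CTE and a window function to find each customer's largest order. The table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

top_orders = spark.sql("""
    WITH ranked AS (
        SELECT customer_id,
               order_date,
               order_total,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_total DESC
               ) AS rn
        FROM gold.orders
    )
    SELECT customer_id, order_date, order_total
    FROM ranked
    WHERE rn = 1  -- each customer's single largest order
""")
top_orders.show()
```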
Preferred Skills
- Programming in Python and/or Scala.
- Experience with cloud platforms (AWS, Azure, or GCP).
- Experience with messaging systems such as Kafka, Amazon MSK, IBM MQ, or TIBCO EMS.
- Experience with data platforms and databases such as Databricks, Teradata, DB2, BigQuery, or mainframe systems.
- Knowledge of serverless AWS technologies such as S3, Lambda, Glue, and Kinesis.
- Experience with Git-based version control systems.
Competencies
- Strong expertise in Databricks and the Spark ecosystem.
- Ability to work with large-scale distributed data systems.
- Excellent problem-solving and analytical skills.
- Strong communication and leadership capabilities.