Lead and drive the full data engineering lifecycle, from conceptualization and architectural design through data modeling, implementation, and operational management of complex data systems.
Architect, build, and maintain robust, highly scalable data infrastructure, including real-time and batch ETL/ELT data pipelines, utilizing orchestration tools like Apache Airflow.
Drive the development and continuous improvement of the organization's lakehouse and data warehouse by designing advanced data models and applying modern data transformation practices with tools like dbt.
Design and automate scalable data processes by integrating diverse systems, APIs, and third-party services to ensure fault-tolerant data flow and synchronization.
Define and enforce best practices for data governance, quality, and lineage, while establishing comprehensive monitoring and alerting frameworks to ensure system reliability at scale.
Partner with cross-functional technical and business teams to translate complex business requirements into scalable technical specifications and robust production-ready solutions.
Architect, deploy, and maintain internal automation solutions and data products; integrating AI or machine learning components into these solutions to drive efficiency is a strong plus.
Mentor junior data engineers, conduct code reviews, and provide technical leadership to elevate the team's data engineering practices.
Perform any ad hoc duties as assigned.
Job Requirements:
Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field is preferred.
Proven track record and deep understanding of software engineering principles, scalable system architecture, and object-oriented design; significant experience building enterprise-grade data applications is a bonus.
Expert-level proficiency in Python, including extensive experience with data manipulation libraries (e.g., Pandas, NumPy) and frameworks for web scraping (e.g., Selenium). Experience with AI/ML frameworks (e.g., TensorFlow, PyTorch) is a plus.
Advanced experience designing and deploying complex data pipelines using workflow orchestration tools like Apache Airflow.
Deep expertise with data transformation tools like dbt and a mastery of modern ETL/ELT design patterns.
Extensive experience with advanced data modeling techniques (conceptual, logical, physical) and data warehouse schemas (e.g., Star, Snowflake) for large-scale datasets.
Significant hands-on experience and architectural knowledge of cloud platforms (GCP, AWS, or Azure).
Highly proficient in SQL with extensive experience in designing, tuning, and optimizing relational databases and modern cloud data warehouses (e.g., BigQuery, Redshift, Snowflake).
Strong practical experience with containerization (Docker, Kubernetes) and designing robust CI/CD pipelines.
Experience in building AI agents, implementing GenAI solutions in production (AIOps), or developing solutions using Generative AI frameworks (e.g., ADK, LangChain, LlamaIndex) is an advantage.
Excellent communication and leadership skills, with the ability to articulate complex technical concepts clearly to diverse stakeholders and executive teams.
A highly proactive and analytical approach to solving complex problems, combined with meticulous attention to detail.
Strong sense of ownership and autonomy, with a proven ability to independently lead major technical initiatives, manage complex projects, and deliver high-impact results.
Skills & Requirements
Technical Skills
Python, Pandas, NumPy, Selenium, TensorFlow, PyTorch, Apache Airflow, dbt, BigQuery, Redshift, Snowflake, Docker, Kubernetes, SQL, Google Cloud Platform, AWS, Azure, Epic Electronic Health Record (EHR) System, Communication, Leadership, Data Engineering, AI, Machine Learning