Key Responsibilities
- Design, build, and unit test applications on the Spark framework in Python.
- Build Python and PySpark-based applications using data from relational databases (e.g., Oracle), NoSQL databases (e.g., DynamoDB, MongoDB), and file/object stores (e.g., S3, HDFS).
- Build AWS Lambda functions on the Python runtime leveraging awswrangler, pandas, json, and requests (see the Lambda sketch after this list).
- Build PySpark-based data pipeline jobs on AWS Glue ETL or EMR clusters (see the Glue/Iceberg sketch after this list).
- Build Python-based event-driven integrations with Kafka topics, leveraging the Confluent client libraries (see the Kafka consumer sketch after this list).
- Leverage Apache Iceberg to manage schema evolution and ACID-compliant CDC merges within the data lake.
- Design and build API services using FastAPI, work with Swagger/OpenAPI metadata files, and implement OAuth2/JWT authentication for protected endpoints (see the FastAPI sketch after this list).
- Build process orchestration pipelines using AWS Step Functions and EventBridge rules.
- Optimize data-access performance by choosing appropriate Hadoop-native file formats (Avro, Parquet, ORC) and compression codecs.
- Deploy applications as Docker containers on Kubernetes.
- Leverage Copilot/GPT for agentic coding of the above tech stack.
- Optimize Spark performance on Hadoop via configuration of the SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Set up Glue crawlers to catalog Oracle tables, MongoDB collections, and S3 objects.
- Monitor, troubleshoot, and debug failures using AWS CloudWatch and Datadog.
- Resolve complex data-driven scenarios and triage production issues.
- Participate in code releases and production deployments.
- Create documentation for user adoption, deployments, and runbooks, and support client users with enablement and issues.
- Perform code reviews with the team and guide development for complex scenarios.
- Participate in the agile development process; document and communicate issues and bugs related to data standards in scrum meetings.
- Work collaboratively with onsite and offshore teams.
- Communicate recommendations to multiple teams and drive initiatives with strong leadership.
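As a rough illustration of the Lambda responsibilities above, here is a minimal sketch of a handler on the Python runtime using awswrangler and pandas; the bucket, key, and output path are hypothetical, and an S3 event notification is assumed as the trigger:

```python
# Minimal sketch: Lambda handler that reads a CSV object referenced by an
# S3 event, adds a load timestamp, and writes curated Parquet back to S3.
# All bucket/key/path names here are placeholders.
import awswrangler as wr
import pandas as pd

def handler(event, context):
    # Pull the object location out of the S3 event notification.
    record = event["Records"][0]["s3"]
    path = f"s3://{record['bucket']['name']}/{record['object']['key']}"

    df = wr.s3.read_csv(path)
    df["loaded_at"] = pd.Timestamp.now(tz="UTC")

    # dataset=True lets awswrangler manage the Parquet layout under the prefix.
    wr.s3.to_parquet(df=df, path="s3://example-bucket/curated/orders/", dataset=True)
    return {"rows": len(df)}
```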
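For the Glue/EMR and Iceberg items, a minimal PySpark sketch of an ACID CDC merge; the catalog, table, and S3 paths are hypothetical, and the Spark session is assumed to be configured with an Iceberg catalog (e.g., backed by the Glue Data Catalog):

```python
# Minimal sketch: CDC upsert/delete merge into an Apache Iceberg table.
# Assumes spark.sql.catalog.glue_catalog is configured for Iceberg;
# table and path names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-cdc-merge").getOrCreate()

# Load the incremental CDC batch and expose it to Spark SQL.
changes = spark.read.parquet("s3://example-bucket/cdc/orders/")
changes.createOrReplaceTempView("order_changes")

# Iceberg supports SQL MERGE, giving ACID semantics for the upsert;
# the CDC 'op' column is assumed to flag deletes.
spark.sql("""
    MERGE INTO glue_catalog.sales.orders AS t
    USING order_changes AS s
    ON t.order_id = s.order_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```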
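For the Kafka integration item, a minimal consumer sketch using the Confluent Python client (confluent-kafka); the broker address, group id, and topic name are placeholders:

```python
# Minimal sketch: event-driven consumer with the Confluent Python client.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",  # placeholder broker
    "group.id": "orders-etl",            # placeholder consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])           # placeholder topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        event = json.loads(msg.value())
        # Hand the decoded event to the downstream pipeline step here.
        print(event)
finally:
    consumer.close()
```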
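And for the FastAPI item, a minimal sketch of a JWT-protected endpoint; the secret key and claims are placeholders, PyJWT is assumed for token verification, and the token-issuing endpoint is omitted:

```python
# Minimal sketch: FastAPI endpoint protected by OAuth2 bearer-token auth.
import jwt  # PyJWT, assumed for decoding/verifying tokens
from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer

app = FastAPI()
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")  # issuance not shown

SECRET_KEY = "change-me"  # placeholder; load from a secrets manager in practice

def current_user(token: str = Depends(oauth2_scheme)) -> dict:
    # Verify the bearer token; reject anything invalid or expired.
    try:
        return jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.PyJWTError:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid or expired token",
        )

@app.get("/orders/{order_id}")
def read_order(order_id: int, user: dict = Depends(current_user)):
    # Reachable only with a valid JWT; the 'sub' claim is assumed to identify the caller.
    return {"order_id": order_id, "requested_by": user.get("sub")}
```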
Education & Experience
- Bachelor’s Degree or equivalent in computer science or a related field and a minimum of 10 years of experience.
- AWS certifications: Solutions Architect, Data Engineer, or Data Analytics Specialty.
- Hands-on experience with Python and PySpark programming.
- Hands-on experience with AWS S3, Glue ETL & Catalog, Lambda Functions, EventBridge, Step Functions, Athena.
- Hands-on experience with Kafka integrations.
- Hands-on experience with the Python pandas, requests, and boto3 libraries.
- Hands-on experience in writing complex SQL queries.
- Hands-on experience building REST APIs with FastAPI or Flask.
- Hands-on experience building agentic AI workflows.
- Preferred: expertise with Snowflake, Amazon Redshift, and DynamoDB.
- Ability to use AWS services, anticipate application issues, and design proactive resolutions.
- Technical coordination skills to drive requirements and technical design.
- Aptitude to help build skill sets within the organization.
Knowledge, Skills & Abilities
- Build data pipelines using Python and PySpark on AWS Glue, EMR, and Lambda (see the boto3 sketch after this list).
- Develop and secure RESTful APIs (FastAPI) deployed in Docker containers on EKS, implementing OAuth2/JWT authentication for protected endpoints.
- Hands-on experience with Apache Iceberg tables for CDC merges and latest-snapshot queries.
- Build event-driven pipelines that consume from and publish to Apache Kafka/Amazon MSK.
- Lead and communicate complex technical designs, and leverage Copilot/GPT for agentic coding across the above tech stack.
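As a rough illustration of the pipeline item above, a minimal boto3 sketch that starts a Glue ETL job and polls it to a terminal state; the job name and argument keys are placeholders (in practice, Step Functions would typically own this orchestration loop):

```python
# Minimal sketch: start a Glue job run and wait for a terminal state.
import time
import boto3

glue = boto3.client("glue")

run = glue.start_job_run(
    JobName="orders-cdc-merge",  # placeholder job name
    Arguments={"--source_path": "s3://example-bucket/cdc/orders/"},
)

# Poll until the run finishes.
while True:
    state = glue.get_job_run(
        JobName="orders-cdc-merge", RunId=run["JobRunId"]
    )["JobRun"]["JobRunState"]
    if state in ("SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT"):
        break
    time.sleep(30)
print(state)
```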
EXL Overview: EXL (NASDAQ: EXLS) is a leading operations management and analytics company that designs and enables agile, customer-centric operating models. For more information, visit www.exlservice.com.
EEO/Minorities/Females/Vets/Disabilities: EXL is an equal opportunity employer and will provide reasonable accommodation to those individuals who are unable to be vaccinated consistent with federal, state, and local law.