Job Description
Context
Collective.work is building the next-generation AI-powered sourcing platform for recruiters. Our mission is to help talent teams identify, engage, and hire the best candidates faster through intelligent automation and data-driven insights. We operate at the intersection of data, AI, and recruiting workflows, where high-quality data infrastructure is critical to our success.
Responsibilities
- Design and maintain scalable data pipelines (batch and real-time)
- Build and optimize ETL/ELT workflows across Azure and/or GCP
- Develop data models and architectures to support analytics and ML use cases
- Ensure data quality, integrity, and reliability across systems
- Collaborate with ML engineers to prepare and serve training datasets
- Monitor and improve pipeline performance, cost efficiency, and scalability
- Implement best practices for data governance, security, and compliance
- Contribute to tooling and infrastructure decisions
Tools & Environment
- Cloud: Azure (Data Factory, Synapse) and/or GCP (BigQuery, Dataflow)
- Data Processing: Python, SQL, Spark
- Orchestration: Airflow / Prefect
- Storage: Data lakes and data warehouses
- Streaming: Kafka / Pub/Sub (nice to have)
- DevOps: Docker, CI/CD
Working Conditions
- Flexible remote work environment
- Opportunity to work on a product at the cutting edge of AI and recruiting
- High ownership and impact from day one
- Collaborative, product-driven engineering culture
- Room to shape the data foundation of a growing platform
Requirements
- 3+ years of experience in data engineering or a similar role
- Strong experience with Azure and/or GCP data ecosystems
- Proficiency in Python and SQL
- Experience building scalable ETL/ELT pipelines
- Familiarity with data warehousing concepts and modeling
- Understanding of distributed systems and big data tools (e.g., Spark)
- Experience working with APIs and integrating external data sources
- Strong problem-solving skills and attention to detail