Job Responsibilities - Identity Resolution
- Design and lead end-to-end identity resolution architecture, combining probabilistic models, ML, and embedding-based techniques to build the authoritative customer identity graph
- Build and optimize large-scale entity matching systems across billions of records and multiple data domains - ensuring every US adult is accurately represented in CDP
- Architect advanced candidate generation and blocking strategies (LSH, phonetic encoding, semantic similarity) that balance precision with computational feasibility at population scale
- Design high-precision matching pipelines using ensemble approaches (rules + ML + LLM-based validation) to maximize accuracy of golden customer profiles
- Develop scalable clustering and graph-based approaches for unified customer identity resolution with clear confidence scoring and auditability
- Lead implementation of embedding pipelines and similarity search systems using transformer models for semantic-level identity matching
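
To illustrate the candidate generation and blocking work described above, here is a minimal sketch of phonetic blocking. It assumes hypothetical record fields (`id`, `last_name`, `birth_year`) and uses a plain Soundex key; a production system would likely combine several blocking keys (LSH, embeddings) and run on a distributed engine.

```python
from collections import defaultdict
from itertools import combinations

# Soundex digit for each consonant; vowels, h, w and y carry no code.
_CODES = {
    ch: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex key for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            encoded.append(code)
        if c not in "hw":  # h and w do not separate repeated codes
            prev = code
    return "".join(encoded)[:4].ljust(4, "0")


def candidate_pairs(records):
    """Group records by (Soundex of last name, birth year) and return
    the set of id pairs that share a block. Only these pairs would be
    passed on to the more expensive matching stage."""
    blocks = defaultdict(list)
    for rec in records:
        key = (soundex(rec["last_name"]), rec["birth_year"])
        blocks[key].append(rec["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "last_name": "Smith", "birth_year": 1980},
    {"id": 2, "last_name": "Smyth", "birth_year": 1980},
    {"id": 3, "last_name": "Smith", "birth_year": 1975},
    {"id": 4, "last_name": "Jones", "birth_year": 1980},
]
pairs = candidate_pairs(records)
```

Here only records 1 and 2 land in the same block, so one pair is compared instead of all six. This is the precision versus computational feasibility trade-off the role is expected to manage at population scale.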
Job Responsibilities - AI/LLM
- Architect and build LLM-powered systems for entity resolution, including zero-shot and few-shot classification workflows that handle edge cases traditional models miss
- Design and implement RAG-based architectures for enriching and contextualizing customer data from unstructured sources
- Lead development of NLQ-to-SQL platforms, enabling business users to query CDP - the authoritative source of truth - using natural language
- Translate ambiguous business questions into structured queries with schema awareness, semantic layers, and guardrails that protect data integrity
- Define best practices for prompt engineering, evaluation, and LLM observability - ensuring AI outputs meet the trust standards CDP demands
- Design and optimize vector search architectures (Pinecone, Qdrant, pgvector) for large-scale retrieval across customer data
- Evaluate and integrate emerging frameworks (such as LangChain and LangGraph) and agentic workflows where they strengthen CDP capabilities
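
As one possible illustration of the guardrails mentioned above for NLQ-to-SQL, the sketch below rejects generated SQL that is not a read-only query against an allow-listed table. It assumes SQLite purely for demonstration (the actual CDP query engine is not specified here), and the table names and the `run_guarded` helper are hypothetical.

```python
import sqlite3

# Hypothetical allow-list of tables that generated queries may read.
ALLOWED_TABLES = {"customers"}


def _authorizer(action, arg1, arg2, db_name, source):
    """SQLite authorizer callback: permit SELECT statements and column
    reads on allow-listed tables, deny everything else."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def build_demo_connection():
    """Create an in-memory database, then install the guardrail."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE secrets (ssn TEXT)")
    conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
    conn.commit()
    conn.set_authorizer(_authorizer)  # applies to all later statements
    return conn


def run_guarded(conn, sql):
    """Run model-generated SQL; return rows, or None if it was denied."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


conn = build_demo_connection()
allowed = run_guarded(conn, "SELECT id FROM customers")
blocked_write = run_guarded(conn, "DELETE FROM customers")
blocked_table = run_guarded(conn, "SELECT ssn FROM secrets")
```

Enforcing the rule in the database layer, rather than by inspecting the SQL text, means the check does not depend on how the model phrases the query.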
Education and Work Experience
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field
- 6+ years of experience in ML/AI engineering
- Proven experience building production-grade entity resolution or identity graph systems at scale
- Experience designing LLM-based applications in enterprise environments with high accuracy and trust requirements
Technical Skills
- Advanced programming skills in Python
- Deep expertise in ML algorithms for similarity, classification, and clustering - particularly in identity resolution contexts
- Strong experience with transformer models, embeddings, and semantic search at population scale
- Hands-on experience with LLM APIs and orchestration frameworks
- Strong SQL and experience with distributed data processing (Spark, Dask)
- Experience with vector databases and approximate nearest neighbor (ANN) search systems (FAISS, Pinecone, etc.)
- Expertise in ML lifecycle management (MLflow or equivalent)
- Understanding of data governance, privacy, and security requirements for customer identity data
Knowledge, Skills, and Abilities
- Strong system design and architectural thinking for AI/ML systems at population scale
- Ability to balance precision, recall, and scalability in identity resolution systems - understanding that accuracy directly impacts CDP's authority as the source of truth
- Strong understanding of data semantics and customer domain modeling across diverse data sources
- Leadership in driving AI engineering best practices, standards, and quality benchmarks
- Ability to collaborate across data engineering, product, security, and business teams to deliver trusted customer intelligence