Job Title: AI/ML Engineer
Location: Frisco, TX / Atlanta, GA / Bellevue, WA (Onsite from Day 1; local candidates only)
Employment Type: W2 only (no C2C)
Job Description:
- We are seeking an AI/ML Engineer to build the intelligent systems that power identity resolution and data accessibility within our Customer Data Platform (CDP), the authoritative source of truth for customer data across the entire US adult population.
- This role focuses on developing machine learning pipelines that deduplicate, link, and resolve customer identities across disparate data sources. This is the core capability that turns raw data into trusted, unified customer profiles. You will also contribute to LLM-based solutions that enable natural language querying of CDP data, making the platform accessible to business users across the organization.
- You will work with both classical ML techniques and modern LLM-based approaches to ensure that every customer identity in the CDP is accurately resolved, every profile is trustworthy, and every user can access the data they need.
Job Responsibilities:
- Develop and deploy entity resolution models to match and deduplicate customer records across multiple systems, directly impacting the accuracy of the CDP as the source of truth
- Implement probabilistic matching techniques (e.g., Fellegi-Sunter) and ML models (gradient boosting, neural classifiers) for record linkage across the US adult population
- Build candidate blocking pipelines using phonetic algorithms (Soundex, Double Metaphone), token similarity, and LSH to efficiently narrow billions of potential record pairs down to a tractable set of match candidates
- Apply fuzzy matching techniques (Levenshtein, Jaro-Winkler, Jaccard) to customer attributes such as name, address, phone, and identifiers
- Develop clustering algorithms (DBSCAN, hierarchical clustering) to create unified "golden customer profiles" that serve as the authoritative representation of each individual
- Build embedding-based similarity systems using Sentence-BERT or transformer-based models for semantic matching
- Implement ANN/KNN retrieval systems (FAISS, Annoy) for large-scale entity matching across population-scale datasets
Job Responsibilities - AI/LLM:
- Use LLMs (e.g., GPT, Claude) for classification and disambiguation of entity matches, improving resolution accuracy where traditional methods fall short
- Build and support RAG pipelines to enrich customer profiles with contextual data from unstructured sources
- Perform prompt engineering and evaluation for structured data extraction from unstructured inputs feeding into CDP
- Contribute to natural language query (NLQ)-to-SQL systems that let business users query CDP data in plain language, making the authoritative source of truth accessible to non-technical stakeholders
- Support integration with vector databases (e.g., Pinecone, PGVector, Qdrant) for semantic search across customer data
Education and Work Experience:
- Bachelor's or Master's degree in Computer Science, Data Science, or a related field
- 3+ years of experience in ML/AI engineering
- At least 1 year of experience in entity resolution, record linkage, or deduplication, ideally at scale
Technical Skills:
- Programming: Python (required)
- Libraries: scikit-learn, Hugging Face Transformers, RapidFuzz, jellyfish
- Experience with LLM APIs (OpenAI, Anthropic) and prompt pipelines
- Strong SQL skills and experience with Spark or Dask for distributed processing
- Familiarity with vector databases and embedding-based retrieval
- Experience with ML lifecycle tools (MLflow or similar)
- Understanding of data quality metrics and how identity resolution impacts downstream trust
Knowledge, Skills, and Abilities:
- Strong understanding of ML fundamentals and similarity matching techniques applied to customer identity
- Ability to work with large, messy, real-world datasets spanning hundreds of millions of records
- Understanding of precision/recall tradeoffs in identity resolution and their impact on data trust
- Good problem-solving and analytical skills
- Ability to collaborate with data engineering, platform, and business teams to deliver accurate customer profiles
Best regards,
Tanuja P
Phone:
Email: