Lead Applied Scientist, Document Understanding

Thomson Reuters
Washington, US
Remote

Job Description

  • Design and deploy semantic chunking models for lengthy, non-uniformly structured legal documents with adjustable granularity across use cases
  • Build document enrichment systems using legal and customer-defined taxonomies
  • Develop LLM-based knowledge graph construction pipelines that extract and link citations, entities, and legal concepts across diverse legal content
  • Lead knowledge distillation efforts to compress large models into latency-constrained, production-ready SLMs
  • Design evaluation frameworks — component-level and end-to-end — using expert annotation and synthetic data
  • Own technical decisions on architecture, chunking strategy, classification approach, and knowledge extraction methods
  • Partner with engineering on delivery, reliability, and scale across multiple product lines
  • Provide technical input to senior leadership on AI strategy and roadmap
  • Mentor applied scientists and ML practitioners on the team

Requirements:

  • PhD in Computer Science, AI, NLP, or a related field — required
  • 8+ years of post-degree industry experience shipping document understanding, information extraction, or knowledge graph systems into production — not research-only experience
  • Publications at ACL, EMNLP, ICLR, NeurIPS, SIGIR, KDD, or equivalent
  • Production Python and experience with PyTorch, Hugging Face Transformers, and DeepSpeed
  • Hands-on production depth required in:
      - Document layout analysis and semantic chunking beyond fixed-size or paragraph-based methods
      - Hierarchical, multi-label document classification with domain-specific and customer-defined schemas
      - Entity recognition and linking, relation extraction, citation parsing, and knowledge graph construction from unstructured text
      - LLM-based information extraction, few-shot and multi-task learning, and post-training
      - Knowledge distillation, model compression, and SLM deployment under latency constraints
      - Synthetic data generation and annotation workflow design
      - End-to-end evaluation framework design for document understanding

Benefits:

  • Health insurance
  • Retirement savings
  • Flexible vacation
  • Two company-wide Mental Health Days off
  • Access to the Headspace app
  • Tuition reimbursement
  • Employee incentive programs
  • Resources for mental, physical, and financial wellbeing

Skills & Requirements

Technical Skills

  • Python
  • PyTorch
  • Hugging Face Transformers
  • DeepSpeed
  • Document understanding
  • Information extraction
  • Knowledge graph

Level

Lead

Posted

4/12/2026
