Research Scientist, Learning & Cognitive Outcomes

SquareMileConsulting

London, GB

Job Description

Growth - London, UK and New York City

About the Role

As a Research Scientist focused on Learning & Cognitive Outcomes, you will help build the scientific and evaluation infrastructure needed to understand how AI systems affect learning, cognition, and capability development over time.

We are looking for someone who can design rigorous studies, develop scalable evaluation methods, and help answer a central question:

Do AI systems help people become more capable over time?

This means going beyond engagement, satisfaction, or task completion to measure whether users develop better reasoning, stronger metacognition, greater autonomy, deeper understanding, improved transfer, and more durable skills.

This role sits at the intersection of learning science, cognitive science, experimental design, LLM evaluation, and applied product research. You will help develop cognitive outcome measures, design and manage RCTs and field studies, build classifiers and graders, guide external research partners, and translate findings into model and product improvements.

The initial focus of this work will include young users and education settings, while contributing to a broader research agenda on how AI affects cognition and capability development across populations. You should be comfortable working with schools, universities, education systems, research organizations, and other external partners, while also collaborating closely with internal product, research, engineering, data science, and policy teams.

This is an applied, empirical role. It is not a traditional academic research role optimized primarily for publication, nor is it a curriculum design or production engineering role. Success means building evidence systems that are scientifically credible, operationally useful, and influential in how models and products are developed.

A strong candidate will be able to move quickly in ambiguous environments, make pragmatic scientific tradeoffs, and maintain high standards while working with messy real-world data, external partners, and fast-moving AI systems.

We expect you to:

Have strong grounding in learning science, cognitive science, educational psychology, behavioral science, HCI, or a related empirical field, with a clear understanding of how people acquire, retain, transfer, and apply knowledge and skills.

Have experience designing and executing rigorous empirical research, including RCTs, field experiments, large-scale behavioral studies, or other causal evaluation methods.

Be able to design studies that measure meaningful cognitive and learning outcomes, not just engagement, preference, completion, or short-term performance.

Build and validate evaluation systems for learning and cognitive outcomes, including rubrics, classifiers, graders, benchmarks, behavioral metrics, and model-based evaluators.

Develop methods for detecting both positive and negative effects of AI use, including improved reasoning, better metacognition, durable learning, transfer, overreliance, shallow fluency, answer-copying, reduced agency, or unproductive cognitive offloading.

Be technically fluent enough to work with data directly, prototype analyses, inspect model outputs, reason about classifier and grader performance, and collaborate effectively with data scientists, engineers, and research teams.

Understand the practical strengths and limitations of LLM-based evaluation methods, including model-as-judge systems, rubric design, validation, calibration, inter-rater reliability, and precision/recall tradeoffs.

Help design, launch, and manage external RCTs and field studies with partners such as schools, universities, education systems, research groups, vendors, and other institutions.

Guide external research partners on study design, protocol quality, measurement strategy, implementation fidelity, analysis plans, and interpretation of results.

Operate independently in ambiguous environments, turning broad research goals into concrete study designs, execution plans, evaluation artefacts, and decision-relevant outputs.

Communicate clearly with technical, scientific, partner, and executive audiences, including through internal memos, research reports, partner guidance, protocols, presentations, and external publications.

Translate research findings into actionable recommendations for model behavior, product design, evaluation standards, and future research priorities.

Move quickly while maintaining scientific rigor, especially in real-world settings with imperfect data, operational constraints, and multiple stakeholders.

Represent OpenAI credibly and responsibly in partner-facing research conversations, while knowing when to escalate scientific, operational, ethical, or strategic judgement calls.

Be excited about OpenAI’s approach to research and deployment, especially the opportunity to study and improve the effects of AI systems on human capability at scale.

Skills & Requirements

Technical Skills

Learning science, Cognitive science, Educational psychology, Behavioral science, HCI, LLM evaluation, RCTs, Field studies, Classifiers, Graders, LLM-based evaluation methods, Rigorous, Empirical, Scientific, Pragmatic, High standards, Real-world data, External partners, Fast-moving, Ambiguity, Tradeoffs, Scientific rigor, AI systems, Learning, Cognition, Capability development, Education, Research, Product development

Level

Senior

Posted

5/5/2026
