Founding Research Engineer

60x.ai
London, GB
On-site

Job Description

What we're building

Frontier models now score above 170 on IQ tests. Reasoning is no longer the constraint on enterprise AI. Context is.

The context layer sits between an enterprise's siloed data and the agents that need to act on it. Stuff the context window and you trade quality for cost and latency. Use naive RAG and retrieval breaks the moment the question gets interesting. Stand up a vanilla knowledge graph and you hit the harder problem underneath: someone has to design the ontology, and at enterprise scale (hundreds of thousands of files, hundreds of gigabytes) no human can.

This is what gates almost every enterprise AI deployment we've seen.

60x solves it. We've built AI Brain, a knowledge graph platform engineered backwards from the agentic retrieval problem. The thesis is dynamic ontology generation: the graph schema isn't authored by a user, it's generated by a multi-agent ingestion pipeline from the business logic of the data itself, and continuously enriched with secondary and tertiary derivatives. Pre-digested analysis lives in the graph so retrieval is a lookup, not a reasoning loop.

We operate a Palantir-style model for workflows. The platform sits at the centre. Forward-deployed engineers wrap it around enterprise workflows we've already templated. Customisations are retained as IP and fed back into the platform. Same flywheel shape as Palantir, different domain.

We work with enterprises across multiple sectors, and a growing list of global consultancies are evaluating us against their internal GPT deployments. In the last two weeks we shipped a redesigned ingestion pipeline, primary entity extraction with auto-enrichment, and an end-to-end demo across 500 companies. That pace is the default.

This is a founding role. The parts of the platform you'll work on are the parts that decide whether the thesis holds.

The role

You'll work on the research-grade core of AI Brain alongside the CTO (exited robotics founder) and the senior engineering team. The open problems on the desk:

The graph schema is generated, not authored. Structure emerges from the business logic of the data, with analytical insight pre-computed and stored rather than recomputed on every query.

Open work:

  • Hierarchical ontology, moving from a flat conceptual space to one with inheritance, without breaking source provenance
  • Per-tenant configuration that a forward-deployed engineer can tune without touching the runtime
  • The eval question underneath all of it: how do we measure whether a generated ontology is good?
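One way to picture the inheritance-plus-provenance constraint is a minimal sketch like the one below. Everything here is illustrative and hypothetical, not 60x's actual API: `OntologyType`, its fields, and the `doc:` provenance IDs are assumptions. The point is only that attribute resolution can walk up the hierarchy while each attribute keeps a pointer back to the source that defined it.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyType:
    """Hypothetical ontology node: inherits attributes from a parent,
    and every attribute carries its own source provenance."""
    name: str
    parent: "OntologyType | None" = None
    # attribute name -> (value, provenance of the defining document)
    attributes: dict = field(default_factory=dict)

    def resolve(self, attr: str):
        """Walk the inheritance chain; the nearest definition wins,
        and its provenance travels with the value."""
        node = self
        while node is not None:
            if attr in node.attributes:
                return node.attributes[attr]
            node = node.parent
        raise KeyError(attr)

org = OntologyType("Organisation",
                   attributes={"has_employees": (True, "doc:corp-001")})
supplier = OntologyType("Supplier", parent=org,
                        attributes={"supplies": (True, "doc:proc-042")})

assert supplier.resolve("supplies") == (True, "doc:proc-042")
# An inherited attribute still traces back to the document that defined it:
assert supplier.resolve("has_employees") == (True, "doc:corp-001")
```

The design question the bullets above gesture at is exactly where this toy breaks: what happens when a child's definition conflicts with its parent's, and how do you score the resulting hierarchy.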

When a single real-world entity (a company, a person, a product) appears across hundreds of documents under different names, the graph has to recognise it as one thing. We do this through a multi-stage consolidation pipeline that combines fuzzy matching, heuristics, and agent-driven tiebreaking against authoritative external sources where the domain demands it. Provenance back to the source is preserved end-to-end.

Open work:

  • Edge-case dedup where the same entity appears under different names in different contexts
  • The right boundary between consolidation, enrichment, and update as separable concerns
  • Determining attributes at the entity level rather than re-deriving them per chunk
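As a toy illustration of the first, cheapest stage of such a pipeline (normalisation plus fuzzy matching), consider the sketch below. The threshold, the suffix list, and all names are assumptions; the real pipeline layers heuristics and agent-driven tiebreaking on top of anything like this.

```python
from difflib import SequenceMatcher

# Illustrative corporate suffixes stripped before comparison.
SUFFIXES = (" inc", " inc.", " ltd", " ltd.", " plc", " gmbh")

def normalise(name: str) -> str:
    n = name.lower().strip()
    for s in SUFFIXES:
        if n.endswith(s):
            n = n[: -len(s)].strip().rstrip(",")
    return n

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    # Cheap fuzzy similarity on normalised surface forms.
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio() >= threshold

def consolidate(mentions: list[str]) -> list[list[str]]:
    """Greedy single-link clustering of entity mentions."""
    clusters: list[list[str]] = []
    for m in mentions:
        for c in clusters:
            if any(similar(m, existing) for existing in c):
                c.append(m)
                break
        else:
            clusters.append([m])
    return clusters

groups = consolidate(["Acme Ltd", "ACME Ltd.", "acme", "Globex GmbH"])
assert groups == [["Acme Ltd", "ACME Ltd.", "acme"], ["Globex GmbH"]]
```

The interesting edge cases, per the list above, are exactly the ones this stage cannot settle: "Acme" the subsidiary versus "Acme" the brand, where string similarity is perfect but the referents differ.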

Existing graph stores don't carry the temporal model we need, so we're building our own in Rust. Time becomes a property of every node, edge, and attribute, and any retrieval can be run as of any point in history.

The commercial story this opens up: a graph that not only produces decisions today, but backtests its own reasoning against historical state to prove the system would have caught the right answers when it mattered. That's what justifies the platform license, and it isn't feasible on the existing stack without compromising the model.
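The as-of semantics can be sketched in a few lines. This is a hypothetical Python model, not the Rust store's actual schema; the class, method names, and integer timestamps are illustrative. Each attribute on a node is a timeline of (effective time, value) entries, and a query at time t sees the latest entry at or before t.

```python
import bisect

class TemporalNode:
    """Toy node where every attribute is a sorted timeline of versions."""

    def __init__(self):
        self._times: dict[str, list[int]] = {}
        self._values: dict[str, list[object]] = {}

    def set(self, attr: str, t: int, value: object) -> None:
        # Insert the new version in timestamp order.
        times = self._times.setdefault(attr, [])
        vals = self._values.setdefault(attr, [])
        i = bisect.bisect_right(times, t)
        times.insert(i, t)
        vals.insert(i, value)

    def as_of(self, attr: str, t: int):
        """Return the attribute's value as it stood at time t."""
        times = self._times.get(attr, [])
        i = bisect.bisect_right(times, t)
        if i == 0:
            return None  # attribute did not exist yet at time t
        return self._values[attr][i - 1]

node = TemporalNode()
node.set("ceo", 2019, "A. Smith")
node.set("ceo", 2023, "B. Jones")
# Backtest: what would the graph have answered in 2021?
assert node.as_of("ceo", 2021) == "A. Smith"
assert node.as_of("ceo", 2024) == "B. Jones"
assert node.as_of("ceo", 2018) is None
```

The hard part the role owns is everything this sketch elides: doing it for every node, edge, and attribute at once, under write load, with provenance, in a store built for it.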

You'll be central to the design and build of the replacement. This is the deepest research-and-systems problem on the roadmap and the most consequential piece of IP we'll ship in the next twelve months.

Existing large-context retrieval benchmarks are saturated. Frontier models score 100%, which means they no longer differentiate between systems that are good at enterprise retrieval and systems that aren't. We need a new one. Designing it, running it, and publishing the white paper is on the roadmap. Releasing the benchmark itself, separately from our results on it, is part of the strategy.

Open ideas from research conversations: alternative embedding geometries for deep hierarchies, community-detection approaches to retrieval, graph-internal continuous-monitoring patterns as an alternative to scheduled jobs, encoder-based privacy primitives that would unblock several enterprise sales cycles. You'll have a hand in picking what we commit to.

You'll also contribute to hiring and provide technical input on client engagements.

Skills & Requirements

Technical Skills

  • AI
  • Knowledge graph
  • Ontology generation
  • Entity consolidation
  • Temporal graph database

Employment Type

Full time

Level

Senior

Posted

5/2/2026
