AI Safety & Evaluation Technical Research Engineer

Atella
San Francisco, US
On-site

Job Description

At Atella, we address one of the most pressing challenges in AI: the alignment drift that occurs in multi-turn, high-pressure conversations. We assess frontier models beyond single-turn refusals by analyzing how they handle sustained pressure from personas such as a 'frustrated senior developer.' Our goal is to build comprehensive infrastructure for evaluating AI character and stability under pressure.

What We're Building

Co-founded by Dr. Roy Perlis, Chair of Psychiatry at Harvard/MGH and Editor of JAMA AI, Atella builds advanced adversarial simulation harnesses. We apply clinical behavioral science to create agents that exert specific psychological pressure over long interactions, mathematically mapping where safety mechanisms begin to fail. Our work informs the safety teams at leading frontier labs, who rely on our dynamic leaderboards for AI Safety and Code Security.

The Role

We are seeking a Technical Research Engineer to help scale STELLA, our multi-turn evaluation engine. The role combines ML research, automated red teaming, and robust software engineering.

  • Scale the Harness: Optimize and enhance our infrastructure to execute LLM-driven adversarial personas against frontier models across thousands of concurrent turns.
  • Design Adaptive Attacks: Implement innovative automated red-teaming strategies from the latest research to efficiently uncover failure modes through approaches such as multi-agent debates and dynamic prompt generation.
  • Extract Signal from Noise: Develop analysis pipelines to derive metrics from extensive raw transcripts, tracking failure-cascade probabilities, behavioral drift, and persona sensitivity.
  • Publish and Open-source: Collaborate on methodology papers and create open-source tools that contribute to the broader AI safety community.
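The first two responsibilities above center on running many multi-turn adversarial conversations concurrently. As a rough illustration only (STELLA's actual design is not public), the shape of such a harness can be sketched with `asyncio`; the model call here is a deterministic stub, and all names (`query_model`, `run_conversation`, `run_harness`, the drift-after-five-turns behavior) are hypothetical:

```python
import asyncio

async def query_model(persona: str, turn: int) -> str:
    """Stub for an asynchronous frontier-model API call (hypothetical).

    A real harness would await an LLM endpoint here; this stand-in
    pretends safety behavior degrades after several turns of pressure.
    """
    await asyncio.sleep(0)  # yield control, simulating network latency
    return "refusal" if turn < 5 else "drift"

async def run_conversation(persona: str, max_turns: int) -> dict:
    """Drive one multi-turn persona conversation, recording the first
    turn at which the model's reply drifts from a safe refusal."""
    drift_turn = None
    for turn in range(max_turns):
        reply = await query_model(persona, turn)
        if reply != "refusal" and drift_turn is None:
            drift_turn = turn
    return {"persona": persona, "drift_turn": drift_turn}

async def run_harness(personas: list[str], max_turns: int = 8) -> list[dict]:
    """Run all persona conversations concurrently with asyncio.gather."""
    return await asyncio.gather(
        *(run_conversation(p, max_turns) for p in personas)
    )

results = asyncio.run(
    run_harness(["frustrated senior developer", "impatient executive"])
)
```

Because each conversation is an independent coroutine, scaling to thousands of concurrent sessions is a matter of adding concurrency limits (e.g. an `asyncio.Semaphore`) rather than restructuring the loop; the per-persona `drift_turn` records feed naturally into the kind of drift metrics described above.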

Who You Are

  • A proficient software engineer, skilled in writing clean and scalable Python, experienced with LLM APIs, asynchronous programming, and data pipelines.
  • Adept at deconstructing complex papers related to Constitutional AI or persona modeling, and developing working implementations swiftly.
  • Driven by a relentless curiosity to explore and discover vulnerabilities. You have a strong commitment to AI safety, preferring empirical, transcript-level evidence to theoretical discussions.
  • Bonus: experience with RLHF, automated red teaming, or evaluating long-horizon agentic workflows.

Why Join Us?

  • Gain unparalleled insights into the failure modes of the world's most advanced AI systems.
  • Collaborate closely with esteemed clinical scientists from Harvard/MGH and the safety and red teams at leading frontier labs.

Compensation: $250,000-$300,000 base + 0.5%-1% equity

Skills & Requirements

Technical Skills

  • Python
  • LLM APIs
  • Asynchronous programming
  • Data pipelines
  • ML research
  • Automated red teaming
  • Robust software engineering
  • AI safety (strong commitment)
  • Curiosity

Salary

$250,000 - $300,000 per year

Employment Type

Full-time

Level

Senior

Posted

4/23/2026

Apply Now

You will be redirected to Atella's application portal.