Anthropic · San Francisco · Hybrid

Research Scientist, Interpretability

11/8/2025

Description

  • Develop methods for understanding LLMs by reverse engineering algorithms learned in their weights

  • Design and run robust experiments, both quickly in toy scenarios and at scale in large models

  • Create and analyze new interpretability features and circuits to better understand how models work

  • Build infrastructure for running experiments and visualizing results

  • Work with colleagues to communicate results internally and publicly

Qualifications

  • Have a strong track record of scientific research (in any field) and have done some work on interpretability

  • Enjoy team science – working collaboratively to make big discoveries

  • Are comfortable with messy experimental science. We're inventing the field as we work, and the first textbook is years away

  • View research and engineering as two sides of the same coin. Every team member writes code, designs and runs experiments, and interprets results

  • Can clearly articulate and discuss the motivations behind your work, and teach us about what you've learned. You like writing up and communicating your results, even when they're null

  • This role is based in our San Francisco office; however, we are open to considering exceptional candidates for remote work on a case-by-case basis

Benefits

$315,000 - $560,000 USD

Application

View the original listing to apply.