Logan Riggs Smith

  • Area: Mechanistic Interpretability

    • Interpretable Architectures (both from scratch and post-training)

    • Weight-Based Interpretability (e.g. Apollo's Attribution-based Parameter Decomposition)

    • Better SAEs

    • Fully Interpreting MLPs

  • I've been trying to reduce x-risk since 2018 and started researching mech interp two years ago. I was a core contributor to the first SAE paper, applied SAEs to preference models, and co-authored the Dark Matter of SAEs paper.

    Fortunately, mech interp is a verifiable task that 2026-era LLMs might be able to automate; however, the core question of finding a fundamental unit of computation remains open. SAE features are useful but not sufficient.

    Besides research, I enjoy meditating, dancing, and improv piano.

  • You should be comfortable with training basic ML models, reading papers, and trying to falsify your own results. For a 9-week program, I expect it is better to quickly de-risk (and attempt to falsify) 5 different projects, then invest a month in the most promising one, both for personal growth and for ending up with a more impactful project.
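As a rough illustration of the SAE decomposition mentioned above, here is a minimal sparse-autoencoder sketch. This is a toy with random weights and made-up dimensions, not the implementation from any of the papers named here: an encoder with a ReLU produces a mostly-zero "feature" vector, and a decoder reconstructs the activation from the active features.

```python
import numpy as np

# Toy SAE sketch (illustrative only; weights are random, not trained):
#   f = ReLU(W_enc @ x + b_enc)   -> sparse feature activations
#   x_hat = W_dec @ f + b_dec     -> reconstruction from active features
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32            # hypothetical hidden dim and overcomplete feature dim

W_enc = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = -0.5 * np.ones(d_sae)     # negative bias pushes most features to zero
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))
b_dec = np.zeros(d_model)

def sae(x):
    f = np.maximum(0.0, W_enc @ x + b_enc)  # ReLU -> sparse feature vector
    x_hat = W_dec @ f + b_dec               # reconstruction
    return f, x_hat

x = rng.normal(size=d_model)      # stand-in for a model activation
f, x_hat = sae(x)
print(f.shape, x_hat.shape, int((f > 0).sum()))
```

In a trained SAE, the loss trades off reconstruction error against a sparsity penalty on `f`; the claim debated above is whether the resulting features are the right atomic unit for interpretability.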

Independent Researcher