Logan Riggs Smith

  • Area: Mechanistic Interpretability

    • Interpretable Architectures (both from scratch and post-training)

    • Weight-Based Interpretability (e.g. Apollo's Attribution-based Parameter Decomposition)

    • Better SAEs

    • Fully Interpreting MLPs

  • I've been trying to reduce x-risk since 2018 and started researching mech interp two years ago. I was a core contributor to the first SAE paper, applied SAEs to preference models, and co-authored the Dark Matter of SAEs paper.

    Fortunately, mech interp is a verifiable task that 2026-era LLMs might be able to automate; however, the core question of finding a fundamental unit of computation remains open. SAE features are useful but not sufficient.

    Besides research, I enjoy meditating, dancing, and improv piano.

  • You should be comfortable with training basic ML models, reading papers, and trying to falsify your own results. For a 9-week program, I expect it is better to quickly de-risk (and attempt to falsify) 5 different projects, then invest a month in the most promising one, both for personal growth and for ending up with a more impactful project.
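As a rough illustration of the SAE decomposition mentioned above, here is a minimal sparse-autoencoder sketch. This is a toy with random weights and made-up dimensions, not the implementation from any of the papers named here: an encoder with a ReLU produces a mostly-zero "feature" vector, and a decoder reconstructs the activation from the active features.

```python
import numpy as np

# Toy SAE sketch (illustrative only; weights are random, not trained):
#   f = ReLU(W_enc @ x + b_enc)   -> sparse feature activations
#   x_hat = W_dec @ f + b_dec     -> reconstruction from active features
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32            # hypothetical hidden dim and overcomplete feature dim

W_enc = rng.normal(scale=0.1, size=(d_sae, d_model))
b_enc = -0.5 * np.ones(d_sae)     # negative bias pushes most features to zero
W_dec = rng.normal(scale=0.1, size=(d_model, d_sae))
b_dec = np.zeros(d_model)

def sae(x):
    f = np.maximum(0.0, W_enc @ x + b_enc)  # ReLU -> sparse feature vector
    x_hat = W_dec @ f + b_dec               # reconstruction
    return f, x_hat

x = rng.normal(size=d_model)      # stand-in for a model activation
f, x_hat = sae(x)
print(f.shape, x_hat.shape, int((f > 0).sum()))
```

In a trained SAE, the loss trades off reconstruction error against a sparsity penalty on `f`; the claim debated above is whether the resulting features are the right atomic unit for interpretability.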

Independent Researcher