Logan Riggs Smith
-
Area: Mechanistic Interpretability
Interpretable Architectures (both from scratch and post-training)
Weight Based Interpretability (e.g. Apollo's Attribution-based Parameter Decomposition)
Better SAEs
Fully Interpreting MLPs
-
I've been trying to reduce x-risk since 2018 and started researching mech interp two years ago. I was a core contributor to the first SAE paper, applied SAEs to preference models, and co-authored the Dark Matter of SAEs paper.
Mech interp is, luckily, a verifiable task that 2026 LLMs might be able to automate; however, the core open question remains finding a fundamental unit of computation. SAE features are great but not sufficient.
Besides research, I enjoy meditating, dancing, and improv piano.
-
Comfortable training basic ML models, reading papers, and trying to falsify your own results. For a 9-week program, I expect it is better to quickly de-risk (and falsify) 5 different projects and then invest a month into the most promising one, both for personal growth and for ending up with a more impactful project.
Independent Researcher