Talk:Mechanistic interpretability
Creating the Mechanistic Interpretability article
I started this page after consulting with the meta-page XAI/Interpretability in Talk:Explainable artificial intelligence#Mechanistic Interpretability. I believe that mech interp is a sufficiently distinct movement and field of study, with a growing body of work (see main article), that it warrants a separate page. I plan to keep improving and maintaining this page, and will respond to feedback. Thank you! JoNeedsSleep (talk) 03:28, 12 May 2025 (UTC)
Requested move 12 May 2025
- The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
The result of the move request was: Speedy close as moved; agreed to be a non-controversial "technical" request. (closed by non-admin page mover) — BarrelProof (talk) 21:22, 12 May 2025 (UTC)
Mechanistic Interpretability → Mechanistic interpretability – titles should be in sentence case (see MOS:TITLECAPS). Alenoach (talk) 08:03, 12 May 2025 (UTC)
- Agree. I don't see why this would be controversial. WeyerStudentOfAgrippa (talk) 14:32, 12 May 2025 (UTC)
- Yes, it's not controversial indeed, but there is already a redirect page named "Mechanistic interpretability", so renaming it requires special permissions. Alenoach (talk) 14:58, 12 May 2025 (UTC)
Wiki Education assignment: Computation, Culture, and Society
This article was the subject of a Wiki Education Foundation-supported course assignment, between 25 March 2025 and 31 May 2025. Further details are available on the course page. Student editor(s): JoNeedsSleep (article contribs).
— Assignment last updated by Laurelli7 (talk) 19:03, 30 May 2025 (UTC)
Just curious
@JoNeedsSleep: Glad to see this page exists. Just curious, was any of the content taken from User:AryamanA/Draft:Mechanistic interpretability? (Don't really care if it was or wasn't, just glad if it was useful!) AryamanA (talk, contribs) 01:34, 1 July 2025 (UTC)
- Hi Aryaman, appreciate the message! I didn’t see this sandbox page when I created the page - it would’ve been very helpful. Just had a look at your draft; your techniques section, especially the extensive entry on SAEs and their evals, would be a great addition to the current methods section. JoNeedsSleep (talk) 17:07, 1 July 2025 (UTC)
- Awesome, yes I'll be contributing to the page as I have time. I've merged in the new stuff I had (mainly SAEs). AryamanA (talk, contribs) 22:07, 1 July 2025 (UTC)