Talk:Mechanistic interpretability

Creating the Mechanistic Interpretability article

I started this page after consulting with the meta-page XAI/Interpretability in Talk:Explainable artificial intelligence#Mechanistic Interpretability. I believe that mech interp is a sufficiently distinct movement and field of study, with a growing body of work (see main article), that it warrants a separate page. I plan to continuously improve and maintain this page, and will respond to feedback. Thank you! JoNeedsSleep (talk) 03:28, 12 May 2025 (UTC)

Requested move 12 May 2025

The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.

The result of the move request was: Speedy close as moved; agreed to be a non-controversial "technical" request. (closed by non-admin page mover) — BarrelProof (talk) 21:22, 12 May 2025 (UTC)

Mechanistic Interpretability → Mechanistic interpretability – titles should be in sentence case (see MOS:TITLECAPS). Alenoach (talk) 08:03, 12 May 2025 (UTC)

Agree. I don't see why this would be controversial. WeyerStudentOfAgrippa (talk) 14:32, 12 May 2025 (UTC)

Yes, it is indeed not controversial, but there is already a redirect page named "Mechanistic interpretability", so renaming it requires special permissions. Alenoach (talk) 14:58, 12 May 2025 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Wiki Education assignment: Computation, Culture, and Society

This article was the subject of a Wiki Education Foundation-supported course assignment, between 25 March 2025 and 31 May 2025. Further details are available on the course page. Student editor(s): JoNeedsSleep (article contribs).

— Assignment last updated by Laurelli7 (talk) 19:03, 30 May 2025 (UTC)

Just curious

@JoNeedsSleep: Glad to see this page exists. Just curious, was any of the content from User:AryamanA/Draft:Mechanistic interpretability? (Don't really care if it was or wasn't, just glad if it was useful!) AryamanA (talk, contribs) 01:34, 1 July 2025 (UTC)

Hi Aryaman, appreciate the message! I didn’t see this draft when I created the page; it would’ve been very helpful. I just had a look at it: your techniques section, especially the extensive entry on SAEs and their evals, would be a great addition to the current methods section. JoNeedsSleep (talk) 17:07, 1 July 2025 (UTC)

Awesome, yes, I'll be contributing to the page as I have time. I've merged in the new stuff I had (mainly SAEs). AryamanA (talk, contribs) 22:07, 1 July 2025 (UTC)