Sanity Checks for LLM Sparse Autoencoders - Detailed Analysis & Overview
In this AI Research Roundup episode, Alex discusses the paper: ' ...
This has been my favorite video so far to make! I think interpretability is so important both in terms of ensuring safe AI and also ...
One of the core roadblocks to understanding the computation inside a transformer is the fact that individual neurons do not seem ...
I made a video about one of my favorite papers! I hope you enjoy :) ===Summary=== "Applying ...
Warning: This is an ad-libbed talk, and I'm sure I got some facts wrong. This is a talk I gave to my MATS 9.0 training program on ...
In this AI Research Roundup episode, Alex discusses the paper: 'A Mechanistic Investigation of Supervised Fine Tuning' This ...
A visual explanation of how transformers piece concepts together, told in the style of 3Blue1Brown. Introducing SAEs. What truly ...
Interpreting Reasoning Features in LLM via Sparse Autoencoders, Andrei Galichin. The paper proposes a method to identify and interpret the directions in activation space of neural networks, addressing the issue ...