Lecture 60 Optimizing Linear Attention

May 23, 2026

Transformers are notoriously resource-intensive because their self- ...
Songlin Yang, the author of the influential Flash ...
Andrew Gordon Wilson (New York University) ...
For more information about Stanford's online Artificial Intelligence programs visit: This ...
Xiang Cheng (Massachusetts Institute of Technology) ...
The professional version of this graduate course, XCS224N Natural Language Processing with Deep Learning, runs June ...

Instructor: Mary Letey
Date: April 11, 2025
AI Theory Seminar: ...
fastweights
Transformers are dominating Deep Learning, but their quadratic memory and compute ...