Detection and Steering in LLMs using Feature Learning

Quick Context: State-of-the-art foundation models are often seen as black boxes: we send in a prompt and get back an often useful answer. This video summarizes the research by Eric Bigelow, Daniel Wurgaft, and colleagues from Goodfire AI, Harvard, NTT Research, ...

Important details found

Modify the behavior or the personality of a model at inference time, without fine-tuning or prompt engineering.
Most people think there are two ways to control an AI: write a better prompt, or fine-tune it on more data.
Eric and Wendy Schmidt Center Symposium: Biomedical Science and AI April 28 - 29, 2026 Day 1,
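
The idea of modifying a model's behavior at inference time, without fine-tuning or prompt engineering, is commonly realized by adding a "steering vector" to the model's internal activations. The sketch below is a toy illustration only, not the method of any video or paper listed on this page. It assumes a difference-of-means estimate of the concept direction, uses plain Python lists in place of real hidden states, and every function name and number in it is hypothetical. The same direction is used twice: projecting onto it gives a detection score, and adding it shifts the activation.

```python
# Toy sketch of detection and steering with a single linear direction.
# All names and numbers are illustrative assumptions, not a real model's API.

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def steering_vector(positive_acts, negative_acts):
    """Difference of mean activations: one simple estimate of a concept direction."""
    pos_mean = mean_vector(positive_acts)
    neg_mean = mean_vector(negative_acts)
    return [p - q for p, q in zip(pos_mean, neg_mean)]

def detect(hidden, direction):
    """Detection score: dot product of a hidden state with the direction."""
    return sum(h * d for h, d in zip(hidden, direction))

def steer(hidden, direction, alpha=1.0):
    """Shift a hidden state along the direction; model weights are untouched."""
    return [h + alpha * d for h, d in zip(hidden, direction)]

# Made-up 3-dimensional "activations" for prompts with and without a concept.
positive_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
negative_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

direction = steering_vector(positive_acts, negative_acts)
hidden = [0.5, 1.0, -1.0]
steered = steer(hidden, direction, alpha=2.0)

print("direction:", direction)
print("score before:", detect(hidden, direction))
print("score after:", detect(steered, direction))
```

In a real model the activations would be read from, and written back to, a chosen layer during the forward pass, and the strength `alpha` would need tuning; both are outside the scope of this sketch.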

Detection and Steering in LLMs using Feature Learning

Feature learning & the linear representation hypothesis for steering & monitoring LLMs

Eric and Wendy Schmidt Center Symposium: Biomedical Science and AI April 28 - 29, 2026 Day 1,

Steering vectors: tailor LLMs without training. Part I: Theory (Interpretability Series)

State-of-the-art foundation models are often seen as black boxes: we send in a prompt and get back an often useful answer.

Steering LLM Behavior Without Fine-Tuning

Modify the behavior or the personality of a model at inference time, without fine-tuning or prompt engineering. Read the blog post ...

Steering vectors in LLMs

Most people think there are two ways to control an AI: write a better prompt, or fine-tune it on more data. There's a third way ...

A Window Into LLMs | Sparse Autoencoders Explained

This has been my favorite video so far to make! I think interpretability is so important both in terms of ensuring safe AI and also ...

Steering LLMs: How to Change AI Personality Without Fine-Tuning

LLM Interpretability - How to steer its features?

Manifold Steering: LLM Control via Geometry

In this AI Research Roundup episode, Alex discusses the paper: 'Manifold

How Belief Dynamics Control LLMs: ICL and Activation Steering Unified

This video summarizes the research by Eric Bigelow, Daniel Wurgaft, and colleagues from Goodfire AI, Harvard, NTT Research, ...