Anthropic, published Sep 14, 2022

Toy Models of Superposition


Interpretability Research

Abstract

In this paper, we use toy models (small ReLU networks trained on synthetic data with sparse input features) to investigate how and when models represent more features than they have dimensions. We call this phenomenon superposition. When features are sparse, superposition allows compression beyond what a linear model would do, at the cost of "interference" that requires nonlinear filtering.
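To make "interference" and "nonlinear filtering" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's training setup: the weights are fixed by hand rather than learned, five features are packed into two dimensions as evenly spaced unit vectors, and the helper names (`pentagon_weights`, `sparse_batch`, `linear_readout`, `relu_readout`) and the bias value are hypothetical choices made for this example.

```python
import numpy as np


def pentagon_weights(n_features: int = 5) -> np.ndarray:
    """Hand-picked (not trained) weights: one unit vector per feature,
    spaced evenly on a circle, so n_features share 2 hidden dimensions."""
    angles = 2 * np.pi * np.arange(n_features) / n_features
    return np.stack([np.cos(angles), np.sin(angles)])  # shape (2, n_features)


def sparse_batch(rng: np.random.Generator, batch: int, n_features: int,
                 p_active: float) -> np.ndarray:
    """Synthetic sparse inputs: each feature is active with probability
    p_active and, when active, takes a uniform value in [0, 1)."""
    mask = rng.random((batch, n_features)) < p_active
    return mask * rng.random((batch, n_features))


def linear_readout(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compress to the hidden space (h = W x) and read back out (W^T h)."""
    return x @ W.T @ W


def relu_readout(W: np.ndarray, x: np.ndarray, b: float) -> np.ndarray:
    """Same readout, followed by a bias and a ReLU that filters interference."""
    return np.maximum(0.0, linear_readout(W, x) + b)


W = pentagon_weights()

# A single active feature (feature 0 with value 1).
x = np.eye(5)[:1]

# Linear readout: feature 0 is recovered, but the other outputs are nonzero
# because the feature directions overlap (cos 72 deg and cos 144 deg).
print(linear_readout(W, x).round(3))

# Assumed bias: minus the largest positive overlap between two directions.
# The ReLU then zeroes the off-target outputs, at the cost of shrinking
# the active feature's own output.
b = -np.cos(2 * np.pi / 5)
print(relu_readout(W, x, b).round(3))

# Sparse synthetic data: most rows have at most one active feature, which is
# the regime where the one-feature picture above applies.
batch = sparse_batch(np.random.default_rng(0), 1000, 5, p_active=0.05)
print((batch > 0).sum(axis=1).mean())
```

The design choice worth noting is the trade-off the bias encodes: a more negative bias removes more interference but also suppresses small true activations, which is only a good bargain when features are rarely active together.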
