Anthropic · published Mar 16, 2023

Privileged Bases in the Transformer Residual Stream

Anthropic Interpretability Research, Mar 16, 2023

Abstract

Our mathematical theories of the Transformer architecture suggest that individual coordinates in the residual stream should have no special significance (that is, the basis directions should be in some sense "arbitrary" and no more likely to encode information than random directions). Recent work has shown that this expectation does not hold in practice. We investigate this phenomenon and provisionally conclude that the per-dimension normalizers in the Adam optimizer are to blame for the effect.
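To make the role of per-dimension normalizers concrete, here is a minimal sketch in plain Python. It is a toy illustration and not the paper's experiment: the 2-D gradient, the 45-degree rotation, and the helper names (`rotate`, `sgd_step`, `adam_first_step`) are all assumptions chosen for the example. It compares "update, then rotate" against "rotate, then update". Plain SGD gives the same result either way, so it has no preferred basis. The first Adam step does not, because each coordinate is divided by its own normalizer.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def sgd_step(grad, lr=0.1):
    """Plain SGD update: one scalar multiplies every coordinate."""
    return [-lr * g for g in grad]

def adam_first_step(grad, lr=0.1, eps=1e-8):
    """First Adam update starting from zero-initialized moments.

    After bias correction, m_hat = g and v_hat = g**2 elementwise, so the
    update is -lr * g / (|g| + eps): each coordinate has its own normalizer.
    """
    return [-lr * g / (math.sqrt(g * g) + eps) for g in grad]

def max_abs_diff(a, b):
    """Largest coordinate-wise gap between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

theta = math.pi / 4   # hypothetical change of basis: a 45-degree rotation
grad = [1.0, 0.0]     # hypothetical gradient aligned with the first axis

# If an optimizer has no preferred basis, these two orders must agree.
sgd_gap = max_abs_diff(rotate(sgd_step(grad), theta),
                       sgd_step(rotate(grad, theta)))
adam_gap = max_abs_diff(rotate(adam_first_step(grad), theta),
                        adam_first_step(rotate(grad, theta)))

print("SGD gap: ", sgd_gap)    # numerically zero
print("Adam gap:", adam_gap)   # clearly nonzero
```

In the rotated basis the gradient has two equal components, and Adam rescales each of them to roughly the full learning rate, so the step is longer than the rotated version of the original step. This shows only that the update rule depends on the coordinate system; it does not by itself demonstrate the effect reported in the paper.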

We explore two other obvious sources of basis dependency in a Transformer: layer normalization and finite-precision floating-point calculations. We confidently rule these out as the source of the observed basis alignment.
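As a small illustration of why layer normalization counts as an obvious candidate, the following sketch checks whether it commutes with a rotation of the input. It assumes a simplified layer norm with no learned gain or bias, and the input vector and rotation angle are hypothetical. Because the mean is taken across coordinates, the operation singles out the all-ones direction and is therefore not rotation-equivariant in general.

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

def layer_norm(v, eps=1e-5):
    """Simplified layer norm: subtract the mean, divide by the standard
    deviation. Learned gain and bias are omitted (an assumption)."""
    mean = sum(v) / len(v)
    centered = [x - mean for x in v]
    var = sum(c * c for c in centered) / len(v)
    return [c / math.sqrt(var + eps) for c in centered]

theta = math.pi / 4   # hypothetical change of basis
x = [1.0, -1.0]       # hypothetical residual-stream vector

ln_then_rotate = rotate(layer_norm(x), theta)
rotate_then_ln = layer_norm(rotate(x, theta))
ln_gap = max(abs(a - b) for a, b in zip(ln_then_rotate, rotate_then_ln))

print("LayerNorm gap:", ln_gap)   # clearly nonzero
```

The nonzero gap shows only that the operation depends on the basis in principle. It says nothing about whether this dependence produces the observed alignment, which is the possibility the abstract rules out.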
