Polysemanticity and Superposition

Please see Neel's post on Polysemanticity for a more detailed description of the problems.
4.5 [A] Confusions to study in Toy Models of Superposition: Explore neuron superposition by training their absolute value model on functions of multiple variables. Make inputs binary (0/1) and look at the AND and OR of element pairs. (See the sketch below.)
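A minimal sketch of one way to set this up, assuming an absolute-value-style neuron model (a single ReLU hidden layer trained with MSE). The sizes, sparsity level, and the choice to compute AND/OR of adjacent input pairs are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

# Illustrative sketch: a one-hidden-layer ReLU model trained to output the AND
# and OR of adjacent pairs of sparse binary inputs.
n_inputs, n_hidden = 20, 10          # assumed sizes
n_pairs = n_inputs // 2
sparsity = 0.9                        # P(each input is 0)

W_in = nn.Parameter(torch.randn(n_hidden, n_inputs) * 0.1)
W_out = nn.Parameter(torch.randn(2 * n_pairs, n_hidden) * 0.1)
bias = nn.Parameter(torch.zeros(2 * n_pairs))
opt = torch.optim.Adam([W_in, W_out, bias], lr=1e-3)

def batch(n=1024):
    x = (torch.rand(n, n_inputs) > sparsity).float()            # sparse binary inputs
    a, b = x[:, 0::2], x[:, 1::2]
    target = torch.cat([a * b, torch.clamp(a + b, max=1.0)], dim=-1)  # AND, OR
    return x, target

for step in range(10_000):
    x, target = batch()
    hidden = torch.relu(x @ W_in.T)
    pred = torch.relu(hidden @ W_out.T + bias)
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Inspect W_in to see whether individual hidden neurons respond to one pair or to several.
```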
4.7 [A] Confusions to study in Toy Models of Superposition: Adapt their ReLU output model to have a different range of feature values, and see how this affects things. Make the features take only two possible values.
4.1 [A] Confusions to study in Toy Models of Superposition: What happens if you replace ReLUs with GELUs in the toy models? [May 1, 2023, Kunvar (firstuserhere)]
4.25 [A] Studying bottleneck superposition in real language models: Can you find any examples of the geometric superposition configurations in the residual stream of a language model?
4.37 [A] Comparing SoLU/GELU: How do TransformerLens SoLU / GELU models compare in Neuroscope under the SoLU polysemanticity metric? (What fraction of neurons seem monosemantic?)
4.2 [B] Confusions to study in Toy Models of Superposition: Replicate their absolute value model and study some of the variants of the ReLU output models. (See the sketch below.) [May 4, 2023, Kunvar (firstuserhere)]
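For reference, a minimal sketch of the bottleneck ReLU output model from Toy Models of Superposition, x' = ReLU(W^T W x + b), trained with importance-weighted MSE. The sizes, sparsity, and importance decay are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the ReLU output model: reconstruct sparse features through a
# low-dimensional bottleneck, with importance-weighted MSE.
n_features, n_hidden = 20, 5          # assumed sizes
sparsity = 0.9                        # P(feature is zero)
importance = 0.9 ** torch.arange(n_features)

W = nn.Parameter(torch.randn(n_hidden, n_features) * 0.1)
b = nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(10_000):
    x = torch.rand(1024, n_features) * (torch.rand(1024, n_features) > sparsity)
    x_hat = torch.relu(x @ W.T @ W + b)
    loss = (importance * (x_hat - x) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# W.T @ W shows which features share hidden directions (the superposition geometry).
```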
4.3 [B] Confusions to study in Toy Models of Superposition: Explore neuron superposition by training their absolute value model on a more complex function like x -> x^2.
4.4 [B] Confusions to study in Toy Models of Superposition: What happens to their ReLU output model when there is non-uniform sparsity? E.g., one class of less sparse features and another of very sparse features.
4.6 [B] Confusions to study in Toy Models of Superposition: Explore neuron superposition by training their absolute value model on functions of multiple variables. Keep the inputs as uniform reals in [0, 1] and look at max(x, y).
4.8 [B] Confusions to study in Toy Models of Superposition: Adapt their ReLU output model to have a different range of feature values, and see how this affects things. Make the features discrete (1, 2, 3).
4.9 [B] Confusions to study in Toy Models of Superposition: Adapt their ReLU output model to have a different range of feature values, and see how this affects things. Make the features uniform in [0.5, 1]. (See the sketch of feature-value variants below.) [April 30, 2023, Kunvar (firstuserhere)]
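A small sketch of how the feature-value variants in 4.7-4.9 might be generated, to drop into the training loop of the ReLU output model above. The distributions are the ones named in the problems; everything else (sparsity, sizes) is an assumption.

```python
import torch

# Illustrative generators for the feature-value variants in 4.7-4.9.
# Each call returns a (batch, n_features) tensor of sparse features.
def sparse_features(batch, n_features, sparsity, values):
    mask = (torch.rand(batch, n_features) > sparsity).float()
    return mask * values(batch, n_features)

two_valued = lambda b, n: torch.ones(b, n)                    # 4.7: active features always take the value 1
discrete   = lambda b, n: torch.randint(1, 4, (b, n)).float() # 4.8: values in {1, 2, 3}
uniform_hi = lambda b, n: 0.5 + 0.5 * torch.rand(b, n)        # 4.9: uniform in [0.5, 1]

x = sparse_features(1024, 20, 0.9, uniform_hi)
```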
4.21 [B] Studying bottleneck superposition in real language models: Induction heads copy the token they attend to into the output, which involves storing which of ~50,000 tokens it is. How is this stored in a 64-dimensional space?
4.22 [B] Studying bottleneck superposition in real language models: How does the previous token head in an induction circuit communicate the value of the previous token to the key of the induction head? Bonus: What residual stream subspace does it take up? Is there interference?
4.23 [B] Studying bottleneck superposition in real language models: How does the IOI circuit communicate names/positions between composing heads?
4.24 [B] Studying bottleneck superposition in real language models: Are there dedicated dimensions for positional embeddings? Do any other components write to those dimensions? (See the sketch below.)
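One possible starting point, assuming a TransformerLens model with learned absolute positional embeddings (GPT-2 small here); the 95% variance threshold is an arbitrary illustrative choice.

```python
import torch
from transformer_lens import HookedTransformer

# How many residual-stream directions do the positional embeddings occupy,
# and how strongly do token embeddings overlap with those directions?
model = HookedTransformer.from_pretrained("gpt2")   # GPT-2 small, d_model = 768

W_pos = model.W_pos          # [n_ctx, d_model]
W_E = model.W_E              # [d_vocab, d_model]

U, S, Vh = torch.linalg.svd(W_pos, full_matrices=False)
var_explained = (S ** 2).cumsum(0) / (S ** 2).sum()
n_dirs = int((var_explained < 0.95).sum()) + 1
print(f"~{n_dirs} directions explain 95% of positional embedding variance")

# Fraction of each token embedding's norm that lies in the top positional directions.
pos_basis = Vh[:n_dirs]                              # [n_dirs, d_model], orthonormal rows
tok_proj = (W_E @ pos_basis.T).norm(dim=-1) / W_E.norm(dim=-1)
print("mean fraction of token embedding norm in positional subspace:",
      tok_proj.mean().item())
```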
4.29 [B] Studying neuron superposition in real models: Look at a polysemantic neuron in a 1L language model. Can you figure out how the model disambiguates what feature it is?
4.31 [B] Studying neuron superposition in real models: Take a feature that's part of a polysemantic neuron in a 1L language model and try to identify every neuron that represents that feature. Is it sparse or diffuse?
4.38 [B] Comparing SoLU/GELU: Can you find any better metrics for polysemanticity?
4.39 [B] Comparing SoLU/GELU: The paper speculates that LayerNorm lets the model "smuggle through" superposition in SoLU models by smearing features across many dimensions and letting LayerNorm scale them up. Can you find evidence of this?
4.40 [B] Comparing SoLU/GELU: How similar are the neurons between SoLU/GELU models of the same layers?
4.11 [C] Confusions to study in Toy Models of Superposition: Can you find a toy model where GELU acts significantly differently from ReLU? [May 1, 2023, Kunvar (firstuserhere)]
4.12 [C] Building toy models of superposition: Build a toy model of a classification problem with cross-entropy loss. (See the sketch below.)
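A minimal sketch of one way to pose this, assuming a bottleneck model where each input has exactly one active feature and the model must classify which one; the sizes and the one-active-feature structure are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Classification-through-a-bottleneck toy model: one of n_features features is
# active (with random magnitude), the model compresses it into n_hidden < n_features
# dimensions and must classify which feature was active, trained with cross-entropy.
n_features, n_hidden = 20, 5
W_down = nn.Parameter(torch.randn(n_hidden, n_features) * 0.1)
W_up = nn.Parameter(torch.randn(n_features, n_hidden) * 0.1)
opt = torch.optim.Adam([W_down, W_up], lr=1e-3)

for step in range(10_000):
    labels = torch.randint(0, n_features, (1024,))
    x = F.one_hot(labels, n_features).float() * (0.5 + torch.rand(1024, 1))
    logits = torch.relu(x @ W_down.T) @ W_up.T
    loss = F.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()

# Compare the learned W_down geometry to the geometry of the MSE ReLU output model.
```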
4.13 [C] Building toy models of superposition: Build a toy model of neuron superposition that has many more hidden features than output features.
4.14 [C] Building toy models of superposition: Build a toy model that needs multiple hidden layers of ReLUs. Can computation in superposition happen across several layers? E.g., max(|x|, |y|).
4.15 [C] Building toy models of superposition: Build a toy model of attention head superposition/polysemanticity. Can you find a task where the model wants to do different things with an attention head on different inputs? How does it represent things internally / deal with interference?
4.17 [C] Making toy model counterexamples: Make toy models that are counterexamples in mechanistic interpretability (MI): a learned example of a network with a non-linear representation.
4.18 [C] Making toy model counterexamples: Make toy models that are counterexamples in MI: a network without a discrete number of features.
4.19 [C] Making toy model counterexamples: Make toy models that are counterexamples in MI: a non-decomposable neural network.
4.20 [C] Making toy model counterexamples: Make toy models that are counterexamples in MI: a task where networks can learn multiple different sets of features.
4.26 [C] Studying bottleneck superposition in real language models: Can you find any examples of locally almost-orthogonal bases?
4.27 [C] Studying bottleneck superposition in real language models: Do language models have "genre" directions that detect the type of text, and then represent features specific to each genre in the same subspace?
4.30 [C] Studying neuron superposition in real models: Look at a polysemantic neuron in a 2L language model. Can you figure out how the model disambiguates what feature it is?
4.32 [C] Studying neuron superposition in real models: Try to fully reverse engineer a feature discovered in 4.31.
4.33 [C] Studying neuron superposition in real models: Can you use superposition to create an adversarial example for a neuron?
4.34 [C] Studying neuron superposition in real models: Can you find any examples of the asymmetric superposition motif in the MLP of a 1-2 layer language model?
4.35 [C] Studying neuron superposition in real models: Pick a simple feature of language (e.g., is number, is base64) and train a linear probe to detect it in the MLP activations of a 1L language model. (See the sketch below.)
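A rough sketch of the probe setup, assuming a TransformerLens 1L model; the model name, hook point, tiny example texts, and the "token contains a digit" feature are illustrative assumptions, and the probe is a scikit-learn logistic regression.

```python
import numpy as np
import torch
from transformer_lens import HookedTransformer
from sklearn.linear_model import LogisticRegression

# Probe the MLP activations of a 1-layer model for a simple token-level feature.
model = HookedTransformer.from_pretrained("gelu-1l")   # assumed 1L model name

texts = ["The year 1997 was", "There were 42 cats", "Hello world, nice day"]  # use a real dataset in practice
acts, labels = [], []
for text in texts:
    tokens = model.to_tokens(text)
    _, cache = model.run_with_cache(tokens)
    mlp_post = cache["blocks.0.mlp.hook_post"][0]        # [seq, d_mlp]
    for i, tok in enumerate(model.to_str_tokens(text)):
        acts.append(mlp_post[i].detach().cpu().numpy())
        labels.append(int(any(c.isdigit() for c in tok)))  # feature: token contains a digit

probe = LogisticRegression(max_iter=1000).fit(np.stack(acts), labels)
print("train accuracy:", probe.score(np.stack(acts), labels))
# Compare the probe's accuracy to the best single neuron's accuracy on the same labels.
```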
4.41 [C] Comparing SoLU/GELU: How do GELU and ReLU compare with respect to polysemanticity? Replicate the SoLU analysis.
4.42 [C] Getting rid of superposition: If you train a 1L/2L language model with d_mlp = 100 * d_model, does superposition go away? (See the config sketch below.)
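A sketch of what the model config might look like in TransformerLens; the specific sizes and vocab are illustrative assumptions, and you would still need your own training loop and dataset.

```python
from transformer_lens import HookedTransformer, HookedTransformerConfig

# A 1-layer model with a very wide MLP (d_mlp = 100 * d_model), to test whether
# extra neuron capacity removes the pressure toward superposition.
cfg = HookedTransformerConfig(
    n_layers=1,
    d_model=256,
    d_mlp=256 * 100,     # 100x d_model instead of the usual 4x
    n_heads=4,
    d_head=64,
    n_ctx=512,
    d_vocab=50257,       # GPT-2 tokenizer vocab size; adjust to your tokenizer
    act_fn="gelu",
)
model = HookedTransformer(cfg)
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
# Train on a small dataset, then compare neuron interpretability to a standard-width model.
```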
4.43 [C] Getting rid of superposition: Study T5 XXL. It is an 11B-parameter model and not supported by TransformerLens, so expect major infrastructure pain.
4.45 [C] Getting rid of superposition: Pick an open problem at the end of Toy Models of Superposition.
4.16 [D] Building toy models of superposition: Build a toy model where the model needs to deal with simultaneous interference, and try to understand how it does so, or whether it can.
4.28 [D] Studying bottleneck superposition in real language models: Can you find examples of a model learning to deal with simultaneous interference?
4.36 [D] Studying neuron superposition in real models: Look for features in Neuroscope that seem to be represented by various neurons in a 1-2 layer language model. Train probes to detect some of them, and compare probe performance vs. neuron performance.
4.44 [D] Getting rid of superposition: Can you take a trained model, freeze all weights except one MLP layer, widen that layer 10x by copying each neuron 10 times, add noise, and fine-tune? Does this remove superposition / add new features? (See the sketch below.)
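A rough sketch of the widening step only, using plain PyTorch linear layers as stand-ins for the model's MLP weights; the 10x factor and noise scale follow the problem statement, while the fine-tuning loop and the freezing of all other parameters are left out.

```python
import torch
import torch.nn as nn

# Widen an MLP layer 10x by copying each neuron 10 times (plus small noise), so the
# widened layer initially computes roughly the same function, then fine-tune only it.
def widen_mlp(w_in: nn.Linear, w_out: nn.Linear, factor: int = 10, noise: float = 0.01):
    d_mlp, d_model = w_in.weight.shape
    new_in = nn.Linear(d_model, d_mlp * factor)
    new_out = nn.Linear(d_mlp * factor, w_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(w_in.weight.repeat(factor, 1)
                            + noise * torch.randn(d_mlp * factor, d_model))
        new_in.bias.copy_(w_in.bias.repeat(factor)
                          + noise * torch.randn(d_mlp * factor))
        # Divide output weights by `factor` so the sum over the copies matches the original.
        new_out.weight.copy_(w_out.weight.repeat(1, factor) / factor)
        new_out.bias.copy_(w_out.bias)
    return new_in, new_out

# Example with stand-in layers; in practice these come from the trained model, with
# every other parameter frozen (requires_grad=False) before fine-tuning.
w_in, w_out = nn.Linear(64, 256), nn.Linear(256, 64)
new_in, new_out = widen_mlp(w_in, w_out)
```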