Toy language models

Please see Neel’s sequence for a more detailed description of these problems.
Toy language model problems
Category | Sub-category | Difficulty | # | Problem
Toy Language Models | Understanding neurons | A | 1.6 | Hunt through Neuroscope for the toy models and look for interesting neurons to focus on.
Toy Language Models | Understanding neurons | A | 1.7 | Can you find any polysemantic neurons in Neuroscope? Explore this.
Toy Language Models | | A | 1.23 | Choose your own adventure: take a bunch of text with interesting patterns and run the models over it. Look for tokens they do really well on and try to reverse engineer what's going on! (Code sketch below.)
Toy Language Models | Understanding neurons | B | 1.1 | How far can you get deeply reverse engineering a neuron in a 1L model? 1L is particularly easy, since each neuron's output adds directly to the logits. (Code sketch below.)
Toy Language Models | Understanding neurons | B | 1.2 | Find an interesting neuron you think represents a feature. Can you fully reverse engineer which direction should activate that feature, and compare it to the neuron's input direction?
Toy Language Models | Understanding neurons | B | 1.3 | Look for trigram neurons in a 1L model (e.g. "ice cream -> sundae") and try to reverse engineer them.
Toy Language Models | Understanding neurons | B | 1.4 | Check out the SoLU paper for more ideas on 1L neurons to find and reverse engineer.
Toy Language Models | Understanding neurons | B | 1.8 | Are there neurons whose behaviour can be matched by a regex or other code? If so, run it on a ton of text and compare the output. (Code sketch below.)
Toy Language Models | How do larger models differ? | B | 1.9 | How do 3-layer and 4-layer attention-only models differ from 2L? (For instance, induction heads only appeared with 2L. Can you find something useful that only appears at 3L or higher?)
Toy Language Models | How do larger models differ? | B | 1.10 | How do 3-layer and 4-layer attention-only models differ from 2L? Look at composition scores and try to identify pairs of heads that compose a lot. (Code sketch below.)
Toy Language Models | How do larger models differ? | B | 1.11 | How do 3-layer and 4-layer attention-only models differ from 2L? Look for evidence of composition.
Toy Language Models | How do larger models differ? | B | 1.12 | How do 3-layer and 4-layer attention-only models differ from 2L? Ablate a single head, run the model on a lot of text, and look at the change in performance. Do any heads matter a lot that aren't induction heads? (Code sketch below.)
Toy Language Models | How do larger models differ? | B | 1.13 | Look for tasks that an n-layer model can't do but an (n+1)-layer model can, and look for a circuit that explains this. (Start by running both models on a bunch of text and looking for per-token probability differences; code sketch below.)
Toy Language Models | How do larger models differ? | B | 1.14 | How do 1L SoLU/GELU models differ from 1L attention-only?
Toy Language Models | How do larger models differ? | B | 1.15 | How do 2L SoLU models differ from 1L?
Toy Language Models | How do larger models differ? | B | 1.16 | How does 1L GELU differ from 1L SoLU?
Toy Language Models | How do larger models differ? | B | 1.17 | Analyse how a larger model "fixes the bugs" of a smaller model.
Toy Language Models | How do larger models differ? | B | 1.18 | Does a 1L MLP transformer fix the skip-trigram bugs of a 1L attention-only model? If so, how?
Toy Language Models | How do larger models differ? | B | 1.19 | Does a 3L attention-only model fix bugs in the induction heads of a 2L attention-only model? Try looking at split-token induction, where the current token has a preceding space and is one token, but the earlier occurrence has no preceding space and is two tokens, e.g. " Claire" vs. "Cl" "aire". (Code sketch below.)
Toy Language Models | How do larger models differ? | B | 1.20 | Does a 3L attention-only model fix bugs in the induction heads of a 2L attention-only model? Look at misfiring when the previous token appears multiple times with different following tokens.
Toy Language Models | How do larger models differ? | B | 1.21 | Does a 3L attention-only model fix bugs in the induction heads of a 2L attention-only model? Look at stopping induction on a token that likely marks the end of a repeated string (e.g. . or ! or ").
Toy Language Models | How do larger models differ? | B | 1.22 | Does a 2L MLP model fix these bugs (1.19-1.21) too?
Toy Language Models | Understanding neurons | C | 1.5 | How far can you get deeply reverse engineering a neuron in a 2+ layer model?
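
Code sketches for some of the problems above. All use TransformerLens; the specific model names, neuron indices, and prompts are illustrative assumptions, not part of the original problem statements. For problem 1.23, a minimal way to find tokens a toy model predicts unusually well is to look at per-token loss:

```python
from transformer_lens import HookedTransformer

# Model name is an assumption about the released toy checkpoints; swap in any toy model.
model = HookedTransformer.from_pretrained("attn-only-2l")

text = "The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog."
tokens = model.to_tokens(text)

# Loss at position i is for predicting the token at position i+1.
per_token_loss = model(tokens, return_type="loss", loss_per_token=True)[0]
str_tokens = model.to_str_tokens(text)

# Tokens the model predicts best (lowest loss) are the most promising to reverse engineer.
for loss, tok in sorted(zip(per_token_loss.tolist(), str_tokens[1:]))[:10]:
    print(f"{loss:.3f}  {tok!r}")
```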
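
For problem 1.1, a quick first step on a 1L neuron is its direct effect on the logits, since the MLP output feeds straight into the unembedding. The model name and neuron index below are placeholders:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gelu-1l")  # any 1L model with MLPs
neuron = 123  # hypothetical neuron of interest

# In a 1L model the neuron's write to the residual stream goes straight to the unembed,
# so its direct contribution to the logits is its output direction times W_U.
logit_effect = model.W_out[0, neuron] @ model.W_U  # shape [d_vocab]

top = torch.topk(logit_effect, 10)
for val, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{val:+.3f}  {model.tokenizer.decode([idx])!r}")
```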
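
For problem 1.8, one rough way to test a hypothesis is to correlate the regex's per-token predictions with the neuron's activations. The model, neuron index, and "fires on four-digit years" hypothesis here are purely illustrative:

```python
import re
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gelu-1l")
layer, neuron = 0, 456  # hypothetical

text = "The war ended in 1945, and the treaty was signed in 1947 in Paris."
tokens = model.to_tokens(text)
_, cache = model.run_with_cache(tokens)
acts = cache["post", layer][0, :, neuron].cpu()  # the neuron's activation at each position

str_tokens = model.to_str_tokens(text)
predicted = torch.tensor(
    [bool(re.fullmatch(r"\s?\d{4}", t)) for t in str_tokens], dtype=torch.float
)

# Crude comparison: correlation between the regex's predictions and the real activations.
print(torch.corrcoef(torch.stack([acts, predicted]))[0, 1])
```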
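
For problem 1.10, composition scores can be computed directly from the weights, following the definition in A Mathematical Framework for Transformer Circuits (TransformerLens also has FactoredMatrix utilities that make this cheaper; the raw formula is spelled out here for clarity). The model name is an assumption about the released toy checkpoints:

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("attn-only-3l")

def q_composition(layer_a, head_a, layer_b, head_b):
    """Q-composition score between an earlier head's output and a later head's query input."""
    w_ov = model.W_V[layer_a, head_a] @ model.W_O[layer_a, head_a]    # [d_model, d_model]
    w_qk = model.W_Q[layer_b, head_b] @ model.W_K[layer_b, head_b].T  # [d_model, d_model]
    return ((w_ov @ w_qk).norm() / (w_ov.norm() * w_qk.norm())).item()

# Score every layer-0 head against every layer-1 head; repeat for other layer pairs
# and for K- and V-composition to find head pairs that compose a lot.
scores = torch.zeros(model.cfg.n_heads, model.cfg.n_heads)
for ha in range(model.cfg.n_heads):
    for hb in range(model.cfg.n_heads):
        scores[ha, hb] = q_composition(0, ha, 1, hb)
print(scores)
```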
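
For problem 1.12, a simple baseline is to zero-ablate each head in turn with a forward hook and measure the change in loss. The model name and text are placeholders; in practice you would want much more text:

```python
import transformer_lens.utils as utils
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("attn-only-2l")
text = "The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog."
tokens = model.to_tokens(text)

baseline = model(tokens, return_type="loss").item()

def ablated_loss(layer, head):
    def zero_head(z, hook):
        z[:, :, head, :] = 0.0  # hook_z has shape [batch, pos, head_index, d_head]
        return z
    return model.run_with_hooks(
        tokens,
        return_type="loss",
        fwd_hooks=[(utils.get_act_name("z", layer), zero_head)],
    ).item()

# Heads whose ablation hurts a lot (and aren't induction heads) are the interesting ones.
for layer in range(model.cfg.n_layers):
    for head in range(model.cfg.n_heads):
        print(layer, head, round(ablated_loss(layer, head) - baseline, 4))
```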
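
For problem 1.13, a starting point is to run the n-layer and (n+1)-layer models on the same text and rank tokens by the per-token loss gap. Model names are assumptions, and this assumes the two checkpoints share a tokenizer (tokenize separately if they don't):

```python
from transformer_lens import HookedTransformer

small = HookedTransformer.from_pretrained("attn-only-2l")
big = HookedTransformer.from_pretrained("attn-only-3l")

text = "The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog."
tokens = small.to_tokens(text)

loss_small = small(tokens, return_type="loss", loss_per_token=True)[0]
loss_big = big(tokens, return_type="loss", loss_per_token=True)[0]
diff = loss_small - loss_big  # large positive values = places where the extra layer helps most

str_tokens = small.to_str_tokens(text)
for d, tok in sorted(zip(diff.tolist(), str_tokens[1:]), reverse=True)[:10]:
    print(f"{d:+.3f}  {tok!r}")
```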
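
For problem 1.19, it is worth checking how the name you pick actually tokenises with and without a leading space before building split-token induction prompts; whether " Claire" really is a single token for these tokenizers is an assumption to verify:

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("attn-only-3l")
print(model.to_str_tokens(" Claire", prepend_bos=False))  # ideally a single token
print(model.to_str_tokens("Claire", prepend_bos=False))   # ideally split, e.g. ['Cl', 'aire']
```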