Interpreting Algorithmic Problems

Please see Neel's post on Interpreting Algorithmic Problems for a more detailed description of the problems.
Beginner problems 3.1: Sorting fixed-length lists. (Format: START 4 6 2 9 MID 2 4 6 9)
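As a starting point, here is a minimal data-generation sketch for this prompt format; the vocabulary and token choices are assumptions for illustration, not a canonical setup.

```python
import random

# Vocabulary: digits 0-9 plus the two special tokens used in the prompt format.
VOCAB = [str(d) for d in range(10)] + ["START", "MID"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def make_example(list_len=4):
    """Generate one prompt like: START 4 6 2 9 MID 2 4 6 9 (as token ids)."""
    xs = [random.randint(0, 9) for _ in range(list_len)]
    tokens = ["START"] + [str(x) for x in xs] + ["MID"] + [str(x) for x in sorted(xs)]
    return [TOKEN_TO_ID[t] for t in tokens]

print(make_example())
```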
Beginner problems 3.2: Sorting variable-length lists. (What's the sorting algorithm? What's the longest list you can get it to do? How does length affect accuracy?)
Beginner problems 3.3: Interpret a 2L MLP (one hidden layer) trained to do modular addition. (Analogous to Neel's grokking work.)
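A minimal sketch of the task setup: one-hot inputs for a and b, label (a + b) mod p, a single-hidden-layer PyTorch MLP trained on the full dataset. The modulus, hidden width, and optimiser settings are illustrative only.

```python
import torch
import torch.nn as nn

p = 113  # modulus used in Neel's grokking work; any small prime works
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b) pairs
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()
y = (pairs[:, 0] + pairs[:, 1]) % p

# One hidden layer (a "2L" MLP): 2p-dim one-hot input -> hidden -> p logits.
model = nn.Sequential(nn.Linear(2 * p, 512), nn.ReLU(), nn.Linear(512, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(2000):  # full-batch training, purely illustrative
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```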
Beginner problems 3.4: Interpret a 1L MLP trained to do modular subtraction. (Analogous to Neel's grokking work.)
Beginner problems 3.5: Taking the minimum or maximum of two ints.
Beginner problems 3.6: Permuting lists.
Beginner problems 3.7: Calculating sequences with a Fibonacci-style recurrence (predicting the next element from the previous two).
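A data sketch for this task, assuming the sequence is taken mod a small prime so the vocabulary stays finite; that choice is an assumption, not part of the problem statement.

```python
import random

def make_fib_sequence(length=10, p=113):
    """Sequence where each element is the sum of the previous two, mod p."""
    seq = [random.randrange(p), random.randrange(p)]
    while len(seq) < length:
        seq.append((seq[-1] + seq[-2]) % p)
    return seq

print(make_fib_sequence())
```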
Extending Othello-GPT 3.30: Try one of Neel's concrete Othello-GPT projects.
Harder problems 3.8: 5-digit addition/subtraction.
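One possible tokenisation, digit by digit with explicit operator and equals tokens. The format here is purely an assumption; choosing a format the model can learn is part of the problem.

```python
import random

def make_addition_prompt():
    """E.g. 1 2 3 4 5 + 6 7 8 9 0 = 0 9 0 2 3 5, with each digit a separate token."""
    a, b = random.randrange(10**5), random.randrange(10**5)
    lhs = list(f"{a:05d}") + ["+"] + list(f"{b:05d}") + ["="]
    rhs = list(f"{a + b:06d}")  # the sum can have six digits
    return lhs + rhs

print(" ".join(make_addition_prompt()))
```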
Harder problems 3.9: Predicting the output of a simple code function, e.g. problems like "a = 1 2 3. a[2] = 4. a -> 1 2 4".
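A sketch of a prompt generator for the list-assignment example above (0-indexed, matching the example); the exact token set is invented for illustration.

```python
import random

def make_prompt(list_len=3):
    """E.g. 'a = 1 2 3 . a [ 2 ] = 4 . a -> 1 2 4', with everything as separate tokens."""
    a = [random.randint(0, 9) for _ in range(list_len)]
    idx, val = random.randrange(list_len), random.randint(0, 9)
    result = a.copy()
    result[idx] = val
    return (["a", "="] + [str(x) for x in a]
            + [".", "a", "[", str(idx), "]", "=", str(val), ".", "a", "->"]
            + [str(x) for x in result])

print(" ".join(make_prompt()))
```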
Harder problems 3.10: Graph theory problems like this. Unsure of the correct input format; try a bunch. See here.
Harder problems 3.12: Train models for automata tasks and interpret them. Do your results match the theory?
Harder problems 3.13: In-Context Linear Regression - the transformer gets a sequence (x_1, y_1, x_2, y_2, ...) where y_i = Ax_i + b. A and b are different for each prompt and need to be learned in-context. (Code here)
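A data-generation sketch for these sequences, with A and b resampled per prompt; the dimensions, scales, and padding scheme are illustrative assumptions.

```python
import torch

def make_sequence(n_points=16, d_x=4, d_y=1):
    """Return an interleaved (x_1, y_1, ..., x_n, y_n) sequence with y_i = A x_i + b."""
    A = torch.randn(d_y, d_x)   # fresh A, b for every prompt
    b = torch.randn(d_y)
    xs = torch.randn(n_points, d_x)
    ys = xs @ A.T + b
    # Pad x and y to a common width so they can share one "token" dimension.
    width = max(d_x, d_y)
    seq = torch.zeros(2 * n_points, width)
    seq[0::2, :d_x] = xs
    seq[1::2, :d_y] = ys
    return seq

print(make_sequence().shape)  # torch.Size([32, 4])
```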
Harder problems 3.16: Predict repeated subsequences in randomly generated tokens, and see if you can find and reverse-engineer induction heads.
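A sketch for generating random-token sequences with a repeated subsequence; the simplest variant, assumed here, just repeats the whole prefix, which is enough to elicit induction heads.

```python
import torch

def make_repeated_batch(batch=32, seq_len=50, d_vocab=1000, bos_token=0):
    """Random tokens followed by an exact repeat: [BOS, t_1..t_n, t_1..t_n]."""
    prefix = torch.randint(1, d_vocab, (batch, seq_len))
    bos = torch.full((batch, 1), bos_token)
    return torch.cat([bos, prefix, prefix], dim=1)

tokens = make_repeated_batch()
print(tokens.shape)  # torch.Size([32, 101])
```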
3.18: Build a toy model of Indirect Object Identification - train a tiny attention-only model on an algorithmic task simulating IOI - and reverse-engineer the learned solution. Compare it to the circuit found in GPT-2 Small.
Questions about language models 3.22: Train a 3L attention-only transformer to perform the Indirect Object Identification task. Can it do the task? Does it learn the same circuit found in GPT-2 Small?
Questions about language models 3.23: Redo Neel's modular addition analysis with GELU. Does it change things?
Questions about language models 3.26: In modular addition, look at what different dimensionality reduction techniques do on different weight matrices. Can you identify which weights matter most? Which neurons form clusters for each frequency? Anything from the activations?
Extending Othello-GPT 3.32: Neuron Interpretability and Studying Superposition - try to understand the model's MLP neurons, and explore what techniques do and don't work. Try to build our understanding of transformer MLPs in general.
Harder problems 3.14: Problems in the style of In-Context Linear Regression that are learned in-context. See 3.13.
Harder problems 3.15: 5-digit (or binary) multiplication.
Harder problems 3.17: Choose your own adventure! Find your own algorithmic problem; LeetCode Easy is probably a good source.
3.19: Is 3.18 consistent across random seeds, or can other algorithms be learned? Can a 2L model learn this? What happens if you add more MLPs or more layers?
3.20: Reverse-engineer Othello-GPT. Can you reverse-engineer the algorithms it learns, or the features the probes find?
Questions about language models 3.24: How does memorisation work? Try training a one hidden layer MLP to memorise random data, or training a transformer on a fixed set of random strings of tokens.
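A sketch of the MLP-memorisation setup: random inputs, random labels, one hidden layer, so the only way to fit the data is to memorise it. The sizes and training loop are arbitrary; the interesting part is interpreting what the trained weights do.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_points, d_in, n_classes = 1000, 50, 10
x = torch.randn(n_points, d_in)                # random inputs
y = torch.randint(0, n_classes, (n_points,))   # random labels with no structure to generalise from

model = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(), nn.Linear(256, n_classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("train accuracy:", (model(x).argmax(-1) == y).float().mean().item())
```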
Questions about language models 3.25: Compare different dimensionality reduction techniques on modular addition, or on a problem you feel you understand.
Questions about language models 3.27: Is direct logit attribution always useful? Can you find examples where it's highly misleading?
Extending Othello-GPT 3.31: Looking for modular circuits - try to find the circuits used to compute the world model and to use the world model to compute the next move. Try to understand each in isolation, and use this to understand how they fit together. See what you can learn about finding modular circuits in general.
Extending Othello-GPT 3.33: Transformer Circuits Laboratory - explore and test other conjectures about transformer circuits, e.g. can we figure out how the model manages memory in the residual stream?
Deep learning mysteries 3.28: Explore the Lottery Ticket Hypothesis.
Deep learning mysteries 3.29: Explore Deep Double Descent.