Interpreting Algorithmic Problems

Please see Neel's post for a more detailed description of the problems.
| Sub-category | Difficulty | # | Problem |
| --- | --- | --- | --- |
| Beginner problems | A | 3.1 | Sorting fixed-length lists. (Format: START 4 6 2 9 MID 2 4 6 9.) See the data-generation sketch below. |
| Beginner problems | A | 3.2 | Sorting variable-length lists. (What's the sorting algorithm? What's the longest list you can get it to sort? How does length affect accuracy?) |
| Beginner problems | A | 3.3 | Interpret a 2L MLP (one hidden layer) trained to do modular addition. (Analogous to Neel's grokking work.) See the training sketch below. |
| Beginner problems | A | 3.4 | Interpret a 1L MLP trained to do modular subtraction. (Analogous to Neel's grokking work.) |
| Beginner problems | A | 3.5 | Taking the minimum or maximum of two ints. |
| Beginner problems | A | 3.6 | Permuting lists. |
| Beginner problems | A | 3.7 | Calculating sequences with a Fibonacci-style recurrence (predicting the next element from the previous two). |
| Extending Othello-GPT | A | 3.30 | Try one of Neel's concrete Othello-GPT projects. |
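For problem 3.1, most of the setup is data generation. Below is a minimal sketch of a generator for the START 4 6 2 9 MID 2 4 6 9 format; the digit vocabulary, list length, and token names are illustrative assumptions.

```python
import torch

LIST_LEN = 4
VOCAB = [str(d) for d in range(10)] + ["START", "MID"]
TOK = {t: i for i, t in enumerate(VOCAB)}

def make_batch(batch_size: int) -> torch.Tensor:
    """Token sequences of the form: START 4 6 2 9 MID 2 4 6 9."""
    seqs = []
    for _ in range(batch_size):
        xs = torch.randint(0, 10, (LIST_LEN,)).tolist()
        toks = ["START"] + [str(x) for x in xs] + ["MID"] + [str(x) for x in sorted(xs)]
        seqs.append([TOK[t] for t in toks])
    return torch.tensor(seqs)  # shape (batch_size, 2 * LIST_LEN + 2)

# When training, only score the positions after MID: the first half of the
# sequence is uniform random and carries irreducible loss.
```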
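For problem 3.3 (and 3.4 with subtraction swapped in), here is a minimal sketch of the one-hidden-layer setup, assuming one-hot inputs for the pair (a, b) and full-batch training. The hyperparameters are illustrative, not taken from Neel's grokking work.

```python
import torch
import torch.nn.functional as F

p, d_hidden = 113, 512
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))  # all (a, b) pairs
x = torch.cat([F.one_hot(pairs[:, 0], p), F.one_hot(pairs[:, 1], p)], dim=1).float()
y = pairs.sum(dim=1) % p

perm = torch.randperm(p * p)  # hold out most pairs so grokking is visible
n_train = int(0.3 * p * p)
train, test = perm[:n_train], perm[n_train:]

model = torch.nn.Sequential(
    torch.nn.Linear(2 * p, d_hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(d_hidden, p),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(30_000):  # grokking can take tens of thousands of steps
    loss = F.cross_entropy(model(x[train]), y[train])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1_000 == 0:
        with torch.no_grad():
            acc = (model(x[test]).argmax(dim=1) == y[test]).float().mean()
        print(step, loss.item(), acc.item())
```

Weight decay plus a small training fraction is what typically produces the delayed generalisation worth interpreting.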
| Sub-category | Difficulty | # | Problem |
| --- | --- | --- | --- |
| Harder problems | B | 3.8 | 5-digit addition/subtraction. |
| Harder problems | B | 3.9 | Predicting the output of a simple code function, e.g. problems like "a = 1 2 3. a[2] = 4. a -> 1 2 4". |
| Harder problems | B | 3.10 | Graph theory problems like this. Unsure of the correct input format; try a bunch. See here. |
| Harder problems | B | 3.12 | Train models for automata tasks and interpret them. Do your results match the theory? |
| Harder problems | B | 3.13 | In-Context Linear Regression: the transformer gets a sequence (x_1, y_1, x_2, y_2, ...) where y_i = Ax_i + b. A and b are different for each prompt and need to be learned in-context. (Code here.) See the data sketch below. |
| Harder problems | B | 3.16 | Predict repeated subsequences in randomly generated tokens, and see if you can find and reverse-engineer induction heads. See the data sketch below. |
| Harder problems | B | 3.18 | Build a toy model of Indirect Object Identification: train a tiny attention-only model on an algorithmic task simulating IOI, and reverse-engineer the learned solution. Compare it to the circuit found in GPT-2 Small. See the task sketch below. |
| Questions about language models | B | 3.22 | Train a 3L attention-only transformer to perform the Indirect Object Identification task. Can it do the task? Does it learn the same circuit found in GPT-2 Small? |
| Questions about language models | B | 3.23 | Redo Neel's modular addition analysis with GELU. Does it change things? |
| Questions about language models | B | 3.26 | In modular addition, look at what different dimensionality reduction techniques do on different weight matrices. Can you identify which weights matter most? Which neurons form clusters for each frequency? Anything from activations? |
| Extending Othello-GPT | B | 3.32 | Neuron interpretability and studying superposition: try to understand the model's MLP neurons, and explore which techniques do and don't work. Try to build our understanding of transformer MLPs in general. |
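For problem 3.13, here is a sketch of the per-prompt data generation, assuming x and y share a dimension so they can be interleaved into a single sequence of vectors; the dimension and example count are illustrative.

```python
import torch

def make_prompt(d: int = 4, n_examples: int = 16) -> torch.Tensor:
    """One prompt for in-context linear regression: x_1, y_1, x_2, y_2, ..."""
    A = torch.randn(d, d) / d ** 0.5  # fresh A and b for every prompt
    b = torch.randn(d)
    xs = torch.randn(n_examples, d)
    ys = xs @ A.T + b
    # Interleave into a single sequence of vectors; train the model to
    # predict each y_i from the prefix ending at x_i.
    return torch.stack([xs, ys], dim=1).reshape(2 * n_examples, d)
```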
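For problem 3.16, one simple way to generate data that rewards induction heads is to repeat the first half of a random sequence. The half-and-half scheme below is an assumption; repeating shorter subsequences at random offsets would also work.

```python
import torch

def make_batch(batch_size: int, seq_len: int = 64, d_vocab: int = 50) -> torch.Tensor:
    toks = torch.randint(0, d_vocab, (batch_size, seq_len))
    half = seq_len // 2
    toks[:, half:] = toks[:, :half]  # second half repeats the first
    # Loss on the first half is irreducible (uniform random tokens); loss on
    # the second half can be driven to zero by an induction-style
    # "A B ... A -> B" match-and-copy algorithm.
    return toks
```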
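For problem 3.18, one possible algorithmic stand-in for IOI: the model sees two distinct "names", one of which is then repeated, and must predict the name that appears only once. This [a, b, a] format is an assumption for illustration, not the task from the IOI paper.

```python
import torch

def make_batch(batch_size: int, n_names: int = 20):
    a = torch.randint(0, n_names, (batch_size,))   # the repeated name
    shift = torch.randint(1, n_names, (batch_size,))
    b = (a + shift) % n_names                      # a distinct second name
    swap = torch.rand(batch_size) < 0.5            # randomise the name order
    first = torch.where(swap, b, a)
    second = torch.where(swap, a, b)
    toks = torch.stack([first, second, a], dim=1)  # e.g. [a, b, a]
    return toks, b                                 # target: the non-repeated name
```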
| Sub-category | Difficulty | # | Problem |
| --- | --- | --- | --- |
| Harder problems | C | 3.14 | Problems in the style of In-Context Linear Regression that are learned in-context. See 3.13. |
| Harder problems | C | 3.15 | 5-digit (or binary) multiplication. |
| Harder problems | C | 3.17 | Choose your own adventure! Find your own algorithmic problem. LeetCode easy is probably a good source. |
| Harder problems | C | 3.19 | Is 3.18 consistent across random seeds, or can other algorithms be learned? Can a 2L model learn this? What happens if you add more MLPs or more layers? |
| Harder problems | C | 3.20 | Reverse-engineer Othello-GPT. Can you reverse-engineer the algorithms it learns, or the features the probes find? |
| Questions about language models | C | 3.24 | How does memorisation work? Try training a one-hidden-layer MLP to memorise random data, or training a transformer on a fixed set of random strings of tokens. See the sketch below. |
| Questions about language models | C | 3.25 | Compare different dimensionality reduction techniques on modular addition, or on a problem you feel you understand. |
| Questions about language models | C | 3.27 | Is direct logit attribution always useful? Can you find examples where it's highly misleading? |
| Extending Othello-GPT | C | 3.31 | Looking for modular circuits: try to find the circuits used to compute the world model, and the circuits that use the world model to compute the next move. Try to understand each in isolation, and use this to understand how they fit together. See what you can learn about finding modular circuits in general. |
| Extending Othello-GPT | C | 3.33 | Transformer Circuits Laboratory: explore and test other conjectures about transformer circuits, e.g. can we figure out how the model manages memory in the residual stream? |
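For problem 3.24, here is a minimal memorisation setup: random inputs with random labels, so there is no structure to generalise from and the model can only memorise. Sizes are illustrative.

```python
import torch
import torch.nn.functional as F

n_points, d_in, n_classes, d_hidden = 1000, 32, 10, 256
x = torch.randn(n_points, d_in)               # fixed random inputs
y = torch.randint(0, n_classes, (n_points,))  # random labels: pure memorisation

model = torch.nn.Sequential(
    torch.nn.Linear(d_in, d_hidden),
    torch.nn.ReLU(),
    torch.nn.Linear(d_hidden, n_classes),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(5_000):
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
# Once train accuracy is ~100%, ask how the memorisation is implemented:
# one data point per neuron, or distributed / in superposition?
```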
| Sub-category | Difficulty | # | Problem |
| --- | --- | --- | --- |
| Deep learning mysteries | D | 3.28 | Explore the Lottery Ticket Hypothesis. See the pruning sketch below. |
| Deep learning mysteries | D | 3.29 | Explore Deep Double Descent. |
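For problem 3.28, here is a sketch of iterative magnitude pruning, the procedure from the Lottery Ticket Hypothesis paper: train, prune the smallest-magnitude weights, rewind the survivors to their initial values, and repeat. The `train` function is an assumed helper that applies the masks during optimisation.

```python
import copy
import torch

def find_ticket(model: torch.nn.Module, train, rounds: int = 5, frac: float = 0.2):
    """Iterative magnitude pruning; `train(model, masks)` is assumed to
    zero masked weights during optimisation."""
    init_state = copy.deepcopy(model.state_dict())  # save the initialisation
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
    for _ in range(rounds):
        train(model, masks)
        with torch.no_grad():
            for name, param in model.named_parameters():
                alive = param[masks[name].bool()].abs()
                threshold = alive.quantile(frac)  # prune lowest `frac` of survivors
                masks[name] *= (param.abs() > threshold).float()
        model.load_state_dict(init_state)  # rewind surviving weights to init
    return masks  # the candidate "winning ticket"
```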