Analysing Training Dynamics

Please see Neel's post for a more detailed description of the problems.
Each problem below is listed by number and grouped by difficulty (A-D) and sub-category.

Difficulty A

Understanding fine-tuning
5.16: How does model performance change on the original training distribution when fine-tuning?

Understanding training dynamics in language models
5.25: Look at attention heads on various texts and see if any have recognisable attention patterns, then analyse them over training.

Finding phase transitions
5.26: Look for phase transitions in the Indirect Object Identification task. (Note: this might not have a phase change.)

Studying path dependence
5.33: How similar are the outputs of the Stanford CRFM models on a given text? (See the sketch after this group.)
5.35: Look for Indirect Object Identification capability in other models of approximately the same size.
5.38: Can you find a problem where you understand the circuits and Git Re-Basin does work?
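
A minimal sketch for 5.33, comparing per-token log probs of two Stanford CRFM GPT-2 small runs on the same text. The model names ("stanford-gpt2-small-a", "-b") are assumptions about how these checkpoints are registered in TransformerLens; swap in whichever runs you want to compare.

```python
# Sketch for 5.33: compare per-token log probs of two Stanford CRFM runs.
# Model names are assumptions about the TransformerLens registry.
import torch
from transformer_lens import HookedTransformer

def per_token_log_probs(model: HookedTransformer, text: str) -> torch.Tensor:
    tokens = model.to_tokens(text)                  # [1, seq_len]
    log_probs = model(tokens).log_softmax(dim=-1)   # [1, seq_len, d_vocab]
    # Log prob each position assigns to the actual next token.
    return log_probs[0, :-1].gather(-1, tokens[0, 1:, None]).squeeze(-1)

text = "The Eiffel Tower is located in the city of Paris."
model_a = HookedTransformer.from_pretrained("stanford-gpt2-small-a")
model_b = HookedTransformer.from_pretrained("stanford-gpt2-small-b")
diff = per_token_log_probs(model_a, text) - per_token_log_probs(model_b, text)
print("Mean absolute per-token log-prob difference:", diff.abs().mean().item())
```

Sorting tokens by the absolute difference (rather than averaging) is probably where the interesting structure is.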

Difficulty B

Algorithmic tasks - understanding grokking
5.1: Understanding why 5-digit addition has a phase change per digit (so 6 in total?!).
5.3: Look at the PCA of the logits on the full dataset, or the PCA of a stack of flattened weights. If you plot a scatter plot of the first 2 components, the different phases of training are clearly visible. What's up with this? (See the sketch after this group.)
5.6: What happens if we include one of the progress measures from Neel's grokking post in the loss? Can we accelerate or stop grokking?
5.7: Adam Jermyn provides an analytical argument and some toy models for why phase transitions should be an inherent part of (some of) how models learn. Can you find evidence of this in more complex models?
5.8: Build on and refine Adam Jermyn's arguments and toy models - think about how they deviate from a real transformer, and build more faithful models.
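
A minimal sketch of the PCA scatter described in 5.3. It assumes you have already saved one flattened array per checkpoint (the logits on the full dataset, or the flattened weights); `snapshots` and `steps` are placeholder names for those.

```python
# Sketch for 5.3: scatter training checkpoints in the space of their first
# two principal components. `snapshots` is assumed to be a list of 1-D
# arrays, one per checkpoint; `steps` is the matching list of training steps.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

def plot_checkpoint_pca(snapshots: list[np.ndarray], steps: list[int]) -> None:
    X = np.stack(snapshots)                          # [n_checkpoints, n_features]
    pcs = PCA(n_components=2).fit_transform(X)       # [n_checkpoints, 2]
    plt.scatter(pcs[:, 0], pcs[:, 1], c=steps, cmap="viridis")
    plt.colorbar(label="training step")
    plt.xlabel("PC 1")
    plt.ylabel("PC 2")
    plt.title("Checkpoints projected onto the first two principal components")
    plt.show()
```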

Algorithmic tasks - lottery tickets
5.9: For a toy model trained to form induction heads, is there a lottery-ticket-style thing going on? Can you disrupt induction head formation by messing with the initialisation?
5.11: Take the parameters that form the important circuits at the end of training on some toy task and knock them out at the start of training. How much does that delay or stop generalisation?
5.12: Analysing how pairs of heads in an induction circuit compose over time: can you find progress measures which predict this?
5.13: Analysing how pairs of heads in an induction circuit compose over time: can we predict which heads will learn to compose first?
5.14: Analysing how pairs of heads in an induction circuit compose over time: does the composition develop as a phase transition?

Understanding fine-tuning
5.17: How is the model different on fine-tuned text? Look at examples where the model does much better after fine-tuning, and at some normal text.
5.18: Try activation patching between the old and fine-tuned model and see how hard it is to recover performance. (See the sketch after this group.)
5.19: Look at the max activating text for various neurons in the original model. How has it changed post fine-tuning?
5.20: Explore further and see what's going on with fine-tuning mechanistically.
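
A minimal sketch for 5.18, assuming `base_model` and `tuned_model` are HookedTransformer instances with identical architectures and tokenizers (e.g. GPT-2 small before and after fine-tuning). It patches one layer's residual stream from the fine-tuned model into the original and measures the loss; sweeping `layer` (and patching individual heads or MLPs instead) is the interesting part.

```python
# Sketch for 5.18: patch the fine-tuned model's residual stream at one layer
# into the original model and see how much fine-tuned behaviour it recovers.
# `base_model` and `tuned_model` are assumed to share architecture/tokenizer.
from transformer_lens import HookedTransformer, utils

def loss_with_patched_layer(base_model: HookedTransformer,
                            tuned_model: HookedTransformer,
                            text: str, layer: int) -> float:
    tokens = base_model.to_tokens(text)
    # Cache the fine-tuned model's activations on the same text.
    _, tuned_cache = tuned_model.run_with_cache(tokens)
    hook_name = utils.get_act_name("resid_post", layer)

    def patch_resid(resid, hook):
        # Overwrite the base model's residual stream with the tuned one.
        return tuned_cache[hook_name]

    loss = base_model.run_with_hooks(
        tokens, return_type="loss", fwd_hooks=[(hook_name, patch_resid)]
    )
    return loss.item()
```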

Understanding training dynamics in language models
5.22: Can you replicate the induction head phase transition results in the various checkpointed models in TransformerLens? (If the code works for attn-only-2l it should work for them all.) (See the sketch after this group.)
5.23: Look at the neurons in the TransformerLens SoLU models during training. Do they tend to form as a phase transition?
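
A minimal sketch for 5.22: score how induction-like each head is on repeated random tokens, across training checkpoints of attn-only-2l. The `checkpoint_index` argument and the indices used are assumptions about how the checkpointed models are exposed in TransformerLens; adjust to the checkpoints that actually exist.

```python
# Sketch for 5.22: an induction score (attention to the token after the
# previous occurrence of the current token) tracked across checkpoints.
import torch
from transformer_lens import HookedTransformer

def max_induction_score(model: HookedTransformer, seq_len: int = 50) -> float:
    rand = torch.randint(100, model.cfg.d_vocab, (1, seq_len))
    tokens = torch.cat([rand, rand], dim=1)          # repeated random tokens
    _, cache = model.run_with_cache(tokens)
    per_head = []
    for layer in range(model.cfg.n_layers):
        pattern = cache["pattern", layer]            # [batch, head, query, key]
        # Attention from each repeat back to the token after its first copy.
        stripe = pattern.diagonal(offset=1 - seq_len, dim1=-2, dim2=-1)
        per_head.append(stripe.mean(dim=-1))         # [batch, head]
    return torch.stack(per_head).max().item()        # best head at this checkpoint

for idx in [0, 10, 20, 30]:                          # illustrative checkpoint indices
    model = HookedTransformer.from_pretrained("attn-only-2l", checkpoint_index=idx)
    print(f"checkpoint {idx}: max induction score {max_induction_score(model):.3f}")
```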

Finding phase transitions
5.27: Try digging into the specific heads that act on IOI and look for phase transitions. Use direct logit attribution for the name movers. (See the sketch after this group.)
5.28: Study the attention patterns of each category of heads in IOI for phase transitions.
5.29: Look for phase transitions in simple IOI-style algorithmic tasks, like few-shot learning, addition, or sorting words alphabetically.
5.30: Look for phase transitions in soft induction heads, like translation.
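
A minimal sketch of direct logit attribution for 5.27, run on a single IOI prompt. The layer/head index is a placeholder for whichever name mover you are tracking, and final LayerNorm scaling is ignored, so treat the value as approximate; the point is to recompute it at each checkpoint and plot it over training.

```python
# Sketch for 5.27: a head's direct contribution to the IOI logit difference.
# Layer/head are placeholders for a candidate name mover; final LayerNorm
# scaling is ignored, so the value is approximate.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
prompt = "When John and Mary went to the store, John gave a drink to"
tokens = model.to_tokens(prompt)
_, cache = model.run_with_cache(tokens)

io_token = model.to_single_token(" Mary")    # correct (indirect object)
s_token = model.to_single_token(" John")     # incorrect (subject)
logit_diff_dir = model.W_U[:, io_token] - model.W_U[:, s_token]

layer, head = 9, 9                           # placeholder name mover
# What this head writes into the residual stream at the final position.
head_out = cache["z", layer][0, -1, head] @ model.W_O[layer, head]
print(f"L{layer}H{head} direct logit attribution: {(head_out @ logit_diff_dir).item():.3f}")
```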

Studying path dependence
5.34: How much do the Stanford CRFM models differ on algorithmic tasks like Indirect Object Identification?
5.36: When model scale varies (e.g. GPT-2 small vs. medium), is there anything the smaller model can do that the larger one can't? (Look at the difference in per-token log prob.)
5.37: Try applying the Git Re-Basin techniques to a 2L MLP trained on modular addition. Does this work? If you use Neel's grokking work to analyse the circuits involved, how does the re-basin technique map onto the circuits?

Difficulty C

Algorithmic tasks - understanding grokking
5.2: Why do the 5-digit addition phase changes happen in that order?
5.4: Can we predict when grokking will happen? Bonus: without using any future information?
5.5: Understanding why the model chooses specific frequencies (and why it sometimes switches frequencies mid-training!).

Algorithmic tasks - lottery tickets
5.10: All of Neel's toy models (attn-only, gelu, solu) were trained with the same data shuffle and weight initialisation. Many induction heads aren't shared between them, but L2H3 in the 3L models and L1H6 in the 2L models always are. What's up with that?

Understanding fine-tuning
5.15: Build a toy model of fine-tuning (train on task 1, fine-tune on task 2). What is going on internally? Any interesting motifs?
5.21: Can you find any phase transitions in the fine-tuning checkpoints?

Understanding training dynamics in language models
5.24: Use the per-token loss analysis technique from the induction heads paper to look for more phase changes. (See the sketch after this group.)
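
A minimal sketch for 5.24: collect per-token losses on a fixed text at several checkpoints, then look at how individual tokens' losses move over training (the induction heads paper then analyses exactly this kind of matrix, e.g. with PCA). The `loss_per_token` argument, the model name, and the checkpoint indices are assumptions about the TransformerLens API.

```python
# Sketch for 5.24: per-token losses on fixed text across checkpoints.
# Tokens whose loss drops sharply at one checkpoint are phase-change
# candidates. `loss_per_token` / `checkpoint_index` are API assumptions.
import torch
from transformer_lens import HookedTransformer

text = "Some fixed evaluation text, long enough to contain repeated names and phrases."
rows = []
for idx in [0, 10, 20, 30]:                          # illustrative checkpoint indices
    model = HookedTransformer.from_pretrained("attn-only-2l", checkpoint_index=idx)
    tokens = model.to_tokens(text)
    loss = model(tokens, return_type="loss", loss_per_token=True)   # [1, seq_len - 1]
    rows.append(loss[0])
per_token_losses = torch.stack(rows)                 # [n_checkpoints, seq_len - 1]
# Largest single-step improvement for each token across training.
print((per_token_losses[:-1] - per_token_losses[1:]).max(dim=0).values)
```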

Finding phase transitions
5.31: Look for phase transitions in benchmark performance, or on specific questions from a benchmark.

Difficulty D

Finding phase transitions
5.32: Hypothesis: scaling laws happen because models experience a ton of tiny phase changes which average out to a smooth curve due to the law of large numbers. Can you find evidence for or against that?