A3


Note. Here are some things to keep in mind as you plan your time for this assignment.

The total amount of PyTorch code to write, and its complexity, is lower than in Assignment 4. However, you're given less guidance and scaffolding for writing it.
This assignment involves a pretraining step that takes approximately 2 hours to perform on Azure, and you’ll have to do it twice.
You’ll train a Transformer model to attempt to answer simple questions of the form “Where was person [x] born?” – without providing any input text from which to draw the answer. You’ll find that models are able to learn some facts about where people were born through pretraining, and access that information during fine-tuning to answer the questions. Then, you’ll take a harder look at the system you built, and reason about the implications and concerns about relying on such implicit pretrained knowledge.
You’ll need around 5 hours for training.

Pretrained Transformer models and knowledge access

(a) [0 points (Coding)] Review the minGPT demo code.
In the src/submission/mingpt-demo/ folder, there is a Jupyter notebook (play_char.ipynb) that trains and samples from a Transformer language model. Take a look at it locally on your computer (you may need to install Jupyter first: pip install jupyter) to get somewhat familiar with how the code defines and trains models. You don't need to run the training locally, because it would take a long time in a CPU-only environment. Some of the code you write below will be inspired by what you see in this notebook.
Note that you do not have to write any code or submit written answers for this part.
(b) [0 points (Coding)] Read through NameDataset in src/submission/dataset.py.
(c) [4 points (Coding)] Implement finetuning (without pretraining).

Take a look at src/submission/helper.py.

Eventually you will pretrain and finetune a model. For now, focus on finetuning without pretraining.
Modify the [part c] section, which covers three functions: initialize, finetune, and train (screenshots omitted).
Check whether reading_params_path is provided. If it is, load the pretrained model parameters into the model using torch.load(). Since we are not pretraining in this case, the path is None and this step is skipped.
Initialize the NameDataset for finetuning using the finetune_corpus_path.
Set the hyperparameters for the TrainerConfig.
Initialize the Trainer with the model, finetune dataset, and trainer configuration.
Train the model using the Trainer (a sketch of these steps follows below).
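
Putting these steps together, a minimal sketch of the finetune helper might look like the following. The function signature, the NameDataset constructor arguments, and the TrainerConfig hyperparameter values are assumptions for illustration; use the names and values specified in the starter code's comments.

import torch
from .dataset import NameDataset
from .trainer import Trainer, TrainerConfig

def finetune(reading_params_path, finetune_corpus_path, pretrain_dataset, model):
    # Load pretrained weights only if a checkpoint path was given; when
    # finetuning without pretraining, this is None and is skipped.
    if reading_params_path is not None:
        model.load_state_dict(torch.load(reading_params_path))

    # Build the finetuning dataset from the birth-place corpus.
    finetune_dataset = NameDataset(pretrain_dataset, open(finetune_corpus_path).read())

    # Hyperparameter values here are placeholders, not the graded spec.
    tconf = TrainerConfig(max_epochs=75, batch_size=256, learning_rate=6e-4)

    trainer_obj = Trainer(model, finetune_dataset, None, tconf)
    trainer_obj.train()
    return tconf, trainer_obj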

trainer_obj.train() in def finetune():
This line is responsible for training the model on the finetuning dataset. The Trainer object is initialized with the model, the finetuning dataset, and the training configuration. The train() method of the Trainer object is then called to start the training process.
trainer_obj.train() in def train():
This line is also responsible for training the model. However, this function is more general and can be used for both pretraining and finetuning. The Trainer object is passed as an argument to this function, and the train() method is called to start the training process.
The train() method in both functions is essentially doing the same thing - training the model. The difference lies in the context in which they are used. In finetune(), it's specifically for finetuning the model, while in train(), it's a more general function that can be used for both pretraining and finetuning.
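
For reference, a minimal sketch of such a general train helper (the signature and the writing_params_path name are assumptions for illustration):

import torch

def train(model, writing_params_path, trainer_obj):
    # Used for both pretraining and finetuning: run the trainer,
    # then persist the learned weights for the next stage.
    trainer_obj.train()
    torch.save(model.state_dict(), writing_params_path)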

Results Without Pretraining

After running ./run.sh vanilla_finetune_without_pretrain (screenshot omitted)
After running ./run.sh vanilla_eval_dev_without_pretrain (screenshots omitted)
After running ./run.sh vanilla_eval_test_without_pretrain (screenshots omitted)

Pretraining

In the file src/submission/dataset.py, implement the __getitem__() function for the dataset class CharCorruptionDataset.
Follow the instructions provided in the comments in dataset.py. Span corruption is explored in the T5 paper [2]: it randomly selects spans of text in a document and replaces them with unique sentinel tokens (noising). Models take this noised text and are required to output each unique sentinel followed by the tokens that the sentinel replaced in the input. In this question, you'll implement a simplification that masks out only a single span of characters.
This question will be graded via autograder based on whether your span corruption function implements some basic properties of our spec. We'll instantiate the CharCorruptionDataset with our own data and draw examples from it.
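
As a rough illustration of the single-span simplification (the sentinel characters, truncation bounds, and span-length distribution below are assumptions for exposition, not the graded spec):

import random

MASK_CHAR = u"\u2047"  # sentinel marking where the span was removed
PAD_CHAR = u"\u25A1"   # pads every example to a fixed block size

def corrupt(document, block_size):
    # Assumes len(document) >= 4. Truncate the document to a random
    # length, then cut out one contiguous span of characters.
    truncated = document[:random.randint(4, block_size * 7 // 8)]
    span_len = random.randint(1, len(truncated) // 2)
    start = random.randint(0, len(truncated) - span_len)
    prefix = truncated[:start]
    masked = truncated[start:start + span_len]
    suffix = truncated[start + span_len:]
    # Output layout: prefix [MASK] suffix [MASK] masked-content, padded.
    out = prefix + MASK_CHAR + suffix + MASK_CHAR + masked
    return out + PAD_CHAR * (block_size - len(out))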
Pretrain, finetune, and make predictions
Now fill in the pretrain portion of src/submission/helper.py, which will pretrain a model on the span corruption task.
Additionally, modify your finetune portion to handle fine-tuning in the case with pretraining.
Define the hyperparameters for pretraining as per the specifications given in the comments.
Initialize a TrainerConfig object with these hyperparameters.
Initialize a Trainer object with the model, pretrain dataset, None for the validation dataset, and the TrainerConfig object.
Return the TrainerConfig and Trainer objects (see the sketch after this list).
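
A minimal sketch of the pretrain helper following those steps (the signature and the hyperparameter values are placeholders; use the values given in the starter code's comments):

from .trainer import Trainer, TrainerConfig

def pretrain(pretrain_dataset, model):
    # Hyperparameter values below are illustrative placeholders.
    tconf = TrainerConfig(max_epochs=650, batch_size=128, learning_rate=6e-3)
    # None: no validation dataset is used during pretraining.
    trainer_obj = Trainer(model, pretrain_dataset, None, tconf)
    return tconf, trainer_obj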
In particular, if a path to a pretrained model is provided in the bash command, load this model before finetuning it on the birth-place prediction task.
Pretrain your model on wiki.txt (which should take approximately two hours), finetune it on NameDataset, and evaluate it. Specifically, you should be able to run four commands: pretraining, finetuning from the pretrained checkpoint, and evaluation on the dev and test sets.

Synthesizer

We'll now turn to changing the Transformer architecture itself, specifically the self-attention module. While we've been using a self-attention scoring function based on dot products, this involves a rather intensive computation that's quadratic in the sequence length.
This is because the dot product is computed between ℓ² pairs of word vectors (for sequence length ℓ) in each forward pass. Synthesized attention is a very recent alternative that has potential benefits from removing this dot product (and the quadratic computation) entirely.
It’s a promising idea, and one way for us to ask,
What’s important/right about the Transformer architecture, and where can we improve/prune aspects of it?
In src/submission/attention.py, implement the forward method of SynthesizerAttention, which implements a variant of the Synthesizer proposed in the cited paper.
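As a single-head illustration of the idea (the assignment's class is multi-head and defines its own parameter names; everything below is an assumption for exposition): instead of comparing queries and keys, an MLP on each position's representation directly synthesizes that position's attention scores.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseSynthesizer(nn.Module):
    # Single-head dense Synthesizer, for illustration only.
    def __init__(self, n_embd, block_size):
        super().__init__()
        self.w1 = nn.Linear(n_embd, n_embd)
        self.w2 = nn.Linear(n_embd, block_size)
        self.value = nn.Linear(n_embd, n_embd)
        # Causal mask: position i may only attend to positions <= i.
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, block_size, block_size))

    def forward(self, x):
        B, T, C = x.size()
        # Scores come from an MLP on x alone; no query-key dot product.
        scores = self.w2(F.relu(self.w1(x)))[:, :, :T]           # (B, T, T)
        scores = scores.masked_fill(self.mask[:, :T, :T] == 0, float('-inf'))
        att = F.softmax(scores, dim=-1)
        return att @ self.value(x)                               # (B, T, C)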
In the rest of the code in src/submission/helper.py:
modify your model to support using either CausalSelfAttention or SynthesizerAttention.
Add the ability to switch between these attention variants depending on whether “vanilla” (for causal self-attention) or “synthesizer” (for the synthesizer variant) is selected in the command line arguments.
(see the section marked [part g] in src/submission/helper.py; a sketch follows below).
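
A sketch of that switch (the synthesizer config flag, the initialize signature, and the model hyperparameters are assumptions; the starter code defines the actual interface):

from .model import GPT, GPTConfig  # assumed exports from model.py

def initialize(vocab_size, block_size, args):
    # args.variant comes from the command line: 'vanilla' or 'synthesizer'.
    mconf = GPTConfig(vocab_size, block_size, n_layer=4, n_head=8, n_embd=256)
    if args.variant == 'vanilla':
        mconf.synthesizer = False   # blocks use CausalSelfAttention
    elif args.variant == 'synthesizer':
        mconf.synthesizer = True    # blocks use SynthesizerAttention
    else:
        raise ValueError(f"Unknown attention variant: {args.variant}")
    return GPT(mconf)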
Below are bash commands that your code should support in order to pretrain the model, finetune it, and make predictions on the dev and test sets.
Note that the pretraining process will take approximately 2 hours.


Deliverables

Within the src/submission directory:
src/submission/__init__.py
src/submission/attention.py
src/submission/dataset.py
src/submission/helper.py
src/submission/model.py
src/submission/trainer.py
src/submission/utils.py
src/submission/vanilla.model.params
src/submission/vanilla.nopretrain.dev.predictions
src/submission/vanilla.nopretrain.test.predictions
src/submission/vanilla.pretrain.params
src/submission/vanilla.finetune.params
src/submission/vanilla.pretrain.dev.predictions
src/submission/vanilla.pretrain.test.predictions
src/submission/synthesizer.pretrain.params
src/submission/synthesizer.finetune.params
src/submission/synthesizer.pretrain.dev.predictions
src/submission/synthesizer.pretrain.test.predictions

Coding

(screenshots omitted)



Tmux

Alright, let's bring out G and L for a more nuanced and precise approach.
G's Straight Talk:
First off, using & to kill a window in tmux is like using a sledgehammer to crack a nut if you're not 100% sure you're done with that window. It's a permanent move, and there's no Ctrl+Z undo magic here. You press &, and bam, that window and everything in it is gone.
Your watcher window is like your second set of eyes. If you're closing it, be sure it's not watching something critical. You're in the middle of training a model in yocoda, so why shut down your monitoring setup? Just switch back to yocoda with Ctrl+b then w, select it, and hit Enter. Leave watcher in peace - it's not eating your lunch.
L's Constructive Path Forward:
Let's focus on ensuring your workflow is seamless. If you decide to close the watcher window, do so cautiously:
Switch to the watcher Window:
Navigate using Ctrl+b + w, highlight watcher, and press Enter.
Close with Care:
If you're certain you're done with watcher, close it by typing exit or pressing Ctrl+d. The & command is final - it's only suitable if you're sure you don't need that window anymore.
Return to yocoda Gracefully:
Switch back to yocoda by again using Ctrl+b + w, select yocoda, and press Enter.
You can detach from your tmux session by pressing Ctrl+B then D. Tmux operates using a series of keybindings (keyboard shortcuts) triggered by pressing the "prefix" combination. By default, the prefix is Ctrl+B. After that, press D to detach from the current session.
You're no longer attached to the session, but your long-running command executes safely in the background. You can list active tmux sessions with tmux ls:
You can disconnect your SSH connection at this point, and the command will continue to run. When you're ready, reconnect to the server and reattach to the existing tmux session (tmux attach -t {name}) to resume where you left off.
Tmux provides several keybindings to execute commands quickly in a tmux session. Here are some of the most useful ones.
First, create a new tmux session if you're not already in one. You can name your session by passing the parameter -s {name} to the tmux new command when creating a new session (tmux new -s {name}).
Ctrl+B D — Detach from the current session.
Ctrl+B % — Split the window into two panes horizontally.
Ctrl+B " — Split the window into two panes vertically.
Ctrl+B Arrow Key (Left, Right, Up, Down) — Move between panes.
Ctrl+B X — Close pane.
Ctrl+B C — Create a new window.
Ctrl+B N or P — Move to the next or previous window.
Ctrl+B 0 (1,2...) — Move to a specific window by number.
Ctrl+B : — Enter the command line to type commands. Tab completion is available.
Ctrl+B ? — View all keybindings. Press Q to exit.
Ctrl+B W — Open a panel to navigate across windows in multiple sessions.
For additional keybindings, consult the tmux man pages.

