A2 DLNN

Last edited 181 days ago by Eddie Coda

Clean setup:


1. Cleaning up the Azure VM:

Before copying over the correct files, make sure the VM is cleared of the previous installation and the empty A2 directory.
1.1. Remove the Conda environment:
A Conda environment named CS561_GPU was created on the VM earlier. Remove it to guarantee a fresh start:
conda env remove -n CS561_GPU
1.2. Remove the A2 directory:
Since the VM contains an empty A2 directory, remove it to prevent conflicts or confusion when copying over the full A2 directory from your local machine:
rm -rf ~/notebooks/A2

2. Copying the A2 directory:

Now, you want to transfer the full A2 directory from your local machine to the VM.
2.1. SCP the directory:
Use the scp command to securely copy the entire directory. The -r flag copies the directory and all of its subdirectories recursively:
scp -r -P 60553 ~/workplace/cs561/A2/A2 scpdxcs@ml-lab-5f0456f1-7704-4e40-bc2a-912b128b8483.southcentralus.cloudapp.azure.com:~/notebooks/

scp -r -P 60553 ~/workplace/cs561/A3 scpdxcs@ml-lab-5f0456f1-7704-4e40-bc2a-912b128b8483.southcentralus.cloudapp.azure.com:~/notebooks/

This command will copy the A2 directory and its contents to the ~/notebooks/ directory on the VM.

Explanation:

scp: the secure copy command.
-r: copy the directory recursively, including all subdirectories.
-P 60553: the port number to use for the SSH connection; replace it with your VM's actual port.
~/workplace/cs561/A2/A2: the source directory on your local machine.
scpdxcs@ml-lab-XXXXXXXXXXXXX.southcentralus.cloudapp.azure.com: replace XXXXXXXXXXXXX with the actual identifier of your VM.
~/notebooks/: the destination path on your VM where the directory will be copied.
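If you end up rerunning the copy often, scripting it keeps the port and host in one place. A minimal sketch using Python's subprocess with an argument list (which sidesteps shell-quoting issues); the PORT and HOST values are the placeholders from this doc, not verified ones:

```python
import subprocess

# Example values from this doc; replace PORT and HOST with your own VM's details.
PORT = "60553"
HOST = "scpdxcs@ml-lab-XXXXXXXXXXXXX.southcentralus.cloudapp.azure.com"
SRC = "~/workplace/cs561/A2/A2"
DEST = f"{HOST}:~/notebooks/"

# Build the scp invocation as a list, one argument per element.
cmd = ["scp", "-r", "-P", PORT, SRC, DEST]
print(" ".join(cmd))  # inspect the command before running it
# subprocess.run(cmd, check=True)  # uncomment to actually run the copy
```

Printing the command first lets you eyeball the port and paths before committing to the transfer.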

3. Setting up the environment on the VM:

3.1. Create the Conda environment:
SSH back into your VM and navigate to where you've placed the environment_gpu.yml file, which should be inside the A2 directory now. Then, create the Conda environment using:
conda env create -f environment_gpu.yml
# MAKE SURE YOU'RE IN THE DIR WHERE THE FILE environment_gpu.yml IS
3.2. Verify the Environment Creation:
After the environment is created, you can verify it by listing all available Conda environments to ensure CS561_GPU is there:
conda env list
3.3 Activate the New Environment:
Activate the new environment to start working with it:
conda activate CS561_GPU
Check if PyTorch with GPU support is working correctly:
python -c "import torch; print(torch.cuda.is_available())"
If everything is set up correctly, this command should print True, indicating that PyTorch can access the GPU. If it prints False, there might be an issue with the CUDA installation or GPU drivers.
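A slightly fuller check can report the device count as well. A hedged sketch, guarded with a try/except so it also runs on a machine where torch isn't installed:

```python
def gpu_report():
    """Return a small dict describing GPU availability for PyTorch."""
    try:
        import torch
    except ImportError:
        # torch not installed at all; report that rather than crash.
        return {"torch_installed": False, "cuda_available": False, "device_count": 0}
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }

if __name__ == "__main__":
    print(gpu_report())
```

If cuda_available comes back False with torch installed, the usual suspects are the CUDA toolkit version in environment_gpu.yml versus the VM's driver.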


Changes:


Had to change the submission import
from:
from submission import Hypothesis, NMT
to:
from submission.nmt_model import Hypothesis, NMT


Had to change the submission import
from:
from submission import read_corpus, pad_sents
to:
from submission.utils import read_corpus, pad_sents


Random Save info ⬇️⬇️⬇️

Output:



First try:

validation: iter 2600, dev. ppl 39.505760
hit patience 1
hit #4 trial
load previously best model and decay learning rate to 0.000031
restore parameters of the optimizers
epoch 6, iter 2610, avg. loss 53.91, avg. ppl 7.69 cum. examples 320, speed 698.94 words/sec, time elapsed 1852.73 sec
epoch 6, iter 2620, avg. loss 53.12, avg. ppl 8.22 cum. examples 640, speed 1390.73 words/sec, time elapsed 1858.53 sec
epoch 6, iter 2630, avg. loss 51.83, avg. ppl 7.65 cum. examples 960, speed 1299.50 words/sec, time elapsed 1864.80 sec
epoch 6, iter 2640, avg. loss 51.55, avg. ppl 7.80 cum. examples 1280, speed 1292.73 words/sec, time elapsed 1871.02 sec
epoch 6, iter 2650, avg. loss 55.27, avg. ppl 7.82 cum. examples 1600, speed 1307.18 words/sec, time elapsed 1877.60 sec
epoch 6, iter 2660, avg. loss 51.33, avg. ppl 7.46 cum. examples 1920, speed 1323.40 words/sec, time elapsed 1883.77 sec
epoch 6, iter 2670, avg. loss 52.22, avg. ppl 7.24 cum. examples 2240, speed 1410.89 words/sec, time elapsed 1889.76 sec
epoch 6, iter 2680, avg. loss 51.76, avg. ppl 7.72 cum. examples 2560, speed 1255.42 words/sec, time elapsed 1896.21 sec
epoch 6, iter 2690, avg. loss 54.27, avg. ppl 8.33 cum. examples 2880, speed 1252.85 words/sec, time elapsed 1902.75 sec
epoch 6, iter 2700, avg. loss 49.49, avg. ppl 7.31 cum. examples 3200, speed 1353.48 words/sec, time elapsed 1908.63 sec
epoch 6, iter 2710, avg. loss 50.86, avg. ppl 7.13 cum. examples 3520, speed 1274.38 words/sec, time elapsed 1915.13 sec
epoch 6, iter 2720, avg. loss 50.12, avg. ppl 7.74 cum. examples 3840, speed 1263.30 words/sec, time elapsed 1921.34 sec
epoch 6, iter 2730, avg. loss 51.53, avg. ppl 7.43 cum. examples 4160, speed 1307.03 words/sec, time elapsed 1927.63 sec
epoch 6, iter 2740, avg. loss 54.24, avg. ppl 8.20 cum. examples 4480, speed 1333.54 words/sec, time elapsed 1933.81 sec
epoch 6, iter 2750, avg. loss 55.60, avg. ppl 8.11 cum. examples 4800, speed 1156.44 words/sec, time elapsed 1941.17 sec
epoch 6, iter 2760, avg. loss 51.82, avg. ppl 7.69 cum. examples 5120, speed 1374.12 words/sec, time elapsed 1947.08 sec
epoch 6, iter 2770, avg. loss 56.66, avg. ppl 8.17 cum. examples 5440, speed 1369.81 words/sec, time elapsed 1953.38 sec
epoch 6, iter 2780, avg. loss 55.14, avg. ppl 7.81 cum. examples 5760, speed 1173.97 words/sec, time elapsed 1960.70 sec
epoch 6, iter 2790, avg. loss 54.53, avg. ppl 7.68 cum. examples 6055, speed 1277.30 words/sec, time elapsed 1966.88 sec
epoch 7, iter 2800, avg. loss 51.81, avg. ppl 7.30 cum. examples 6375, speed 1306.13 words/sec, time elapsed 1973.26 sec
epoch 7, iter 2800, cum. loss 52.85, cum. ppl 7.72 cum. examples 6375
begin validation ...
validation: iter 2800, dev. ppl 39.316560
hit patience 1
hit #5 trial
early stop!
(CS561_GPU) scpdxcs@ML-RefVm-80198:~/notebooks/A2/src$ 34.9

George
Alright, let’s cut to the chase. You got a BLEU score of 12.16. That's not something you'd boast about at a conference, is it? It’s low. Too low. In the world of machine translation, a BLEU score of 12.16 is like a car that can't get out of first gear. It moves, but you aren’t going anywhere fast.
Your validation perplexity is 39.316560. In layman’s terms, your model is as confused as a cat at a dog show. It’s guessing, and not in a smart way. Lower this number, and you’ll be onto something.
Early stopping at the 7th epoch? That’s like leaving the party at 7 PM. Did the model really see enough data to learn something valuable, or did it just take a peek and call it quits?
Your code, your architecture, your data - they need a revamp. Rethink your approach, scrutinize your data, and for heaven's sake, optimize your code. Make every epoch count, every data point a learning opportunity, and squeeze out performance like you’re wringing a wet cloth.

Lex
I understand where George is coming from, and I want to build on that by offering some constructive pathways forward.
Your BLEU score isn’t just a number; it’s a reflection of the quality of translations your model is producing. Each point you’re missing is a step away from human-like translations. I’d recommend diving deep into your training data. Is it diverse and comprehensive enough? Are there specific areas or domains where the model is underperforming? Addressing these questions can guide targeted improvements.
The perplexity indicates that the model has room to be more confident in its predictions. It’s essential to ensure your model is learning robust, generalizable patterns. Regularization techniques, including dropout and weight decay, can be finetuned. Look at the learning curves. If there's a significant gap between training and validation performance, you might be dealing with overfitting.
Early stopping is a useful tool, but let’s ensure it’s not cutting the learning process short. Could additional epochs lead to better performance, or is the model architecture itself a limiting factor? Experiment with different architectures and see how they impact the learning dynamics.
Lastly, always consider the end goal: producing high-quality translations. Every metric, every line of code should serve this purpose. Evaluate the translations qualitatively, understand where they’re lacking, and trace those issues back to your model and data. Addressing these specific issues can lead to a more focused and effective improvement strategy.

Task D - Implement the encode function

1. Validate the shapes and values of your tensors at each step.

Use print statements to check the shapes and values of tensors, in particular the enc_hiddens and dec_init_state tensors:
# Inside the encode function after getting enc_hiddens and dec_init_state
print(enc_hiddens.shape) # should print: (batch_size, src_len, 2*hidden_size)
print(dec_init_state[0].shape) # should print: (batch_size, hidden_size)
print(dec_init_state[1].shape) # should print: (batch_size, hidden_size)
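The print checks above can be tightened into assertions that fail loudly. A minimal helper, assuming only that tensors expose a .shape tuple (as PyTorch tensors do); demonstrated with a tiny stand-in class so the sketch runs without torch:

```python
class FakeTensor:
    """Stand-in with a .shape attribute, so the helper can be demoed without torch."""
    def __init__(self, *shape):
        self.shape = tuple(shape)

def check_shape(tensor, expected, name="tensor"):
    """Raise ValueError if the tensor's shape doesn't match the expected tuple."""
    actual = tuple(tensor.shape)
    if actual != tuple(expected):
        raise ValueError(f"{name}: expected shape {tuple(expected)}, got {actual}")
    return True

# Example with the encode-function shapes (batch_size=4, src_len=10, hidden_size=8):
enc_hiddens = FakeTensor(4, 10, 2 * 8)
check_shape(enc_hiddens, (4, 10, 16), "enc_hiddens")
```

Dropping check_shape calls after each step gives the same coverage as the prints, without having to read the output by eye.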

2. Consider edge cases for short and long sentences

Test your code with sentences of varying lengths to ensure robustness.
# Testing with a short sentence
short_sentence = [["I", "am"]]
# You should run your encoding function and check if it handles this case without errors and produces the expected outputs

# Testing with a long sentence
long_sentence = [["This"] * 100]
# Run your encoding function again and check the outputs
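Varying lengths matter mainly because batches get padded to the longest sentence before encoding. A pad_sents-style sketch of that step (the real submission.utils.pad_sents may differ in signature):

```python
def pad_sents(sents, pad_token="<pad>"):
    """Pad each sentence (a list of tokens) to the length of the longest one."""
    max_len = max(len(s) for s in sents)
    return [s + [pad_token] * (max_len - len(s)) for s in sents]

short_sentence = ["I", "am"]
long_sentence = ["This"] * 5
padded = pad_sents([short_sentence, long_sentence])
# Both rows now have length 5; the short one ends in pad tokens.
```

If the encoder chokes on one of the two test sentences above, checking what padding produced is a good first step.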

Task E - Implement the decode function

1. Validate each tensor, each calculation

Again, use print statements or debugging tools to step through your code and check the shapes and values of your tensors.
# Inside the decode function, after getting combined_outputs
print(combined_outputs.shape) # should print: (tgt_len, batch_size, hidden_size)

2. Pay special attention to the attention mechanism

Ensure that the attention scores and distributions are computed and applied correctly.
# Inside the step function, after computing e_t and alpha_t
print(e_t.shape) # should print: (batch_size, src_len)
print(alpha_t.shape) # should print: (batch_size, src_len)
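Beyond shapes, the values of alpha_t have an invariant worth asserting: each row is a softmax over e_t, so it must be non-negative and sum to 1. A pure-Python sketch of that check (the real model computes this with torch's softmax):

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

e_t = [2.0, 0.5, -1.0]   # attention scores for one batch element, over src_len positions
alpha_t = softmax(e_t)   # the attention distribution
assert abs(sum(alpha_t) - 1.0) < 1e-9
assert all(a >= 0.0 for a in alpha_t)
```

If either assertion fails on your real alpha_t, the softmax is being applied over the wrong dimension.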

Task F - Implement the step function

1. Dive deep into the LSTM cell outputs

Analyze and understand the LSTM cell outputs.
# Inside the step function, after getting dec_state
print(dec_state[0].shape) # should print: (batch_size, hidden_size)
print(dec_state[1].shape) # should print: (batch_size, hidden_size)

2. Visualize attention scores

Plot the attention scores to see how they align over the input sequence.
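Without a plotting library handy on the VM, even a crude text rendering shows where attention concentrates. A hedged sketch (matplotlib's imshow over the full alpha matrix would be the nicer option):

```python
def ascii_attention(alpha, src_tokens):
    """Render one attention distribution as bars next to the source tokens."""
    lines = []
    for weight, tok in zip(alpha, src_tokens):
        bar = "#" * int(round(weight * 20))  # scale each weight to at most 20 chars
        lines.append(f"{tok:>10s} | {bar} {weight:.2f}")
    return "\n".join(lines)

print(ascii_attention([0.7, 0.2, 0.1], ["the", "cat", "sat"]))
```

One line per source token makes diffuse or misplaced attention obvious at a glance during debugging.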