PyTorch - Building the Simple AI Model

PyTorch Level 1 Lab: Building a Simple AI Model from Scratch


By the end of this lab, students will learn how to:
Set up a Google Colab environment for PyTorch.
Import necessary libraries.
Prepare and preprocess data.
Define a simple neural network model using PyTorch.
Train and evaluate the model.

Step 1: Set Up Your Environment

First, make sure PyTorch is installed in your Google Colab environment. Colab already ships with PyTorch, so the command below mainly confirms that torch and torchvision are present; add the --upgrade flag if you want pip to pull the latest release.
Create a new notebook and run:

!pip install torch torchvision

Step 2: Import Required Libraries

Next, import the necessary libraries.

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms


The MNIST dataset is one of the most widely used datasets in machine learning, particularly for image classification [citation:1][citation:2][citation:3][citation:5]. It consists of 70,000 grayscale images of handwritten digits, each 28x28 pixels [citation:4]. The dataset is divided into 60,000 images for training and 10,000 for testing, with roughly 7,000 images per digit (0-9); the classes are close to balanced but not exactly equal in size [citation:3]. MNIST serves as a standard benchmark, allowing researchers and practitioners to compare their algorithms' performance on the same image classification task [citation:6][citation:7]. Its simplicity and well-structured format also make it an ideal resource for learning to develop and evaluate neural networks for image classification, including convolutional networks [citation:1][citation:5].

References:
[citation:1] How to Develop a CNN for MNIST Handwritten Digit Classification
[citation:2] Image Classification in 10 Minutes with MNIST Dataset
[citation:3] mnist · Datasets at Hugging Face
[citation:4] MNIST - Ultralytics YOLOv8 Docs
[citation:5] GitHub - pengfeinie/handwritten-digit-classification: The MNIST ...
[citation:6] MNIST - Ultralytics YOLOv8 Docs
[citation:7] Build Your First Image Classification Model with The MNIST Dataset ...

Step 3: Prepare and Preprocess Data

We will use the MNIST dataset, a standard dataset for image classification tasks. The first run downloads and extracts the files, so wait for that to complete before moving on.
# Define a transform: convert images to tensors, then normalize pixels from [0, 1] to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Download and load the training data
trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False)
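
To make the preprocessing concrete, here is a small plain-Python sketch (no PyTorch needed) of the arithmetic behind the code above. It assumes ToTensor() has already scaled pixels to the range [0, 1]; the helper names normalize and batches_per_epoch are illustrative and not part of torchvision.

```python
import math

def normalize(pixel, mean=0.5, std=0.5):
    """Mirror of what Normalize((0.5,), (0.5,)) does to one pixel value."""
    return (pixel - mean) / std

def batches_per_epoch(num_samples, batch_size=64):
    """Number of batches a DataLoader yields when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

print(normalize(0.0), normalize(0.5), normalize(1.0))       # -1.0 0.0 1.0
print(batches_per_epoch(60000), batches_per_epoch(10000))   # 938 157
```

The second line of output is what len(trainloader) and len(testloader) should report with a batch size of 64, since the loaders keep the final, smaller batch by default.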

Step 4: Define the Neural Network Model

Define a simple fully connected neural network with two hidden layers (128 and 64 units).
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)  # Input (784 pixels) -> first hidden layer
        self.fc2 = nn.Linear(128, 64)       # First hidden layer -> second hidden layer
        self.fc3 = nn.Linear(64, 10)        # Second hidden layer -> output (10 digit classes)

    def forward(self, x):
        # Flatten each 28x28 image into a 784-element vector so every record has the same shape
        x = x.view(-1, 28 * 28)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)  # Raw scores (logits); CrossEntropyLoss applies softmax internally
        return x

model = SimpleNN()
Study the method calls used above (nn.Linear, torch.relu, Tensor.view) on the PyTorch documentation site.
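
As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand: each nn.Linear(in, out) layer holds in * out weights plus out biases. The sketch below is plain Python, and linear_params is an illustrative helper, not a PyTorch function.

```python
def linear_params(in_features, out_features):
    """Weights plus biases for one fully connected layer."""
    return in_features * out_features + out_features

layer_sizes = [(28 * 28, 128), (128, 64), (64, 10)]  # fc1, fc2, fc3 from SimpleNN
per_layer = [linear_params(i, o) for i, o in layer_sizes]
total = sum(per_layer)

print(per_layer)  # [100480, 8256, 650]
print(total)      # 109386
```

Once the model is defined, sum(p.numel() for p in model.parameters()) should report the same total.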

Step 5: Define Loss Function and Optimizer

We will use Cross Entropy Loss and the Adam optimizer.
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
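
To see what the loss measures, the following plain-Python sketch computes cross entropy for a single sample by hand: apply softmax to the raw scores, then take the negative log of the probability assigned to the correct class. The function name cross_entropy and the example scores are made up for illustration; nn.CrossEntropyLoss performs the equivalent computation on whole batches and averages the result by default.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# A model that scores all 10 digits equally is just guessing:
uniform = [0.0] * 10
print(round(cross_entropy(uniform, 3), 4))  # 2.3026, i.e. ln(10)

# A confident, correct prediction gives a loss close to zero:
confident = [0.0] * 10
confident[3] = 10.0
print(cross_entropy(confident, 3) < 0.01)   # True
```

The value ln(10) is a useful reference point: a loss well below about 2.3 on this 10-class problem means the model is doing better than uniform guessing.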
The `optim` module in PyTorch provides several families of optimizers, which are algorithms that update your model's parameters based on the gradients computed during backpropagation. Some of the most commonly used are:
1. **Stochastic Gradient Descent (SGD)**
   * `torch.optim.SGD`: Implements the classic SGD algorithm, with optional momentum (including Nesterov momentum).
2. **Adam and its variants**
   * `torch.optim.Adam`: Implements the Adam algorithm, which is a popular choice for deep learning models.
   * `torch.optim.AdamW`: A variant of Adam that applies weight decay separately from the gradient-based update.
   * `torch.optim.SparseAdam`: A variant of Adam that supports sparse gradients.
3. **Learning Rate Schedulers**
   * `torch.optim.lr_scheduler`: A collection of learning rate schedulers that adjust the learning rate during training. These work alongside an optimizer rather than replacing it.
4. **Other optimizers**
   * `torch.optim.RMSprop`: Implements the RMSprop algorithm, which scales each update by the root mean square of recent gradients.
   * `torch.optim.Rprop`: Implements the Rprop algorithm, which adapts the step size individually for each parameter.
This lab uses `optim.Adam` with a learning rate of 0.001, and `nn.CrossEntropyLoss()` to measure the loss between the predicted scores and the actual labels.
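
For intuition about what the optimizer does with each gradient, here is a minimal single-parameter sketch of the Adam update rule in plain Python. It follows the standard published algorithm with the usual default hyperparameters (beta1=0.9, beta2=0.999, eps=1e-8); the function adam_step and its state dictionary are illustrative and far simpler than the `torch.optim.Adam` implementation.

```python
import math

def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a single scalar parameter."""
    state["t"] += 1
    # Exponential moving averages of the gradient and the squared gradient
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the averages starting at zero
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

state = {"t": 0, "m": 0.0, "v": 0.0}
w = 1.0
w = adam_step(w, grad=2.0, state=state)
print(w)  # very close to 0.999
```

On the first step the update has a size of about lr regardless of how large the gradient is, which is one reason Adam is less sensitive to gradient scale than plain SGD.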

Step 6: Train the Model

Train the model using the training data.

epochs = 5
for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in trainloader:
        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    # Report the average loss per batch for this epoch
    print(f'Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(trainloader):.4f}')
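
The reason each iteration begins by zeroing the gradients is that PyTorch accumulates gradients across backward() calls rather than overwriting them. The short sketch below demonstrates this on a single scalar weight; the tensor w and the toy function 3 * w are made up for illustration.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()        # d(3w)/dw = 3
first = w.grad.item()

(3 * w).backward()        # a second backward pass adds another 3
accumulated = w.grad.item()

w.grad.zero_()            # reset by hand; optimizer.zero_grad() clears every parameter it manages
cleared = w.grad.item()

print(first, accumulated, cleared)  # 3.0 6.0 0.0
```

Without the reset, every batch would update the weights using the sum of all gradients seen so far instead of the gradient for the current batch.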

Step 7: Evaluate the Model

Evaluate the model using the test data.

correct = 0
total = 0
model.eval()  # Switch to evaluation mode
with torch.no_grad():  # Gradients are not needed for evaluation
    for images, labels in testloader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # Index of the highest score is the predicted digit
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the model on the 10000 test images: {100 * correct / total:.2f}%')
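
The prediction step picks, for each image, the class with the highest output score, and accuracy is the share of those picks that match the labels. Here is a plain-Python sketch of the same logic, using three made-up score rows over three classes instead of real model outputs:

```python
def argmax(scores):
    """Index of the largest value in a list of class scores."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical scores for 3 samples over 3 classes
outputs = [
    [0.1, 2.5, -1.0],
    [1.2, 0.3, 0.9],
    [-0.5, 0.0, 0.4],
]
labels = [1, 2, 2]

predicted = [argmax(row) for row in outputs]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = 100 * correct / len(labels)

print(predicted)           # [1, 0, 2]
print(f"{accuracy:.2f}%")  # 66.67%
```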

Step 8: Save and Load the Model

Save the trained model and demonstrate how to load it.
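# Save the model's learned parameters (its state_dict)
torch.save(model.state_dict(), 'simple_nn.pth')

# Load the model: create a fresh instance, then restore the saved parameters
model = SimpleNN()
model.load_state_dict(torch.load('simple_nn.pth'))
model.eval()  # set the model to evaluation mode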
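
A quick way to convince yourself that saving and loading preserves the model is a round trip: save the parameters, load them into a fresh instance, and check that both models give the same output. The sketch below uses a small stand-in nn.Linear layer and an in-memory buffer instead of SimpleNN and a file on disk, purely to keep it self-contained.

```python
import io

import torch
import torch.nn as nn

original = nn.Linear(4, 2)               # stand-in for a trained model

buffer = io.BytesIO()                    # in-memory stand-in for 'simple_nn.pth'
torch.save(original.state_dict(), buffer)
buffer.seek(0)

restored = nn.Linear(4, 2)               # fresh instance with different random weights
restored.load_state_dict(torch.load(buffer))
restored.eval()

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(original(x), restored(x))
print(same)  # True
```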

Explanation of Steps

Setting Up Environment: We ensure PyTorch is installed in the Google Colab environment.
Importing Libraries: We import necessary libraries such as PyTorch and torchvision.
Preparing Data: We download and load the MNIST dataset, applying necessary transformations.
Defining the Model: We create a simple neural network model using nn.Module.
Defining Loss and Optimizer: We set up the loss function and optimizer for training.
Training the Model: We train the model by running a training loop over the dataset.
Evaluating the Model: We evaluate the model's performance on the test dataset.
Saving and Loading the Model: We save the trained model to a file and load it back.


In this lab, students learned the basics of building and training a simple neural network model using PyTorch. This foundational knowledge prepares them for more advanced topics, such as fine-tuning pre-trained models using transformers, which will be covered in subsequent labs.

Your output from this lab: a PyTorch checkpoint file (simple_nn.pth) that you can post on a model-sharing server.


The file simple_nn.pth is created inside your Google Colab environment.
It contains the saved model parameters of the neural network in a binary format, so it is not human-readable.
To deploy this model to a server, follow these steps:
Save the Model in Colab:
Ensure your model is saved properly using torch.save(model.state_dict(), 'simple_nn.pth').
Download the Model File:
Download the saved model file from Colab to your local machine using:
from google.colab import files
files.download('simple_nn.pth')
Upload to Model Server:
Upload the simple_nn.pth file to your model server. The exact method will depend on your server's setup (e.g., FTP, SCP, direct upload via web interface).
Load the Model on the Server:
On your model server, you will need to load the model using PyTorch:
import torch
from your_model_definition import SimpleNN  # replace with your actual model class

model = SimpleNN()
model.load_state_dict(torch.load('simple_nn.pth'))
model.eval()  # set the model to evaluation mode
Deploy the Model:
Integrate the model into your application, such as a Flask or FastAPI server, to serve predictions.
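
As one possible shape for the serving code, the sketch below wraps inference in a single predict_digit function that a Flask or FastAPI route could call. Everything here is an assumption for illustration: the function name, the input format (a flat list of 784 already-normalized pixel values), and the untrained stand-in network used in place of the loaded SimpleNN.

```python
import torch
import torch.nn as nn

# Untrained stand-in with the same input/output sizes as the lab's model;
# on a real server you would load SimpleNN and its saved state_dict instead.
model = nn.Sequential(
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
model.eval()

def predict_digit(pixels):
    """Return the predicted class (0-9) for one flattened 28x28 image."""
    if len(pixels) != 28 * 28:
        raise ValueError("expected 784 pixel values")
    x = torch.tensor(pixels, dtype=torch.float32).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():
        logits = model(x)
    return int(logits.argmax(dim=1).item())

digit = predict_digit([0.0] * 784)
print(digit)  # some value in 0..9; meaningless until real weights are loaded
```

Keeping the input validation and the tensor conversion inside one function means the web route only has to parse the request and return the integer.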


PyTorch Lab Level 2: Using a teacher model

Let's proceed with PyTorch Level 2 Lab where we will fine-tune a pre-trained transformer model using the Hugging Face `transformers` library.
This lab will guide you through the process of leveraging a pre-trained model to create an AI language model.

PyTorch Level 2 Lab: Fine-Tuning a Pre-trained Transformer Model

### **Objective:**
By the end of this lab, students will learn how to:
1. Set up a Google Colab environment for PyTorch and the Hugging Face `transformers` library.
2. Load a pre-trained transformer model and tokenizer.
3. Prepare and preprocess data.
4. Fine-tune the pre-trained model on a specific dataset.
5. Evaluate the model and generate text.
### **Step 1: Set Up Your Environment**
First, ensure you have the `transformers` library installed in your Google Colab environment.
```python
!pip install transformers
!pip install torch
```
### **Step 2: Import Required Libraries**
Next, import the necessary libraries.
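```python
import torch
from torch.optim import AdamW  # the AdamW bundled with transformers is deprecated in favor of this one
from torch.utils.data import DataLoader, Dataset, random_split
from transformers import GPT2Tokenizer, GPT2LMHeadModel, get_linear_schedule_with_warmup
import numpy as np
import pandas as pd
```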
### **Step 3: Prepare and Preprocess Data**
We'll use a simple dataset of conversational text for this lab. For simplicity, let's define a small dataset directly in the code.
```python
# Define a simple dataset
data = [
    "Hello, how can I help you?",
    "Hi, I need some assistance.",
    "Sure, what do you need help with?",
    "I am looking for information about your services.",
    "We offer a variety of services, including AI development and consulting.",
    "Can you tell me more about your AI development services?",
    "Of course, we specialize in creating custom AI solutions for businesses.",
    "That's great! How can I get started?",
    "You can start by scheduling a consultation with one of our experts.",
    "Thank you, I will do that.",
    "You're welcome! Have a great day!"
]

# Convert to a pandas DataFrame for easy manipulation
df = pd.DataFrame(data, columns=["text"])

# Tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# GPT-2 has no padding token by default; reuse the end-of-text token so padding works
tokenizer.pad_token = tokenizer.eos_token

# Preprocessing function
def preprocess_data(data, tokenizer, max_length=50):
    inputs = tokenizer(data, return_tensors='pt', max_length=max_length,
                       truncation=True, padding='max_length')
    inputs["labels"] = inputs.input_ids.detach().clone()
    # Ignore padded positions when computing the loss
    inputs["labels"][inputs["attention_mask"] == 0] = -100
    return inputs

# Preprocess the data
input_data = [preprocess_data(text, tokenizer) for text in df['text']]

# Combine the per-sentence tensors (each of shape [1, max_length]) into single tensors
input_ids = torch.cat([item['input_ids'] for item in input_data])
attention_mask = torch.cat([item['attention_mask'] for item in input_data])
labels = torch.cat([item['labels'] for item in input_data])

# Create a custom dataset class
class TextDataset(Dataset):
    def __init__(self, input_ids, attention_mask, labels):
        self.input_ids = input_ids
        self.attention_mask = attention_mask
        self.labels = labels

    def __len__(self):
        return len(self.input_ids)

    def __getitem__(self, idx):
        return {
            'input_ids': self.input_ids[idx],
            'attention_mask': self.attention_mask[idx],
            'labels': self.labels[idx]
        }

# Create dataset and dataloader
dataset = TextDataset(input_ids, attention_mask, labels)
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=2)
```
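
To make the tokenizer's padding and the role of the attention mask concrete, here is a plain-Python sketch of what happens to one short sequence. The token ids are invented, pad_id is a placeholder, and replacing padded label positions with -100 (the index that PyTorch's cross-entropy loss ignores by default) is a common convention shown here as an assumption, not something the tokenizer does for you.

```python
def pad_example(token_ids, max_length, pad_id):
    """Truncate or pad one sequence and build its attention mask and labels."""
    ids = token_ids[:max_length]
    n_pad = max_length - len(ids)
    input_ids = ids + [pad_id] * n_pad
    attention_mask = [1] * len(ids) + [0] * n_pad
    # Copy the inputs as labels, but mark padding so the loss can skip it
    labels = [tok if keep else -100 for tok, keep in zip(input_ids, attention_mask)]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

example = pad_example([15496, 11, 703], max_length=6, pad_id=0)
print(example["input_ids"])       # [15496, 11, 703, 0, 0, 0]
print(example["attention_mask"])  # [1, 1, 1, 0, 0, 0]
print(example["labels"])          # [15496, 11, 703, -100, -100, -100]
```

As for the split, with the 11 sentences above int(0.8 * 11) gives 8 training examples, leaving 3 for validation.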

Step 4: Load Pre-trained Model and Define Training Parameters

Load the pre-trained GPT-2 model and set up the optimizer and scheduler.
```python
model = GPT2LMHeadModel.from_pretrained("gpt2")
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)  # move the model to the GPU when one is available

# Optimizer and Scheduler
optimizer = AdamW(model.parameters(), lr=5e-5)
total_steps = len(train_loader) * 3  # Assuming 3 epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)
```
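
The scheduler scales the base learning rate by a factor that rises linearly during warmup and then decays linearly to zero. The plain-Python sketch below reproduces that shape; linear_warmup_factor is an illustrative name, and the 12 total steps assume 4 batches per epoch (8 training examples at batch size 2) over 3 epochs.

```python
def linear_warmup_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))

base_lr = 5e-5
total_steps = 12  # assumed: 4 batches per epoch * 3 epochs
for step in (0, 6, 12):
    factor = linear_warmup_factor(step, num_warmup_steps=0, num_training_steps=total_steps)
    print(step, factor, base_lr * factor)
```

With no warmup steps, the learning rate starts at its full value, reaches half of it midway through training, and hits zero on the final step.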

Step 5: Fine-tune the Model

Define the training loop.
```python
def train(model, dataloader, optimizer, scheduler):
    model.train()
    device = next(model.parameters()).device  # run on whatever device the model lives on
    total_loss = 0
    for batch in dataloader:
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        scheduler.step()

        total_loss += loss.item()
    return total_loss / len(dataloader)

def evaluate(model, dataloader):
    model.eval()
    device = next(model.parameters()).device
    total_loss = 0
    with torch.no_grad():
        for batch in dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
            loss = outputs.loss

            total_loss += loss.item()
    return total_loss / len(dataloader)

epochs = 3
for epoch in range(epochs):
    train_loss = train(model, train_loader, optimizer, scheduler)
    val_loss = evaluate(model, val_loader)
    print(f'Epoch {epoch + 1}, Train Loss: {train_loss:.4f}, Validation Loss: {val_loss:.4f}')
```
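
The loss returned by the model is the average cross entropy per token, so a common way to read it is as perplexity, the exponential of the loss. Here is a tiny sketch with invented loss values (not results from this lab):

```python
import math

def perplexity(mean_token_loss):
    """Convert an average per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_token_loss)

for loss in (4.0, 3.0, 2.0):  # hypothetical validation losses across epochs
    print(f"loss={loss:.1f} -> perplexity={perplexity(loss):.1f}")
```

A perplexity of 1.0 would mean the model predicts every next token with certainty. With only 11 sentences in the dataset, expect the validation figure to be noisy.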

Step 6: Generate Text

Create a function to generate text using the fine-tuned model.
```python
def generate_text(seed_text, next_words=50):
    model.eval()
    device = next(model.parameters()).device
    input_ids = tokenizer.encode(seed_text, return_tensors='pt').to(device)
    output = model.generate(
        input_ids,
        max_new_tokens=next_words,            # number of tokens to add after the seed text
        num_return_sequences=1,
        no_repeat_ngram_size=2,               # never repeat the same two-token sequence
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token of its own
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

seed_text = "Hello"
print(generate_text(seed_text, next_words=50))
```
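
The `no_repeat_ngram_size=2` argument stops the generator from producing any two-token sequence twice. The plain-Python sketch below shows the idea on word tokens: given what has been generated so far, it lists the tokens that would complete an already-seen n-gram and therefore must be blocked. The function banned_next_tokens is illustrative and is not the library's implementation.

```python
def banned_next_tokens(generated, n):
    """Tokens that would repeat an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n:
        return set()
    # The last n-1 tokens form the start of the n-gram about to be completed
    prefix = tuple(generated[len(generated) - n + 1:])
    banned = set()
    for i in range(len(generated) - n + 1):
        ngram = generated[i:i + n]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned

tokens = ["how", "can", "I", "help", "you", "?", "how"]
print(banned_next_tokens(tokens, n=2))  # {'can'}: "how can" has already appeared
```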

Explanation of Steps

1. **Setting Up Environment**: Ensure PyTorch and the Hugging Face `transformers` library are installed in the Google Colab environment.
2. **Importing Libraries**: Import necessary libraries such as PyTorch and transformers.
3. **Preparing Data**: Define and preprocess a simple dataset. Convert text data into tensors using the tokenizer.
4. **Loading Pre-trained Model**: Load a pre-trained GPT-2 model and set up the optimizer and learning rate scheduler.
5. **Fine-tuning the Model**: Define and execute the training loop to fine-tune the model on the dataset.
6. **Generating Text**: Define a function to generate text using the fine-tuned model and test it with a seed text.


In this lab, you learned how to fine-tune a pre-trained transformer model using PyTorch.
You prepared data, loaded a pre-trained model, fine-tuned it on a specific dataset, and generated text. This foundational knowledge prepares you for more advanced NLP tasks and real-world applications, providing valuable skills for entry-level AI jobs.