Lab Workbook: Building a Transformer-Based AI Language Model (Project Template)

We will be working in a Google Colab notebook.

Here is the finished, working version of our AI question-answering chatbot:

import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Model and tokenizer initialization for a question answering task using RoBERTa
model_name = "deepset/roberta-base-squad2" # Example of a RoBERTa model fine-tuned on SQuAD v2
student_model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

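# Device configuration and evaluation mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
student_model.to(device)  # Move the model to the GPU if one is available
student_model.eval()      # Disable dropout for inference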

# Context text for the model to reference
text = """
The early vacuum tube computers, also known as first-generation computers, used a variety of memory technologies before settling on magnetic-core memory. The Atanasoff-Berry computer of 1942 stored numerical values as binary numbers in a revolving mechanical drum, with a special circuit to refresh this "dynamic" memory on every revolution. The war-time ENIAC, developed in 1946, could store 20 numbers, but the vacuum-tube registers used were too expensive to build to store more than a few numbers. A stored-program computer was out of reach until an economical form of memory could be developed.

The core memory used on the 1103 had an access time of 10 microseconds. The 1950s saw the evolution of the electronic computer from a research project to a commercial product, with common designs and multiple copies made, thereby starting a major new industry. The early commercial machines used vacuum tubes and a variety of memory technologies, converging on magnetic core by the end of the decade.

Magnetic-Core Memory

Magnetic-core memory was a significant development in computer memory technology. It was first used in the Whirlwind computer in 1949 and later in the IBM 705, a vacuum tube-based computer delivered in 1955. In 1976, 95% of all computer main memories consisted of ferrite cores, with 20-30 billion of them being produced yearly worldwide.

Transition from Vacuum Tubes to Transistors

Vacuum tubes were vital components of early computers, with the ENIAC containing an impressive 17,500 vacuum tubes. However, in 1954, Bell Labs built the first computer that didn't use vacuum tubes, the transistorized "TRADIC" computer for the U.S. Air Force. This marked the beginning of the transition from vacuum tubes to transistors in computer technology.

Role of Vacuum Tubes

Vacuum tubes were extensively used in early, first-generation computers for logical calculations and as a way to store computer memory. They were used in many electronic devices, including radios, telephone networks, sound recording/amplification & reproduction, radar, televisions, and computers.

Replacement of Magnetic Drums and Vacuum Tube Storage

Core memory, which could only be made by hand while looking through a microscope, replaced magnetic drums and volatile vacuum tube storage in the 1960s when it became cheap enough to manufacture.

In summary, early vacuum tube computers used a variety of memory technologies before settling on magnetic-core memory, which played a crucial role in the development of computer memory technology. The transition from vacuum tubes to transistors marked a significant shift in computer technology.

# Function to answer questions based on the provided context
def answer_question(question, context):
    # Encoding the question and context, truncating the context to fit the model's limit
    model_max_length = tokenizer.model_max_length
    inputs = tokenizer(
        question, context,
        max_length=model_max_length,  # Using the max length that the model can handle
        truncation="only_second",     # Truncate only the context, never the question
        return_tensors="pt",          # Return PyTorch tensors
    )
    inputs = {key: val.to(device) for key, val in inputs.items()}

    # Performing inference
    with torch.no_grad():
        outputs = student_model(**inputs)
    # Extracting the scores for the start and end of the answer
    answer_start_scores, answer_end_scores = outputs.start_logits, outputs.end_logits
    # Identifying the tokens with the highest start and end scores
    answer_start = int(torch.argmax(answer_start_scores))
    answer_end = int(torch.argmax(answer_end_scores)) + 1
    # Ensuring valid answer span
    if answer_end <= answer_start:
        return "The answer span is invalid."
    # Decoding the answer's token IDs back into a string
    answer_ids = inputs["input_ids"][0][answer_start:answer_end].tolist()
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(answer_ids))
    return answer.strip()

# Interactive loop for user to ask questions
while True:
    user_input = input("Ask a question (type 'exit' to quit): ")
    if user_input.lower() == 'exit':
        break
    try:
        # Getting the answer to the question from the model
        response = answer_question(user_input, text)
        print("Answer:", response)
    except Exception as e:
        # If an error occurs, print the error message
        print("Error:", e)
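The chatbot picks the start and end positions with two independent argmax calls, so it can land on an end that comes before its start and fall back to "The answer span is invalid." A common alternative is to search over valid (start, end) pairs and keep the one with the highest combined score. The following is a minimal pure-Python sketch under that assumption; best_span and max_answer_len are hypothetical names, not part of the Transformers API, and a full implementation would also skip positions that belong to the question or to special tokens.

```python
import math
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end), end inclusive, maximizing start_logits[start] + end_logits[end]
    subject to start <= end and a span length of at most max_answer_len tokens."""
    if len(start_logits) != len(end_logits):
        raise ValueError("start and end logits must have the same length")
    best_score = -math.inf
    best = (0, 0)
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start and within the length cap
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best


# Independent argmax would pick start=3 and end=0 here, which is an invalid span
print(best_span([0.1, 2.0, 0.3, 5.0], [4.0, 0.2, 1.5, 0.1]))  # (3, 3)
```

Inside answer_question you could call best_span(answer_start_scores[0].tolist(), answer_end_scores[0].tolist()) and slice input_ids[start:end + 1], since this sketch returns an inclusive end index.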

This is the template of what your project will look like:

Transformers in the Context of Large Language Models (LLMs) like ChatGPT

The transformer is a type of deep learning model architecture that has been widely used in natural language processing tasks.
It is the foundational architecture for many large language models, including ChatGPT.

Key Features of Transformers:

Self-Attention Mechanism: The transformer architecture employs a self-attention mechanism that allows it to weigh the significance of different words in a sentence when predicting the next word or token. Because every token can attend directly to every other token in the input, the model captures long-range dependencies without passing information along step by step as a recurrent network does.
Parallelization: Unlike recurrent networks, which process tokens one at a time, transformers process all positions of an input sequence in parallel during training, which makes them highly efficient for training on large datasets.

Positional Encoding: To account for the sequential nature of language, transformers use positional encoding to provide information about the order of words in the input sequence.
Stacked Layers: Transformers typically consist of multiple layers, each containing a self-attention mechanism and feedforward neural network layers.
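The self-attention and positional-encoding features can be made concrete with a small, framework-free sketch in plain Python. It assumes the simplest setting: one attention head, with queries, keys, and values taken directly from the input instead of from the learned projection matrices (W_Q, W_K, W_V) that real transformers use. The positional encoding shown is the sinusoidal scheme from the original transformer design; some models, GPT-2 among them, learn their position embeddings instead.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) times (k x m) -> (n x m)
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V: each output row is a weighted mix of the rows of V."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # similarity of every query to every key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> Matrix:
    """PE[pos][2i] = sin(pos / 10000^(2i/d_model)), PE[pos][2i+1] = cos(same angle)."""
    pe = []
    for pos in range(seq_len):
        row = []
        for j in range(d_model):
            angle = pos / (10000 ** (2 * (j // 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


# Usage: treat 3 positional encodings as stand-in token vectors and let them attend to each other
x = sinusoidal_positional_encoding(3, 4)
print(scaled_dot_product_attention(x, x, x))
```

Dividing by sqrt(d_k) keeps the dot products from growing with the vector width, which would otherwise push the softmax toward a single winner.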

Pre-training in the Context of Large Language Models (LLM)

Pre-training refers to the process of training a model on a large corpus of text data before fine-tuning it for specific tasks. Large language models, such as ChatGPT, are pre-trained on diverse and extensive text corpora to learn the nuances of language and the structure of natural language data.

Significance of Pre-training:

Learning Language Representations: During pre-training, the model learns to represent and understand language by capturing patterns, context, and relationships within the input text. Pre-training is also where the token embeddings are learned: in a PyTorch implementation, the embedding layer is a weight tensor (one vector per vocabulary entry) that is trained along with the rest of the model and saved in the model's checkpoint file.
Enabling Transfer Learning (and Knowledge Distillation from a Teacher Model): Pre-training enables the model to acquire a broad understanding of language, which can be leveraged for various downstream tasks with minimal task-specific training data through transfer learning. A large pre-trained model can also serve as the teacher whose outputs a smaller student model learns to imitate.
Enhanced Performance: Pre-training on extensive data allows the model to develop a rich and nuanced understanding of language, leading to improved performance on a wide range of language-related tasks, such as language translation or writing in a variety of voices and personalities.
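For GPT-style models such as ChatGPT, the pre-training task is next-token prediction: every position in a text becomes a training example whose label is simply the token that follows it, so no manual labeling is needed. A minimal plain-Python sketch of that objective follows; the function names are hypothetical, the probabilities would come from the model's softmax output, and BERT-style models use a different (masked-token) objective.

```python
import math
from typing import List, Sequence, Tuple


def next_token_pairs(token_ids: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Shift by one: at step t the model has seen inputs[:t+1] and must predict targets[t]."""
    return list(token_ids[:-1]), list(token_ids[1:])


def cross_entropy(pred_probs: List[List[float]], targets: List[int]) -> float:
    """Average negative log-probability the model gave to each correct next token."""
    losses = [-math.log(probs[t]) for probs, t in zip(pred_probs, targets)]
    return sum(losses) / len(losses)


inputs, targets = next_token_pairs([5, 8, 2, 9])
print(inputs, targets)  # [5, 8, 2] [8, 2, 9]
```

A model that always assigns probability 1.0 to the correct next token would reach a loss of 0; pre-training drives this average down over billions of such positions.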


In conclusion, the transformer architecture, as used in large language models like ChatGPT, relies on mechanisms such as self-attention and processes input data in parallel.
Pre-training involves training the model on vast amounts of text data, enabling it to learn comprehensive language representations and deliver strong performance on diverse language tasks.
Transformers are usually implemented with deep learning frameworks such as PyTorch and TensorFlow.

Implementing Transformers with PyTorch and TensorFlow

Using PyTorch:

PyTorch Transformer Module: PyTorch provides a nn.Transformer module that implements the transformer architecture. This module includes components like encoder and decoder layers, attention mechanisms, and feedforward neural networks.
Tokenization: Libraries such as torchtext or Hugging Face's tokenizers (which the Transformers library uses under the hood) tokenize and process text data for input into PyTorch transformer models.
Training: You can train a transformer model using PyTorch's torch.optim for optimization and torch.nn for defining the neural network components.
Pre-trained Models: PyTorch Hub offers pre-trained transformer models like BERT, GPT, and Transformer-XL that can be easily loaded and fine-tuned for specific tasks.
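A minimal sketch of stacking transformer layers with PyTorch's built-in modules follows. The sizes (a 1000-token vocabulary, 64-dimensional embeddings, 4 heads, 2 layers) are arbitrary assumptions chosen to keep it small, and the sketch omits the positional encoding and causal mask that a GPT-style language model would need.

```python
import torch
import torch.nn as nn

# Toy sizes chosen for illustration only (assumptions, not from any real model)
vocab_size, d_model, n_heads, n_layers, seq_len = 1000, 64, 4, 2, 10

embedding = nn.Embedding(vocab_size, d_model)  # token ID -> learned vector
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model,
    nhead=n_heads,
    dim_feedforward=128,  # width of the feedforward sub-layer
    batch_first=True,     # tensors are (batch, sequence, features)
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)  # stacks copies of the layer

token_ids = torch.randint(0, vocab_size, (2, seq_len))  # a batch of 2 random "sentences"
hidden = encoder(embedding(token_ids))  # self-attention + feedforward in every layer
print(hidden.shape)  # torch.Size([2, 10, 64])
```

batch_first=True makes every tensor shaped (batch, sequence, features), which matches how tokenizer output is usually batched.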

Using TensorFlow:

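TensorFlow Transformer: TensorFlow provides the tf.keras.layers.MultiHeadAttention layer, which you combine with standard Keras layers such as Dense, LayerNormalization, and Dropout to build transformer architectures (core Keras has no single built-in Transformer layer).
Tokenization: The tf.keras.layers.TextVectorization layer and the TensorFlow Text library help with tokenizing text data for transformers.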
Training: You can train a transformer model in TensorFlow using the tf.keras.optimizers for optimization and custom Keras layers to build the transformer model.
Model Deployment: TensorFlow Serving can be used to deploy trained transformer models for production use.
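Because a transformer block in TensorFlow is usually assembled from MultiHeadAttention plus dense and normalization layers, the sketch below builds one encoder-style block that way. The sizes are arbitrary assumptions, and dropout and masking are left out for brevity.

```python
import tensorflow as tf

# Toy sizes chosen for illustration only (assumptions)
batch, seq_len, d_model = 2, 5, 16

attention = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
norm1 = tf.keras.layers.LayerNormalization()
norm2 = tf.keras.layers.LayerNormalization()
feed_forward = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(d_model),  # project back to the model width
])

x = tf.random.normal((batch, seq_len, d_model))  # stand-in for embedded tokens
h = norm1(x + attention(query=x, value=x, key=x))  # self-attention + residual + norm
out = norm2(h + feed_forward(h))  # feedforward + residual + norm
print(out.shape)
```

Passing the same tensor as query, key, and value is what makes this self-attention; the two residual additions require the feedforward output to have the same width (d_model) as its input.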


Advantages of PyTorch:
PyTorch builds computational graphs dynamically (define-by-run), which makes debugging and experimentation easier.
PyTorch is often preferred by researchers due to its flexibility and ease in prototyping.
Advantages of TensorFlow:
TensorFlow offers a high-level API with TensorFlow Keras for building neural networks, including transformers.
TensorFlow's strong support for deployment and production readiness makes it popular in industrial applications.


Both PyTorch and TensorFlow offer robust tools and libraries for implementing transformers. The choice between them often comes down to personal preference, project requirements, and familiarity with the respective frameworks. Researchers may lean towards PyTorch for its flexibility, while developers might prefer TensorFlow for its deployment capabilities. Regardless of the choice, both frameworks enable efficient implementation of transformer models for various natural language processing tasks.


In this lab, you will learn how to build a transformer-based AI language model using TensorFlow and PyTorch, with a focus on teacher-student knowledge distillation.
By the end of this lab, you will have developed a custom language model trained on text data fetched from an API and refined through knowledge distillation techniques, using Google Colab as your development environment.
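Knowledge distillation trains the student to match the teacher's whole output distribution, not just the correct label. The widely used soft-target loss divides both models' logits by a temperature T greater than 1 to soften them, measures the KL divergence from the teacher's distribution to the student's, and multiplies by T squared so gradient magnitudes stay comparable across temperatures. The plain-Python sketch below shows that arithmetic for a single prediction; the function names and the default temperature of 2.0 are illustrative assumptions, and in PyTorch you would compute the same quantity on tensors (for example with torch.nn.functional.kl_div on log-probabilities).

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """T^2 * KL(teacher || student), both distributions softened by the temperature."""
    p_teacher = softmax_with_temperature(teacher_logits, temperature)
    p_student = softmax_with_temperature(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


teacher = [3.0, 1.0, 0.2]  # made-up logits for a 3-token vocabulary
print(distillation_loss([2.8, 1.1, 0.3], teacher))  # small: student agrees with teacher
print(distillation_loss([0.2, 1.0, 3.0], teacher))  # larger: student disagrees
```

In practice this term is usually mixed with the ordinary cross-entropy on the true labels, using a weighting factor you tune.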


Prerequisites:

Basic understanding of Python programming.
Familiarity with Google Colab.
Basic knowledge of machine learning concepts, especially neural networks, and of PyTorch and TensorFlow.

Tools and Libraries Needed:

Google Colab (free version)
PyTorch and TensorFlow
Transformers library by Hugging Face
Requests library for API calls

Part 1: Setting Up Your Environment

Open Google Colab:
Go to the Google Colab website and start a new Python 3 notebook.
Install Required Libraries:
Run the following commands to install necessary libraries:

!pip install tensorflow
!pip install torch
!pip install transformers
!pip install requests
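To confirm the installs worked and record which versions you are using, a quick standard-library check like the sketch below can help. installed_versions is a hypothetical helper name, and the package names are the pip distribution names from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each pip distribution name to its installed version, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for pkg, ver in installed_versions(["tensorflow", "torch", "transformers", "requests"]).items():
    print(f"{pkg}: {ver or 'NOT INSTALLED'}")
```

Recording these versions in your notebook makes results easier to reproduce later, since library behavior can change between releases.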

Part 2: Fetching Data

Import Libraries:
import requests
from transformers import GPT2Tokenizer, GPT2Model
import torch.nn as nn
The command pip install requests is used in Python to install the requests library, which is a popular and versatile HTTP library used for making network requests to web servers. It's commonly used to interact with APIs or fetch data from the internet directly from a Python application.
In the context of setting up a lab environment to build a knowledge transfer AI model, including this command would be crucial if your AI model needs to retrieve data from online sources or interact with APIs for data collection or integration purposes. The requests library would enable your lab code to access external data seamlessly, which could be essential for training the model or fetching real-time data for analysis and inference.

These lines involve importing specific Python libraries and modules that are crucial for developing applications involving natural language processing, specifically using models like GPT-2.

import requests:

Purpose: This imports the requests library, which is used to make HTTP requests to web servers. It is commonly used to access APIs or other web resources.
Teaching Context: In an AI development lab, you would teach students how this can be used to fetch data from online resources, for example to retrieve training data, download files, or call APIs that provide useful functionality for AI applications.
from transformers import GPT2Tokenizer, GPT2Model:

Purpose: This line imports GPT2Tokenizer and GPT2Model from the transformers library developed by Hugging Face.
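The tokenizer converts text into a format the model can understand (tokens, represented as integer IDs). GPT2Model is the pre-trained GPT-2 transformer without a language-modeling head: it returns hidden states (contextual embeddings) rather than generated text, so for text generation you would load GPT2LMHeadModel instead.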
Teaching Context: This is critical for introducing students to the concept of tokenization and model usage. You would explain how the tokenizer works to encode input text into tokens that are numerical representations understood by neural networks. The model can then be used for various tasks like text generation, sentiment analysis, etc., depending on how it's configured and what it has been trained on.
import torch.nn as nn:

Purpose: This imports the neural network module (nn) from PyTorch, which is a deep learning library. This module contains classes and methods necessary for building neural network layers like convolutions, activations, etc.
Teaching Context: When teaching about torch.nn, you’d focus on how it allows for the creation and training of neural network models from scratch or for modifying pre-existing models. You could discuss different types of layers and their functions, how to stack layers to create complex architectures, and the importance of activation functions and other network components in learning patterns from data.
In a teaching environment, especially one focused on building knowledge transfer AI models, these lines set the foundation for students to learn about accessing, manipulating, and understanding large volumes of data through advanced AI models and tools. You’d also delve into practical exercises where students use these libraries to fetch data, preprocess it with tokenizers, and then run predictions or analyses using neural network models built with PyTorch.
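To make "encoding text into numerical tokens" concrete, here is a deliberately simplified word-level tokenizer in plain Python. It is an illustrative assumption, not how GPT2Tokenizer works: GPT-2 uses byte-level byte-pair encoding, which splits text into subword pieces and can represent any input, whereas this toy maps every unseen word to an <unk> ID.

```python
from typing import Dict, List


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Give every distinct lowercase word an integer ID; ID 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int]) -> List[int]:
    """Text -> list of token IDs (unseen words become the <unk> ID)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """List of token IDs -> text."""
    id_to_word = {i: w for w, i in vocab.items()}
    return " ".join(id_to_word[i] for i in ids)


vocab = build_vocab(["the cat sat", "the dog sat"])
ids = encode("The dog sat", vocab)
print(ids)                 # [1, 4, 3]
print(decode(ids, vocab))  # the dog sat
```

These integer IDs are what a model's embedding layer looks up; a real tokenizer differs mainly in how it splits text into pieces.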

Fetch Text Data:
Accessing Weather Data via a No-Login API
You can access weather data using the OpenWeatherMap API. It requires a free API key, but no interactive login.
To retrieve weather data for a specific location, make a GET request to the following endpoint, replacing {city name} with the city and <YOUR_API_KEY> with your actual API key:
{city name}&appid=<YOUR_API_KEY>

This will return weather data for the specified city in a JSON format, including information such as temperature, humidity, wind speed, and more. You can then process this data in your Python program.

Remember to sign up on the OpenWeatherMap website to obtain your API key. Once you have it, include it as the appid parameter in each request; no further login is needed.
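Once a request succeeds, response.json() gives you a Python dictionary. The sketch below pulls a few fields out of it; it assumes the standard current-weather response layout (name, main.temp, main.humidity, wind.speed), so check the OpenWeatherMap documentation for the endpoint you actually call. summarize_weather is a hypothetical helper name.

```python
from typing import Any, Dict


def summarize_weather(data: Dict[str, Any]) -> str:
    """One-line summary of an OpenWeatherMap current-weather response (already parsed from JSON)."""
    city = data.get("name", "unknown location")
    main = data.get("main", {})
    wind = data.get("wind", {})
    # By default the API reports temperature in Kelvin and wind speed in m/s
    return (f"{city}: temperature {main.get('temp')}, "
            f"humidity {main.get('humidity')}%, wind {wind.get('speed')}")
```

Usage: summary = summarize_weather(response.json()). Using .get() means a missing field shows up as None instead of raising a KeyError.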

Use the requests library to fetch data from an API that provides textual content.
For this example, we will use a placeholder API:

response = requests.get("")  # paste the API URL here
text = response.text  # assuming the API returns text directly
import requests

# URL for fetching posts from JSONPlaceholder
url = ""

# Sending a GET request to the API (the timeout avoids hanging forever)
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors

# Extracting the text content of the response
text = response.text
# Optionally, convert this text to JSON format if you want to work with data more conveniently
data = response.json()
# Example: print the first post's title
print(data[0]["title"])
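import requests

# URL for fetching a random quote from Quotable Quotes API
url = ""

# Sending a GET request to the Quotable Quotes API
response = requests.get(url, timeout=10)
response.raise_for_status()  # Stop early on HTTP errors

# Extracting the JSON content of the response
data = response.json()

# Print the quote (assumes the endpoint returns a single quote object with "content" and "author" fields)
print(f'"{data["content"]}" - {data["author"]}')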
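Note that response.text from a JSON API is the raw JSON, braces and field names included, and it would be tokenized as-is. For cleaner training text you can keep only the prose fields. The sketch below assumes JSONPlaceholder-style posts with "title" and "body" keys; posts_to_corpus is a hypothetical helper name.

```python
from typing import Dict, List


def posts_to_corpus(posts: List[Dict[str, str]]) -> str:
    """Join each post's title and body into plain text, one post per paragraph."""
    chunks = []
    for post in posts:
        title = post.get("title", "").strip()
        body = post.get("body", "").strip()
        chunk = f"{title}\n{body}".strip()
        if chunk:  # skip posts with no text at all
            chunks.append(chunk)
    return "\n\n".join(chunks)
```

Usage: text = posts_to_corpus(data), where data is the response.json() list from the JSONPlaceholder request.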

Part 3: Preprocessing Data

Use the GPT-2 tokenizer to encode the text data.
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
encoded_input = tokenizer(text, return_tensors='pt', truncation=True)  # truncate to GPT-2's 1024-token limit
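GPT-2 can attend to at most 1024 tokens at a time, so a long API response has to be either truncated or split. Splitting the token IDs into fixed-size blocks keeps all of the text. A minimal sketch follows, with group_into_blocks as a hypothetical helper name: you would tokenize without truncation first, for example ids = tokenizer(text)["input_ids"] (the tokenizer may warn that the sequence is longer than the model maximum, which is expected here), then call group_into_blocks(ids, 1024).

```python
from typing import List


def group_into_blocks(token_ids: List[int], block_size: int) -> List[List[int]]:
    """Split one long token sequence into equal-length blocks; a short leftover tail is dropped."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    usable = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, usable, block_size)]


print(group_into_blocks(list(range(10)), 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Dropping the short tail keeps every block the same length, so blocks can be stacked into one tensor without padding; padding the tail instead is a reasonable alternative when the text is small.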