Below is a fully functional example that you can run standalone in a Google Colab notebook.
It demonstrates a basic transformer model using PyTorch, walking through the necessary setup, the model definition, and a simple forward pass that shows how the model processes input data.
### Basic Transformer Model Example Using PyTorch
#### Step 1: Setup
First, ensure that PyTorch is installed. In a Google Colab notebook, you can install (or upgrade) it with the following command:
```python
!pip install torch
```
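Colab typically ships with a recent PyTorch build, so this install step is often a no-op. As a quick sanity check (not part of the original example), you can confirm the installed version and whether a GPU runtime is available:
```python
import torch

# Report the installed PyTorch version and whether a CUDA-capable GPU is visible.
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```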
#### Step 2: Define the Transformer Model
Now, let's define a basic transformer block with PyTorch.
```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, embed_dim, num_heads, ff_dim):
        super(TransformerBlock, self).__init__()
        # Multi-head self-attention layer (expects (seq_len, batch, embed_dim) inputs)
        self.att = nn.MultiheadAttention(embed_dim, num_heads)
        # Position-wise feed-forward network
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.layernorm1 = nn.LayerNorm(embed_dim)
        self.layernorm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # Self-attention with a residual connection followed by layer norm
        attn_output, _ = self.att(x, x, x)
        out1 = self.layernorm1(x + attn_output)
        # Feed-forward with a residual connection followed by layer norm
        ffn_output = self.ffn(out1)
        return self.layernorm2(out1 + ffn_output)
```
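A note on tensor layout: `nn.MultiheadAttention` defaults to expecting inputs of shape `(seq_len, batch, embed_dim)`, which is why the sample input in the next step puts the sequence dimension first. If you prefer batch-first tensors, a minimal sketch of the alternative (my own variation, not part of the original example) is to pass `batch_first=True`:
```python
import torch
import torch.nn as nn

# Batch-first variant: inputs are shaped (batch, seq_len, embed_dim).
att = nn.MultiheadAttention(embed_dim=128, num_heads=8, batch_first=True)
x = torch.rand(1, 10, 128)  # batch of 1, sequence length of 10
attn_output, attn_weights = att(x, x, x)
print(attn_output.shape)  # torch.Size([1, 10, 128])
```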
#### Step 3: Instantiate and Test the Model
Now, we will instantiate the transformer block and run a sample input through it.
```python
# Example usage
embed_dim = 128 # Dimension of the embedding
num_heads = 8 # Number of attention heads
ff_dim = 512 # Dimension of the feed-forward network
# Instantiate the transformer block
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
# Create a sample input (sequence length, batch size, embedding dimension)
sample_input = torch.rand(10, 1, embed_dim) # Sequence length of 10, batch size of 1
# Run the sample input through the transformer block
output = transformer_block(sample_input)
print("Output shape:", output.shape)
```
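Because each block maps a `(seq_len, batch, embed_dim)` tensor to a tensor of the same shape, you can compose several blocks directly. Here is a minimal sketch (assuming the `TransformerBlock`, `embed_dim`, `num_heads`, `ff_dim`, and `sample_input` defined above; the depth of 4 is arbitrary):
```python
# Stack four transformer blocks; each preserves the (seq_len, batch, embed_dim) shape,
# so they compose with nn.Sequential.
encoder = nn.Sequential(*[TransformerBlock(embed_dim, num_heads, ff_dim) for _ in range(4)])
stacked_output = encoder(sample_input)
print("Stacked output shape:", stacked_output.shape)  # torch.Size([10, 1, 128])
```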
### Full Colab Notebook Code
Here's the complete code you can copy and paste into a Google Colab notebook:
```python
# Install PyTorch (usually already available on Colab)
!pip install torch

# Import necessary libraries
import torch
import torch.nn as nn

# Define the TransformerBlock class
class TransformerBlock(nn.Module):
    def __init__(self, embed_dim, num_heads, ff_dim):
        super(TransformerBlock, self).__init__()
        # Multi-head self-attention layer (expects (seq_len, batch, embed_dim) inputs)
        self.att = nn.MultiheadAttention(embed_dim, num_heads)
        # Position-wise feed-forward network
        self.ffn = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.ReLU(),
            nn.Linear(ff_dim, embed_dim),
        )
        self.layernorm1 = nn.LayerNorm(embed_dim)
        self.layernorm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # Self-attention with a residual connection followed by layer norm
        attn_output, _ = self.att(x, x, x)
        out1 = self.layernorm1(x + attn_output)
        # Feed-forward with a residual connection followed by layer norm
        ffn_output = self.ffn(out1)
        return self.layernorm2(out1 + ffn_output)

# Example usage
embed_dim = 128   # Dimension of the embedding
num_heads = 8     # Number of attention heads
ff_dim = 512      # Dimension of the feed-forward network

# Instantiate the transformer block
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)

# Create a sample input (sequence length, batch size, embedding dimension)
sample_input = torch.rand(10, 1, embed_dim)  # Sequence length of 10, batch size of 1

# Run the sample input through the transformer block
output = transformer_block(sample_input)

# Print the output shape
print("Output shape:", output.shape)
```
This code defines a basic transformer block, creates a sample input tensor, processes it through the transformer block, and prints the output shape. You can run this code directly in a Google Colab notebook to see the transformer block in action.
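For comparison, PyTorch also provides a built-in `nn.TransformerEncoderLayer` that implements essentially the same self-attention plus feed-forward pattern. The snippet below is a rough equivalent of the custom block, shown only as a reference point (it is not part of the tutorial's own code, and its defaults, such as dropout, differ slightly):
```python
import torch
import torch.nn as nn

# Built-in encoder layer: self-attention + feed-forward with residuals and layer norms.
layer = nn.TransformerEncoderLayer(d_model=128, nhead=8, dim_feedforward=512)
sample = torch.rand(10, 1, 128)  # (seq_len, batch, embed_dim) by default
print(layer(sample).shape)  # torch.Size([10, 1, 128])
```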