
Gen AI Foundational Models for NLP & Language Training

Language Understanding with Neural Networks


Converting Words to Features

One-hot encoding converts categorical data into feature vectors.
The bag of words representation portrays a document as the aggregate or average of one-hot encoded vectors.
When you feed a bag-of-words vector to a neural network’s hidden layer, the output is the sum of the embeddings.
The Embedding and EmbeddingBag classes implement embedding and embedding bags in PyTorch.
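A minimal sketch of these ideas (the vocabulary size, embedding dimension, and token indices below are illustrative, not from the lab): summing per-token embeddings gives the same result as an EmbeddingBag with mode="sum".
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim = 10, 4

embedding = nn.Embedding(vocab_size, embed_dim)
embedding_bag = nn.EmbeddingBag(vocab_size, embed_dim, mode="sum")
embedding_bag.weight = embedding.weight  # share the same lookup table

# a toy "document" of token indices
tokens = torch.tensor([2, 5, 5, 7])

# per-token embeddings, shape (4, 4)
per_token = embedding(tokens)

# EmbeddingBag aggregates the whole bag in one call;
# offsets marks where each document starts in the flat token tensor
bag = embedding_bag(tokens, offsets=torch.tensor([0]))

print(torch.allclose(per_token.sum(dim=0), bag.squeeze(0)))  # True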

Document Categorization Prediction with Torchtext

Document Classifier seamlessly categorizes articles by analyzing the text content.
Neural Network is a mathematical function consisting of a sequence of matrix multiplications with a variety of other functions.
Argmax function identifies the index of the highest logit value, corresponding to the most likely class.
Hyperparameters are externally set configurations of a neural network.
Prediction function:
Works on real text: it starts by taking in tokenized text.
Processes the text through the pipeline, and the model predicts the category.
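A minimal sketch of such a prediction function, assuming an EmbeddingBag-based classifier that takes token indices plus offsets; the names text_pipeline, model, and the AG NEWS label map are assumptions modeled on the torchtext tutorial, not the lab's exact code.
import torch

ag_news_label = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}  # assumed label map

def predict(text, text_pipeline, model):
    with torch.no_grad():
        # tokenize and map tokens to vocabulary indices
        token_ids = torch.tensor(text_pipeline(text))
        # EmbeddingBag-style models take a flat tensor plus offsets
        logits = model(token_ids, torch.tensor([0]))
        # argmax picks the index of the highest logit, i.e. the most likely class
        return ag_news_label[logits.argmax(dim=1).item() + 1]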

Document Categorization Training with Torchtext

A neural network functions via matrix and vector operations whose entries are learnable parameters.
In neural network training:
Learnable parameters are fine-tuned to enhance model performance
The process is steered by the loss function, which measures how far the model's predictions are from the true labels.
Cross-entropy is used to find the best parameters.
When the true distribution is unknown, you can estimate the expected loss by averaging the loss function over a set of samples. This is known as Monte Carlo sampling.
Optimization is used to minimize the loss (see the sketch after this list).
Three subsets of the partitioned data set are:
Training data
Validation data
Test data
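A minimal, self-contained sketch of cross-entropy loss and one optimization step; the toy model, sizes, and batch below are illustrative stand-ins for the lab's classifier.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embed_dim, num_classes = 50, 16, 4

embedding_bag = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
classifier = nn.Linear(embed_dim, num_classes)

criterion = nn.CrossEntropyLoss()  # cross-entropy over the class logits
optimizer = torch.optim.SGD(list(embedding_bag.parameters()) + list(classifier.parameters()), lr=0.1)

# a toy batch: two "documents" flattened into one tensor, with offsets marking starts
token_ids = torch.tensor([3, 7, 7, 12, 9, 2])
offsets = torch.tensor([0, 4])   # doc 1 = tokens[0:4], doc 2 = tokens[4:]
labels = torch.tensor([1, 3])    # true classes

logits = classifier(embedding_bag(token_ids, offsets))  # shape (2, num_classes)
loss = criterion(logits, labels)  # Monte Carlo estimate: average loss over the sampled batch

optimizer.zero_grad()
loss.backward()    # gradients of the loss w.r.t. the learnable parameters
optimizer.step()   # one gradient-descent step lowers the loss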

Training the Model in PyTorch

Training data is split into training and validation, and then data loaders are set up for training, validation, and testing
Batch size specifies the sample count for gradient approximation
Data shuffling promotes better optimization
When defining the model, init_weights helps with optimization
In the training loop (see the sketch after this list):
Iterate over each epoch
Set model to training mode and calculate the total loss
Divide data set into batches
Perform gradient descent
Update loss after each batch is processed.
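A minimal, self-contained sketch of the data split, data loaders, and training loop; the toy dataset, model, and hyperparameters below are illustrative stand-ins, not the lab's AG_NEWS code.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.dataset import random_split

torch.manual_seed(0)

# toy data: 100 "documents" of 10 token indices each, 4 classes
features = torch.randint(0, 50, (100, 10))
labels = torch.randint(0, 4, (100,))
dataset = TensorDataset(features, labels)

# split the training data into training and validation subsets
num_train = int(len(dataset) * 0.8)
train_split, valid_split = random_split(dataset, [num_train, len(dataset) - num_train])

BATCH_SIZE = 16  # sample count used for each gradient approximation
train_loader = DataLoader(train_split, batch_size=BATCH_SIZE, shuffle=True)  # shuffling promotes better optimization
valid_loader = DataLoader(valid_split, batch_size=BATCH_SIZE, shuffle=False)

model = nn.Sequential(nn.Embedding(50, 16), nn.Flatten(), nn.Linear(10 * 16, 4))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

EPOCHS = 3
for epoch in range(1, EPOCHS + 1):      # iterate over each epoch
    model.train()                        # set the model to training mode
    total_loss = 0
    for batch_features, batch_labels in train_loader:  # the data set is divided into batches
        optimizer.zero_grad()
        loss = criterion(model(batch_features), batch_labels)
        loss.backward()
        optimizer.step()                 # gradient-descent step
        total_loss += loss.item()        # update the running loss after each batch
    print(f"epoch {epoch}: total loss {total_loss:.3f}")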

Classifying Documents (LAB)

Installing required libraries
# All Libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented.
!pip install -q pandas==1.3.4 numpy==1.21.4 seaborn==0.9.0 matplotlib==3.5.0 scikit-learn==0.20.1
# - Update a specific package
!pip install pmdarima -U
# - Update a package to specific version
!pip install --upgrade pmdarima==2.0.2
# Note: If your environment doesn't support "!pip install", use "!mamba install"
Importing the required libraries
from tqdm import tqdm
import numpy as np
import pandas as pd
from itertools import accumulate
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.utils.data.dataset import random_split

from torchtext.data.utils import get_tokenizer
from torchtext.datasets import AG_NEWS
from torchtext.vocab import build_vocab_from_iterator
from torchtext.data.functional import to_map_style_dataset

from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split
import plotly.graph_objs as go
from IPython.display import Markdown as md

# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')
Defining helper functions
def plot(COST, ACC):
    fig, ax1 = plt.subplots()
    color = 'tab:red'
    ax1.plot(COST, color=color)
    ax1.set_xlabel('epoch', color=color)
    ax1.set_ylabel('total loss', color=color)
    ax1.tick_params(axis='y', color=color)

    ax2 = ax1.twinx()
    color = 'tab:blue'
    ax2.set_ylabel('accuracy', color=color)  # the x-label is already handled by ax1
    ax2.plot(ACC, color=color)
    ax2.tick_params(axis='y', color=color)

    fig.tight_layout()  # otherwise the right y-label is slightly clipped
    plt.show()
Summary Notes
One hot encoding converts categorical data (data representing groups or categories) into vectors
The Bag of words representation portrays a document as the aggregate or average of one-hot encoded vectors.
When you feed a bag of words vector to a neural network’s hidden layer, the output is the sum of the embeddings.
The Embedding and EmbeddingBag classes are used to implement embedding and embedding bags in PyTorch.
A document classifier seamlessly categorizes articles by analyzing the text content.
A neural network is a mathematical function consisting of a sequence of matrix multiplications with a variety of other functions.
The Argmax function identifies the index of the highest logit value, corresponding to the most likely class.
Hyperparameters are externally set configurations of a neural network.
The prediction function works on real text: it starts by taking in tokenized text, processes it through the pipeline, and the model predicts the category.
A neural network functions via matrix and vector operations whose entries are learnable parameters.
In neural network training, learnable parameters are fine-tuned to enhance model performance. This process is steered by the loss function, which measures how far the model's predictions are from the true labels.
Cross-entropy is used to find the best parameters.
When the true distribution is unknown, you can estimate the expected loss by averaging the loss function over a set of samples. This technique is known as Monte Carlo sampling.
Optimization is used to minimize the loss.
Generally, the data set should be partitioned into three subsets: training data for learning, validation data for hyperparameter tuning, and test data to evaluate real world performance.
The training data is split into training and validation, and then data loaders are set up for training, validation and testing.
Batch size specifies the sample count for gradient approximation, and shuffling the data promotes better optimization.
When you define your model, init_weights helps with optimization.
In the training loop:
Iterate over each epoch
Set the model to training mode
Calculate the total loss
Divide the data set into batches
Perform gradient descent
Update the loss after each batch is processed.

N-Gram Model

Predicting the next word on the basis of the previous words.
The model estimates P(w_t | w_(t-1), ..., w_(t-T)), with T being the context size.
Bigram Models
considers the immediate previous word to determine the probability.
context size = 1
Its limited context size can lead to incorrect predictions
Trigram Model
considers the previous two words (the previous word and the one before it) to predict the next word: P(w_t | w_(t-1), w_(t-2))
context size = 2
N-gram model
allows for any context size
Context Vector
its dimension is the product of the context size and the size of the vocabulary when built from one-hot vectors
Not computed directly, but constructed by concatenating the embedding vectors of the context words (see the sketch below)
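A minimal sketch of building the context vector by concatenating embeddings; vocab_size, embed_dim, and context_size below are illustrative toy values.
import torch
import torch.nn as nn

vocab_size, embed_dim, context_size = 100, 8, 2
embeddings = nn.Embedding(vocab_size, embed_dim)

context = torch.tensor([[4, 27]])  # indices of the 2 previous words, batch of 1
context_vector = embeddings(context).view(1, -1)  # concatenated: shape (1, context_size * embed_dim)

# the equivalent one-hot construction would have dimension context_size * vocab_size,
# which is why it is never materialized directly
print(context_vector.shape)  # torch.Size([1, 16])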

Neural Network Architecture in N-Gram Models

[Figure: neural network architecture of an n-gram language model]

N-Grams as Neural Networks with PyTorch
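A minimal sketch of an n-gram language model as a neural network in PyTorch; the class name, layer sizes, and example tokens are illustrative assumptions, not the lab's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGramLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, context_size, hidden_dim=128):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)
        # the concatenated context vector feeds a hidden layer, then logits over the vocabulary
        self.fc1 = nn.Linear(context_size * embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, vocab_size)

    def forward(self, context):
        # context: (batch, context_size) indices of the previous words
        x = self.embeddings(context).reshape(context.shape[0], -1)  # concatenate embeddings
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # logits for the next word

model = NGramLanguageModel(vocab_size=100, embed_dim=8, context_size=2)
logits = model(torch.tensor([[4, 27]]))  # predict the word following tokens 4 and 27
next_word = logits.argmax(dim=1)         # index of the most likely next word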

