Gen AI & LLMs: Architecture & Data Preparation

Significance of Generative AI

Generative AI refers to deep-learning models that can generate various types of content such as text, images, audio, 3D objects and music
Text
Contextually aware models, e.g., GPT
Image
From Text input
From Seed Image or random input
Example - GAN (Generative Adversarial Network), Diffusion Model
Audio
Generate Natural Sounding speech
Text to speech synthesis
Example - WaveNet
Applications of Generative AI
Content Creation
Condensing Documents
Language Translation
Chatbots and Virtual Assistants
Data Analysis

Generative AI Architectures and Models

Generative AI architectures and models include RNNs, transformers, GANs, VAEs, and diffusion models.
RNN - Recurrent Neural Networks
Use sequential or time-series data and a loop-based design for training
Transformers
They utilize the self-attention mechanism to focus on the most important parts of the information
GAN - Generative Adversarial Networks
Consists of a generator and discriminator, which work in a competitive mode
VAEs - Variational Auto Encoder
Operate on an encoder-decoder framework and create samples based on similar characteristics
Diffusion models
Generate creative images by learning to remove noise and reconstruct distorted examples, relying on statistical properties

Generative AI for NLP (Natural Language Processing)

Evolution of AI for NLP
Rule-Based System - Follows predefined linguistic rules
Machine Learning based approach - Employs statistical methods
Deep Learning architecture - Uses artificial neural networks trained on extensive data sets
Transformers - Designed specifically to handle sequential data, has greater ability to understand context
Large Language Models - LLMs
Uses AI and deep learning with vast data sets
Involves training data sets of huge sizes, even reaching petabytes (1PB =1 Million GB)
Contains billions of parameters, which are finetuned during training
Examples
GPT - Generative Pretrained transformers
BERT - Bidirectional Encoder Representation from Transformers
BART - Bidirectional and Auto-Regressive Transformer
T5 - Text To Text Transfer Transformer
Hallucinations in LLMs
Generating outputs presented as accurate but judged unrealistic, inaccurate, or nonsensical by humans
Can result in the generation of inaccurate information, the creation of biased views, and wrong input provided to sensitive applications.
Prevent/Avoid hallucinations through
Extensive training with high-quality data
Avoiding manipulation
Ongoing evaluation and improvement of the models
Fine-tuning on domain specific data
Being vigilant
Ensuring human oversight and
Providing additional context in the prompt
Libraries and Tools in NLP
PyTorch - an open source deep learning framework, python based and well known for its ease of use, flexibility, and dynamic computation graphs.
TensorFlow - open source framework for machine learning and deep learning, provides tools and libraries to facilitate the development and deployment of machine learning models
Keras - A tight integration of TensorFlow with Keras provides a user-friendly high-level neural networks API, facilitating rapid prototyping and building and training deep learning models.
Hugging Face - platform that offers an open source library with pre-trained models and tools to streamline the process of training and fine-tuning generative AI models. It offers libraries such as Transformers, Datasets, and Tokenizers.
LangChain - an open-source framework that helps streamline AI application development using LLMs. It provides tools for designing effective prompts.
Pydantic - Python library that helps you streamline data handling. It ensures the accuracy of data types and formats before an application processes them.
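As a sketch of how Pydantic enforces data types before processing, consider the toy model below (the `Prompt` class and its fields are illustrative, not part of any library):

```python
from pydantic import BaseModel, ValidationError

# Hypothetical model for an LLM prompt payload
class Prompt(BaseModel):
    text: str
    max_tokens: int = 100

# Valid input: the string "50" is coerced to the declared int type
p = Prompt(text="Hello", max_tokens="50")
print(p.max_tokens)  # 50

# Invalid input is rejected before the application processes it
try:
    Prompt(text="Hello", max_tokens="many")
except ValidationError as e:
    print("rejected:", len(e.errors()), "error(s)")
```

Because the types are checked at construction time, bad data fails fast instead of propagating into the application.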

Text Generation before Transformers

N-Gram Models
They predict what words come next in a sentence based on the words that came before.
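The idea can be sketched with a toy bigram (2-gram) model in plain Python; the corpus below is made up for illustration:

```python
from collections import Counter, defaultdict

corpus = "i love nlp i love transformers i study nlp".split()

# Count each (previous word, next word) pair
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word following `word` in the corpus."""
    counts = bigrams.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("i"))    # 'love' (seen twice, vs. 'study' once)
print(predict_next("nlp"))  # 'i'
```

Real n-gram models use longer contexts and smoothing, but the principle is the same: prediction from counts of preceding words.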
Recurrent Neural Networks (RNN)
They are specially designed to handle sequential data, making them powerful for applications like language modeling and time series forecasting.
The essence of their design lies in maintaining a ‘memory’ or ‘hidden state’ throughout the sequence by employing loops.
This enables RNN to recognize and capture the temporal (time related) dependencies inherent in the sequential data.
Hidden state
often referred to as the network’s ‘memory’, the hidden state is a dynamic storage of information about previous sequence inputs. With each new input, this hidden state is updated, factoring in both the new input and its previous value.
Temporal dependency
Loops in RNNs enable information transfer across sequence steps.
Illustration of RNNs operation
“I love RNNs” - RNN goes on to interpret this sentence word by word,
First it ingests the word “I”, generates an output and updates its hidden state
Then it moves to “love”; the RNN processes it together with the hidden state, which ideally holds insights about the word “I”, and the hidden state is updated again.
This pattern of processing and updating continues till the last word is reached.
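This walk-through can be sketched in PyTorch. The word vectors below are random placeholders, not trained embeddings; the point is only that the hidden state carries context from step to step:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)

# Three "words", each a 4-dimensional vector: shape [batch=1, seq_len=3, 4]
sentence = torch.randn(1, 3, 4)

hidden = torch.zeros(1, 1, 3)           # initial hidden state
for t in range(sentence.size(1)):       # process one word at a time
    word = sentence[:, t:t+1, :]
    output, hidden = rnn(word, hidden)  # hidden carries earlier context forward
    print(f"step {t}: hidden = {hidden.squeeze().tolist()}")
```

Feeding the whole sequence to `rnn(sentence)` at once produces the same final hidden state; the loop just makes the step-by-step update explicit.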
Long short-term memory (LSTM) and Gated Recurrent Units (GRUs)
Variants of RNNs
Designed to address limitations of traditional RNNs and enhance their ability to model the sequential data effectively.
They were effective for a variety of tasks but struggled with long sequences and long-term dependencies.
Seq2seq models with attention
Sequence-to-sequence models - built with RNNs or LSTMs, designed to handle tasks like translation where an input sentence is transformed into an output sentence.
Attention was introduced to allow the model to “focus” on relevant parts of the input sequence when generating the output, significantly improving performance on tasks like machine translation.
While these methods provided significant advancements in text generation tasks, the introduction of transformers led to a paradigm shift. Transformers, with their self-attention mechanism, proved to be highly efficient at capturing contextual information across long sequences, setting new benchmarks across various NLP tasks.

Transformers

Replaced sequential processing with parallel processing.
The key component behind their success is the attention mechanism, more precisely self-attention.
Key steps
Tokenization - breaking down the sentence into tokens
Embedding - Each token represented as a vector, capturing its meaning
Self-Attention - The model computes scores determining the importance of every other word for a particular word in the sequence. These scores are used to weight the input tokens and produce a new representation of the sequence.
Feed-forward neural networks - After attention, each position is passed through a feed-forward network separately.
Output Sequence: The model produces an output sequence, which can be used for various tasks, like classification, translation or text generation.
Layering: Importantly, transformers are deep models with multiple layers of attention and feed-forward networks, allowing them to learn complex patterns.
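The self-attention step above can be sketched in a few lines of NumPy. This is a toy illustration, not the full transformer: in a real model, queries, keys, and values come from learned projections of the embeddings, while here the embeddings are used directly:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# 3 tokens embedded as 4-dimensional vectors (random placeholders)
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))

scores = X @ X.T / np.sqrt(X.shape[1])  # importance of every token for every other
weights = softmax(scores)               # each row sums to 1
new_repr = weights @ X                  # weighted mix of all token vectors

print(weights.round(2))  # 3x3 attention matrix
print(new_repr.shape)    # (3, 4): a new representation per token
```

Each token's new vector is a weighted combination of every token in the sequence, which is what lets the model incorporate context from anywhere in the input.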

Implementation - Building a simple chatbot with transformers

Building a simple chatbot using the transformers library from Hugging Face, an open-source NLP toolkit.
Step 1: Installing the libraries
!pip install -qq tensorflow
!pip install -qq transformers
!pip install sentencepiece
!pip install torch==2.2.2
!pip install torchtext==0.17.2
#!pip install --upgrade numpy transformers torch
Step 2: Importing required tools from the transformers library
In the code below, we initiate two important classes from the transformers library
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Selecting the model. You will be using "facebook/blenderbot-400M-distill" in this example.
model_name = "facebook/blenderbot-400M-distill"

# Load the model and tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model is an instance of the class AutoModelForSeq2SeqLM.
This class lets you interact with your chosen language model
tokenizer is an instance of the class AutoTokenizer.
This class streamlines your input and presents it to the language model in the most efficient manner.
It achieves this by converting your text inputs into “tokens”, which is the model’s preferred way of interpreting text.
We have chosen “facebook/blenderbot-400M-distill” for this example model because it is freely available under an open-source license and operates at a relatively brisk pace.
For other models and their capabilities, explore:
Following the initialization, let’s set up the chat function to enable real-time interaction with the chatbot.
# Define the chat function
# Define the chat function
def chat_with_bot():
    while True:
        # Get user input
        input_text = input("You: ")

        # Exit conditions
        if input_text.lower() in ["quit", "exit", "bye"]:
            print("Chatbot: Goodbye!")
            break

        # Tokenize input and generate response
        inputs = tokenizer.encode(input_text, return_tensors="pt")
        outputs = model.generate(inputs, max_new_tokens=150)
        response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

        # Display bot's response
        print("Chatbot:", response)

# Start chatting
chat_with_bot()

Now the chatbot takes the input prompt and uses the transformers library and the underlying model to generate a response.
Step 3: Trying another language model and comparing the output
In the code below, we use a different language model, the “google/flan-t5-base” model from Google, to create a similar chatbot.
import sentencepiece
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Let's chat with another bot
def chat_with_another_bot():
    while True:
        # Get user input
        input_text = input("You: ")

        # Exit conditions
        if input_text.lower() in ["quit", "exit", "bye"]:
            print("Chatbot: Goodbye!")
            break

        # Tokenize input and generate response
        inputs = tokenizer.encode(input_text, return_tensors="pt")
        outputs = model.generate(inputs, max_new_tokens=150)
        response = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

        # Display bot's response
        print("Chatbot:", response)

# Start chatting
chat_with_another_bot()

Tokenization

Tokenization breaks the sentence into smaller pieces or tokens. Tokenizers, such as NLTK and spaCy, generate tokens.
Word Based - preserves semantic meaning.
Character Based - smaller vocabularies but may not convey the same information as entire words.
Sub-word Based - allows frequently used words to stay unsplit while breaking down infrequent words.
can be implemented using the WordPiece, Unigram, and SentencePiece algorithms.
You can add special tokens such as <bos> at the beginning and <eos> at the end of a tokenized sentence.
Tokenization and Indexing in PyTorch
Use torchtext library for tokenization
Use the build_vocab_from_iterator function
creates a vocabulary from the tokens
assigns each token a unique index

Implementing Tokenization

Libraries needed
nltk - Natural Language Toolkit
Employed for data management tasks
Offers comprehensive tools and resources for natural language text, making it a valuable choice for tasks such as text preprocessing and analysis
spaCy
Open-source software library for advanced natural language processing in Python.
known for its speed and accuracy in processing large volumes of text data
BertTokenizer
part of the Hugging Face Transformers library
specifically designed for tokenizing text according to the BERT model’s specifications
XLNetTokenizer
part of the Hugging Face Transformers library
Tailored for tokenizing text in alignment with the XLNet model’s requirement
torchtext
It is part of the PyTorch ecosystem
Simplifies the process of working with text data and provides functionalities for data preprocessing, tokenization, vocabulary management and batching
Installing required libraries
!pip install nltk
!pip install transformers
!pip install sentencepiece
!pip install spacy
!python -m spacy download en_core_web_sm
!python -m spacy download de_core_news_sm
!pip install numpy scikit-learn
!pip install torch==2.2.2
!pip install torchtext==0.17.2
Importing required libraries
import nltk
nltk.download("punkt")
nltk.download('punkt_tab')
import spacy
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist
from nltk.util import ngrams
from transformers import BertTokenizer
from transformers import XLNetTokenizer

from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator

def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

Word Based Tokenizer - nltk

Splitting of text based on words
There are different rules for word based tokenizers, such as splitting on spaces or splitting on punctuation. Each option assigns a specific ID to the split word.
In the following example we use nltk’s word_tokenize
text = "This is a sample sentence for word tokenization"
tokens = word_tokenize(text)
print(tokens)