Significance of Generative AI
Generative AI refers to deep-learning models that can generate various types of content such as text, images, audio, 3D objects, and music.
Text generation - Contextually aware models, for example GPT
Image generation - From a seed image or random input, for example GANs (Generative Adversarial Networks) and diffusion models
Speech generation - Generate natural-sounding speech
Applications of Generative AI include chatbots and virtual assistants.
Generative AI Architectures and Models
Generative AI architectures and models include RNNs, transformers, GANs, VAEs, and diffusion models.
RNN - Recurrent Neural Networks use sequential or time-series data and a loop-based design for training.
Transformers - Utilize the self-attention mechanism to focus on the most important parts of the information.
GAN - Generative Adversarial Networks consist of a generator and a discriminator, which work in a competitive mode (see the sketch after this list).
VAEs - Variational Autoencoders operate on an encoder-decoder framework and create samples with similar characteristics to the input data.
Diffusion models - Generate creative images by learning to remove noise and reconstruct distorted examples, relying on statistical properties.
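To make the generator-discriminator pairing concrete, here is a minimal, untrained sketch in PyTorch. The layer sizes and dimensions are assumptions for illustration only; in practice the two networks would be trained against each other.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # illustrative sizes, not from the notes

generator = nn.Sequential(          # random noise -> fake sample
    nn.Linear(latent_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)
discriminator = nn.Sequential(      # sample -> probability it is "real"
    nn.Linear(data_dim, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

noise = torch.randn(1, latent_dim)      # random input ("seed")
fake_sample = generator(noise)          # generator tries to fool the discriminator
realness = discriminator(fake_sample)   # discriminator scores the sample
print(fake_sample.shape, realness.item())
```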
Generative AI for NLP (Natural Language Processing)
Rule-based systems - Follow predefined linguistic rules
Machine learning based approaches - Employ statistical methods
Deep learning architectures - Use artificial neural networks trained on extensive data sets
Transformers - Designed specifically to handle sequential data, with a greater ability to understand context
Large Language Models (LLMs)
Use AI and deep learning with vast data sets
Involve training data sets of huge sizes, even reaching petabytes (1 PB = 1 million GB)
Contain billions of parameters, which are fine-tuned during training
GPT - Generative Pre-trained Transformer
BERT - Bidirectional Encoder Representations from Transformers
BART - Bidirectional and Auto-Regressive Transformers
T5 - Text-to-Text Transfer Transformer
Hallucinations - Generating outputs presented as accurate but seen as unrealistic, inaccurate, or nonsensical by humans. Hallucinations can result in the generation of inaccurate information, the creation of biased views, and wrong input being provided to sensitive applications.
Prevent/avoid hallucinations through extensive training with high-quality data, ongoing evaluation and improvement of the models, fine-tuning on domain-specific data, ensuring human oversight, and providing additional context in the prompt.
Libraries and Tools in NLP
PyTorch - An open-source deep learning framework, Python-based and well known for its ease of use, flexibility, and dynamic computation graphs.
TensorFlow - An open-source framework for machine learning and deep learning; provides tools and libraries to facilitate the development and deployment of machine learning models.
Keras - A tight integration of TensorFlow with Keras provides a user-friendly, high-level neural networks API, facilitating rapid prototyping and the building and training of deep learning models.
Hugging Face - A platform that offers an open-source library with pre-trained models and tools to streamline the process of training and fine-tuning generative AI models. It offers libraries such as Transformers, Datasets, and Tokenizers (see the sketch after this list).
LangChain - An open-source framework that helps streamline AI application development using LLMs. It provides tools for designing effective prompts.
Pydantic - A Python library that helps you streamline data handling. It ensures the accuracy of data types and formats before an application processes them.
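As a quick illustration of the Hugging Face Transformers library mentioned above, the following sketch generates text with the high-level pipeline API. It assumes the transformers package and a backend such as PyTorch are installed; "gpt2" and the generation settings are just examples, not prescribed by these notes.

```python
from transformers import pipeline

# Load a small pre-trained model behind a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# Continue a prompt; the prompt and settings are illustrative.
result = generator("Generative AI is", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])
```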
Text Generation before Transformers
Language models predict what word comes next in a sentence based on the words that came before.
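The notes do not name a specific pre-transformer method at this point, but a count-based bigram model is one minimal way to show next-word prediction; the tiny corpus below is purely illustrative.

```python
from collections import Counter, defaultdict

corpus = "i love rnns and i love transformers".split()

# Count which word follows each word in the corpus.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the word that most often followed `word` in the corpus."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("i"))     # -> "love"
print(predict_next("love"))  # -> "rnns" (tied with "transformers"; broken by insertion order)
```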
Recurrent Neural Networks (RNNs)
They are specially designed to handle sequential data, making them powerful for applications like language modeling and time-series forecasting.
The essence of their design lies in maintaining a ‘memory’ or ‘hidden state’ throughout the sequence by employing loops.
This enables RNNs to recognize and capture the temporal (time-related) dependencies inherent in sequential data.
Often referred to as the network's 'memory', the hidden state is a dynamic store of information about previous sequence inputs. With each new input, the hidden state is updated, factoring in both the new input and its previous value. Loops in RNNs enable information transfer across sequence steps.
Illustration of RNN operation
Given the sentence "I love RNNs", the RNN interprets it word by word.
First it ingests the word "I", generates an output, and updates its hidden state.
Then it moves to "love"; the RNN processes it alongside the hidden state, which ideally holds insights about the word "I", and the hidden state is updated again.
This pattern of processing and updating continues until the last word is reached (see the code sketch at the end of this section).
Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs)
Designed to address the limitations of traditional RNNs and enhance their ability to model sequential data effectively. While effective for a variety of tasks, they still struggled with long sequences and long-term dependencies.
Seq2seq models with attention
Sequence-to-sequence models - Built with RNNs or LSTMs, designed to handle tasks like translation, where an input sentence is transformed into an output sentence.
The attention mechanism was introduced to allow the model to "focus" on relevant parts of the input sequence when generating the output, significantly improving performance on tasks like machine translation.
While these methods provided significant advancements in text generation tasks, the introduction of transformers led to a paradigm shift. Transformers, with their self-attention mechanism, proved to be highly efficient at capturing contextual information across long sequences, setting new benchmarks across various NLP tasks.
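Picking up the "I love RNNs" illustration above, here is a minimal PyTorch sketch of an RNN cell updating its hidden state word by word. The vocabulary, embedding size, and hidden size are assumptions for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

sentence = ["I", "love", "RNNs"]
vocab = {word: idx for idx, word in enumerate(sentence)}

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
rnn_cell = nn.RNNCell(input_size=8, hidden_size=16)

hidden = torch.zeros(1, 16)  # initial hidden state: the network's "memory"
for word in sentence:
    token = torch.tensor([vocab[word]])
    x = embedding(token)           # embed the current word
    hidden = rnn_cell(x, hidden)   # combine the new input with the previous state
    print(word, hidden.shape)      # the hidden state now reflects the words seen so far
```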
Transformers
Transformers replaced sequential processing with parallel processing. The key component behind their success is the attention mechanism, more precisely self-attention.
Tokenization - Breaking down the sentence into tokens.
Embedding - Each token is represented as a vector, capturing its meaning.
Self-Attention - The model computes scores determining the importance of every other word for a particular word in the sequence. These scores are used to weight the input tokens and produce a new representation of the sequence (see the sketch after this list).
Feed-forward neural networks - After attention, each position is passed through a feed-forward network separately.
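To make the self-attention step concrete, here is a minimal PyTorch sketch of scaled dot-product self-attention over a toy embedded sequence. The dimensions and the random projection matrices are assumptions for illustration; real transformers learn these projections.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

seq_len, d_model = 4, 8             # 4 tokens, 8-dimensional embeddings
x = torch.randn(seq_len, d_model)   # stand-in for the embedded tokens

# Random matrices stand in for the learned query/key/value projections.
W_q, W_k, W_v = (torch.randn(d_model, d_model) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scores: how important every token is for every other token.
scores = Q @ K.T / (d_model ** 0.5)
weights = F.softmax(scores, dim=-1)   # one row of attention weights per token

# New representation of the sequence: attention-weighted sum of the values.
output = weights @ V
print(weights.shape, output.shape)    # torch.Size([4, 4]) torch.Size([4, 8])
```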