
Hugging Face: a platform and a set of APIs that we can build our AI products on top of

Key benefits:
Open source and free
Hugging Face provides access to a large number of high-quality pre-trained models that we can build on top of.
Large user community. Very well supported.

Learning Outcomes:

Do AI application development with Hugging Face. Go to https://huggingface.co/ and sign up for an account.
Be able to explain the purpose and key concepts of developing AI applications with Hugging Face models and APIs, and work through practical use cases.

Lecture: Application Development with Hugging Face

Introduction

Hugging Face is a rapidly-growing, data science-focused company known primarily for its state-of-the-art machine learning and artificial intelligence (AI) models.
Hugging Face is an AI community that focuses on advancing and democratizing building AI Products through open-source and open science.
They provide a platform for building, training, and deploying state-of-the-art machine learning models, with a primary focus on natural language processing (NLP).
The Hugging Face ecosystem includes libraries that facilitate the development of AI applications:
- Transformers
- Datasets
- Tokenizers
- Accelerate
Models built with these libraries are widely used for a range of NLP tasks, including text classification, named entity recognition, and language translation.
In this lecture, we will delve into the purpose, concepts, and practical applications of Hugging Face's transformative technologies.

The Story of Hugging Face API: Bringing AI to the Masses

It was the year 2016, and the world of Artificial Intelligence was in the throes of a revolution. Ground-breaking work in deep learning had already made its mark in fields such as computer vision.
But another field, Natural Language Processing (NLP), was ripe for transformation, and two enterprising individuals, Clément Delangue and Julien Chaumond, were about to help bring it about.
Delangue and Chaumond, former employees of the voice-recognition start-up Wit.ai (acquired by Facebook), realized the vast potential of conversational AI. They noticed a significant gap between the high-quality conversational AI technologies developed by tech giants and what was accessible to smaller businesses and developers. This discrepancy inspired the duo to democratize access to these technologies, and Hugging Face was born.
Named after the 🤗 "hugging face" emoji, Hugging Face was initially launched as an AI-powered chatbot app.
However, as they worked on improving their chatbot, they began building more sophisticated NLP models.
Recognizing the impact of the Transformer architecture on NLP, they saw an opportunity to contribute to the community.
In 2018, Hugging Face shifted its focus to developing a general-purpose NLP library, which was well received by the community. With the rise of Transformer models such as BERT and GPT, the team at Hugging Face saw the need for an easy-to-use library that would allow developers to leverage these state-of-the-art models.
The result was the transformers library, an open-source library that quickly became the go-to resource for Transformer models in NLP.
The company also adopted a unique business model, where they open-sourced their models and tools, fostering an inclusive and collaborative AI community.
{Research paper project people: Address the business and cash flow elements: How do they make money?}
The following years saw Hugging Face grow in prominence.
As the library grew, it:
- added a steadily growing number of pre-trained models
- added APIs for training models and fine-tuning them on your own datasets
The library was designed to be user-friendly, providing high-level abstractions known as pipelines for common NLP tasks.
This made it easier to integrate NLP services into your own IT products, allowing developers, researchers, and companies to leverage state-of-the-art models without needing the resources to train them from scratch.

Today, Hugging Face stands at the forefront of NLP. Their transformers library is a standard tool in the NLP developer toolkit, boasting an extensive collection of more than 10,000 pre-trained models in over 100 languages.
Hugging Face's journey is a testament to the power of open-source, collaboration, and the vision to enable all developers to integrate AI into their IT platforms and business processes.
What started as an ambition to develop a friendly AI chatbot turned into a mission to give every developer access to the most advanced NLP tools.
As the world continues to realize the power of language, Hugging Face is sure to play a pivotal role in shaping that future.

1. Introduction to Hugging Face API

Sign up for an account at https://huggingface.co/ if you have not already done so.
1.1 Explanation of Hugging Face and its role in NLP

Introduction to Hugging Face

Hugging Face is a company that specializes in the development of machine learning models.
They provide an array of powerful tools and models for Natural Language Processing (NLP) and Natural Language Understanding (NLU).
Named after a quirky emoji, Hugging Face aims to bring a friendly face to complex artificial intelligence technologies. It enables all of us to engineer AI applications and apply them to the business process needs of our organization.
The organization has made waves in the AI industry with its transformative tools, resources, and open-source contributions.
The company's most popular offering is the Hugging Face Transformers library, which provides access to a massive collection of pre-trained models and resources designed to make NLP tasks more accessible to researchers, data scientists, and developers.
In short, the Transformers library gives us easy API access to pre-trained language models, including generative AI models.

Hugging Face’s Role in NLP

Democratization of AI

The accessibility of powerful machine learning models and techniques, such as Transformers, has revolutionized the NLP landscape.
However, building, training, and fine-tuning such models can be resource-intensive and often requires significant programming and mathematical expertise.
Hugging Face's main goal is to make this technology easy to access by providing pre-trained models that are easy to use, understand, and apply to your specific business domain, regardless of your level of technical expertise.

State-of-the-Art Models

Hugging Face offers access to state-of-the-art models such as BERT, GPT-2, RoBERTa, and more.
These models have been pre-trained on massive text corpora, and many are also available already fine-tuned for specific NLP tasks, allowing you to leverage their capabilities without requiring significant computational resources or time.

Extensive NLP Task Coverage

The Hugging Face Transformers library provides support for a wide array of NLP tasks.
Whether you're interested in text classification, named entity recognition, sentiment analysis, or language translation, you're likely to find a pre-trained model on the Hugging Face Hub that the library can load for you.

Open-Source Collaboration

Hugging Face's commitment to open-source principles fosters a collaborative environment. Developers and researchers around the world contribute to the Transformers library, making it a continually evolving and improving resource.

User-Friendly API and Pipelines

Hugging Face also provides a user-friendly API and pipelines. These tools abstract much of the complexity associated with using machine learning models, allowing you to focus on the task at hand.
In conclusion, Hugging Face plays a critical role in NLP by providing easy API access to state-of-the-art models, supporting a wide range of NLP tasks, promoting open-source collaboration, and providing user-friendly tools and resources.
Through these efforts, Hugging Face is making NLP more accessible and easier to apply and integrate into our everyday business processes, accelerating the development of innovative applications and solutions in the field.

A minimal viable product (MVP) that uses the Hugging Face API could be a sentiment analysis application. {Consider how you could use this for your Project.}

Here's a simple Python script that takes a sentence as input and determines whether its sentiment is positive or negative using Hugging Face's sentiment-analysis pipeline.
Before running the script, make sure you have installed the necessary libraries. If not, you can install them using pip:

pip install transformers torch

Here's the simple MVP Python program:

from transformers import pipeline

# Initialize the Hugging Face sentiment analysis pipeline once, at module level,
# so the model is loaded a single time rather than on every call
classifier = pipeline('sentiment-analysis')

def sentiment_analysis(sentence):
    # Use the pipeline to analyze the sentence; it returns a list with one result per input
    result = classifier(sentence)[0]
    return result

# Test the function with a sentence
print(sentiment_analysis("I love learning about AI with Hugging Face!"))

This program uses the pipeline function from Hugging Face's transformers library to analyze the sentiment of the provided sentence.
Calling the pipeline returns a list of dictionaries, one for each input sentence.
In this case, we only analyze one sentence, so we just take the first element of the list (which is the only element).
The dictionary has two keys - 'label' and 'score'. The 'label' represents the sentiment (either 'POSITIVE' or 'NEGATIVE'), and the 'score' represents the confidence of the prediction (a number between 0 and 1).
Please note that for more complex applications, you might want to consider fine-tuning the model on a specific task, managing your own model hosting, or utilizing other advanced features of the Hugging Face library.
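Calling the classifier returns one dictionary per input, so you can also pass a list of sentences in a single call. A minimal sketch, assuming the distilbert-base-uncased-finetuned-sst-2-english model (the default for this task); the example reviews are invented for illustration:

```python
from transformers import pipeline

# Create the pipeline once; loading the model is the slow part
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Hypothetical example inputs
reviews = [
    "The lecture was clear and the examples worked first time.",
    "The installation kept failing and I gave up.",
]

# Passing a list returns a list with one result dictionary per input
results = classifier(reviews)
for review, result in zip(reviews, results):
    print(f"{result['label']:>8} {result['score']:.4f}  {review}")
```

Reusing the same classifier object for every batch avoids reloading the model on each call.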
1.2 Understanding the Hugging Face API

Introduction to Hugging Face API

An API (application programming interface) is a set of functions and procedures that allows applications to access the features or data of an operating system, application, or other service.
With the Hugging Face API, you can directly interact with Transformer models and perform NLP tasks.
It also provides APIs for configuring models and tokenizers.
Let's take a deeper dive into the Hugging Face API and how you can use it.

Core Components of Hugging Face API

The Hugging Face API consists of several components, the most important of which are:
Models: Pre-trained Transformer models that can be fine-tuned on specific tasks. What this means for your project: take an applicable model and fine-tune it on some of your own training data.
Tokenizers: They split input text into tokens and map each token to an integer ID; the IDs can be returned as Python lists or as framework tensors (for example, PyTorch tensors) that models work with.
Pipelines: These are high-level objects that automatically perform tokenization, run the model, and post-process its output.
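To see what a pipeline automates, the sketch below performs the three steps by hand with a tokenizer and a model, then hands the same objects to pipeline() for comparison. It assumes the distilbert-base-uncased-finetuned-sst-2-english checkpoint used later in this lecture and a PyTorch installation; treat it as an illustration of the idea, not as the pipeline's exact internal code.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Checkpoint named in this lecture's lab section (assumed choice)
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "I love learning about AI with Hugging Face!"

# 1. Tokenizer: text -> input IDs and attention mask as PyTorch tensors
inputs = tokenizer(text, return_tensors="pt")

# 2. Model: input IDs -> raw scores (logits), one per label
with torch.no_grad():
    logits = model(**inputs).logits

# 3. Post-processing: logits -> probabilities -> most likely label
probs = torch.softmax(logits, dim=-1)[0]
best = int(probs.argmax())
manual = {"label": model.config.id2label[best], "score": float(probs[best])}
print("manual:  ", manual)

# A pipeline built from the same model and tokenizer wraps those three steps
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
print("pipeline:", classifier(text)[0])
```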

Using Hugging Face API

Installation

To start, you need to install the Hugging Face transformers library. You can do this via pip:

pip install transformers

Importing the Pipeline

To use a pre-trained model, you can import the pipeline function:
from transformers import pipeline

Using a Pre-trained Model

After importing the pipeline, you can select a task to complete. For example, if you want to classify the sentiment of a text, you can use the 'sentiment-analysis' pipeline:
# Initialize a pipeline for sentiment analysis
classifier = pipeline('sentiment-analysis')

# Analyze the sentiment of a text
result = classifier('I love learning about AI with Hugging Face!')[0]
print(f"label: {result['label']}, with score: {result['score']:.4f}")

Full Python program to illustrate the steps:

# First, make sure to install the Hugging Face transformers library.
# If you are working in a Jupyter notebook, you can uncomment the following line to do this directly in a cell:
# !pip install transformers

# Import the necessary library
from transformers import pipeline

# Initialize a pipeline for sentiment analysis
classifier = pipeline('sentiment-analysis')

# Analyze the sentiment of a text
result = classifier('I love learning about AI with Hugging Face!')[0]
print(f"label: {result['label']}, with score: {result['score']:.4f}")

This script will print the sentiment analysis result of the text "I love learning about AI with Hugging Face!". It outputs the sentiment label (either 'POSITIVE' or 'NEGATIVE') and the associated confidence score.
Remember, if you run this script locally, you should use the command line to install the transformers library (by running pip install transformers) before running the script. If you're running this script in a Jupyter notebook, you can uncomment the line !pip install transformers to install the library directly in your notebook.

Named Entity Recognition

Another common task is named entity recognition.
Here's how to complete that task with the Hugging Face API:

from transformers import pipeline

# Initialize a pipeline for named entity recognition
ner = pipeline('ner', model='dbmdz/bert-large-cased-finetuned-conll03-english')

# Recognize named entities in a text
result = ner('My name is John and I live in San Francisco.')
for entity in result:
    print(f"Entity: {entity['entity']}, Word: {entity['word']}, Index: {entity['index']}, Score: {entity['score']:.4f}")
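Without extra options, the ner pipeline returns one result per sub-word token, so "San Francisco" comes back as separate entries. A sketch using the pipeline's aggregation_strategy="simple" option, which merges adjacent tokens of the same entity into a single result; grouped results use the key entity_group instead of entity and include character offsets:

```python
from transformers import pipeline

# aggregation_strategy="simple" groups adjacent tokens that belong to the same entity
ner = pipeline(
    "ner",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
)

entities = ner("My name is John and I live in San Francisco.")
for entity in entities:
    # 'start' and 'end' are character positions in the original sentence
    print(f"{entity['entity_group']}: {entity['word']} "
          f"(chars {entity['start']}-{entity['end']}, score {entity['score']:.4f})")
```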

Text Generation

The Hugging Face API also includes pre-trained models for text generation:

from transformers import pipeline

# Initialize a pipeline for text generation (a GPT-2 model is used when no model is specified)
text_generator = pipeline('text-generation')

# Generate a text
result = text_generator('Once upon a time, there was a little girl named')[0]
print(result['generated_text'])

To use different models for text generation, you just need to specify the model name in the pipeline function.

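Here are examples of text generation with three different models from the Hugging Face Hub:
gpt2
EleutherAI/gpt-neo-1.3B
openai-gpt (the original OpenAI GPT model; text-davinci-002 is not available on Hugging Face, since it was only offered through OpenAI's own API)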

Before running these examples, make sure you have the transformers library installed in your Python environment.

from transformers import pipeline

# Initialize a pipeline for text generation with GPT-2
text_generator_gpt2 = pipeline('text-generation', model='gpt2')

# Generate a text with GPT-2
result_gpt2 = text_generator_gpt2('Once upon a time, there was a little girl named')[0]
print("GPT-2: ", result_gpt2['generated_text'])

# Initialize a pipeline for text generation with GPT-Neo
text_generator_gpt_neo = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')

# Generate a text with GPT-Neo
result_gpt_neo = text_generator_gpt_neo('Once upon a time, there was a little girl named')[0]
print("GPT-Neo: ", result_gpt_neo['generated_text'])

# Initialize a pipeline for text generation with OpenAI GPT (openai-gpt)
text_generator_openai_gpt = pipeline('text-generation', model='openai-gpt')

# Generate a text with OpenAI GPT
result_openai_gpt = text_generator_openai_gpt('Once upon a time, there was a little girl named')[0]
print("OpenAI GPT: ", result_openai_gpt['generated_text'])

Please note that EleutherAI/gpt-neo-1.3B is a large model (about 1.3 billion parameters, a download of several gigabytes): it may take a long time to download and to generate text, and it needs considerably more memory (RAM/GPU) than gpt2 or openai-gpt.
Also, the behavior of text generation can be adjusted with additional parameters such as max_length and temperature, depending on your requirements.

Here is an updated version of the code that lets you set max_length (the maximum total length of each output in tokens, including the prompt), temperature (the higher the value, the more random the output), and num_return_sequences (the number of different sequences to generate). It passes do_sample=True explicitly, because temperature only takes effect when sampling is enabled and greedy decoding can only return one sequence:

from transformers import pipeline

# Define hyperparameters
max_length = 50
temperature = 0.7
num_return_sequences = 3

prompt = 'Once upon a time, there was a little girl named'

# Models to compare: display name -> Hugging Face Hub model ID
models = {
    'GPT-2': 'gpt2',
    'GPT-Neo': 'EleutherAI/gpt-neo-1.3B',
    'OpenAI GPT': 'openai-gpt',
}

for display_name, model_id in models.items():
    # Initialize a pipeline for text generation with this model
    text_generator = pipeline('text-generation', model=model_id)

    # Generate several sampled texts from the same prompt
    results = text_generator(
        prompt,
        max_length=max_length,
        do_sample=True,
        temperature=temperature,
        num_return_sequences=num_return_sequences,
    )
    for i, res in enumerate(results):
        print(f"{display_name} text {i + 1}: ", res['generated_text'])

This code generates num_return_sequences sequences, each at most max_length tokens long, for each of the models. The temperature parameter controls the randomness of the sampled output.
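Because sampling is random, each run produces different stories. A minimal sketch, assuming the gpt2 model, that fixes the random seed with transformers.set_seed and uses max_new_tokens, which limits only the newly generated tokens rather than the prompt plus the output:

```python
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")

# Fix the random seed so the sampled outputs can be reproduced
set_seed(42)

prompt = "Once upon a time, there was a little girl named"
outputs = generator(
    prompt,
    max_new_tokens=30,        # counts only generated tokens, not the prompt
    do_sample=True,           # sampling must be on for temperature to matter
    temperature=0.7,
    num_return_sequences=3,
)

for i, out in enumerate(outputs, start=1):
    print(f"Sample {i}: {out['generated_text']}")
```

With the same seed, library version, and hardware, the samples should repeat from run to run; change or remove the seed to get fresh outputs.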


In conclusion, the Hugging Face API is a powerful and versatile tool that provides access to state-of-the-art pre-trained models for a wide range of NLP tasks.
The high-level pipeline abstraction makes it easy to use these models without getting lost in the details of tokenization and model architecture.
1.3 Benefits of using the Hugging Face API

Hugging Face's mission to democratize artificial intelligence has led to the development of a robust and user-friendly API that provides numerous benefits to the AI community. Here, we will explore the significant benefits that the Hugging Face API brings to the table:

1. Access to State-of-the-Art Pretrained Models

Hugging Face's transformers library hosts a wide variety of state-of-the-art models that are pretrained on diverse datasets and are ready for fine-tuning. The models include BERT, GPT-2, RoBERTa, T5, and more. They cover a broad range of languages and are optimized for a variety of tasks. This gives developers and researchers the ability to quickly experiment with different models without needing the extensive resources to train them from scratch.

2. Simplified Workflow

Hugging Face's API abstracts away the complexities of working with Transformer models. With its high-level interface, it becomes straightforward to load models, make predictions, and even fine-tune models on your own datasets. This simplicity and speed can significantly improve productivity, especially for beginners or those working on proof-of-concept projects.

3. Versatile NLP Task Coverage

From sentiment analysis and text generation to question answering and language translation, the Hugging Face API provides extensive coverage of NLP tasks. This means that regardless of the NLP problem you're facing, there's a good chance that Hugging Face has a solution for you.

4. Extensive Community Support and Resources

As an open-source platform, Hugging Face enjoys significant contributions and support from the AI community worldwide. This results in a continually evolving platform that incorporates the latest developments in the field. It also means that if you encounter a problem or need advice, you're likely to find a solution within the community.

5. Scalability and Efficiency

The Hugging Face libraries work with datasets of different sizes, from small to large. Models can run on a CPU or a GPU, pipelines support batched inference, and companion libraries such as Accelerate support distributed training across multiple GPUs.
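As an illustration of the CPU/GPU and batching points above, a minimal sketch assuming PyTorch is installed: it selects the first GPU when one is visible (device=0) and otherwise the CPU (device=-1), then sends a made-up list of 100 texts through the pipeline in batches of 16. Whether batching speeds things up depends on your hardware and inputs, so measure before relying on it.

```python
import torch
from transformers import pipeline

# Use the first GPU if PyTorch can see one; otherwise fall back to the CPU (-1)
device = 0 if torch.cuda.is_available() else -1

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=device,
)

# Hypothetical workload: many short texts
texts = [f"Customer comment number {i}: the product works well." for i in range(100)]

# batch_size controls how many texts go through the model at once;
# truncation=True shortens any text longer than the model's maximum length
results = classifier(texts, batch_size=16, truncation=True)
print(len(results), results[0])
```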

6. Integration with Popular Machine Learning Frameworks

Hugging Face's API integrates seamlessly with popular deep learning frameworks like PyTorch and TensorFlow. This allows developers to build upon their existing knowledge and take advantage of the tools and functionalities offered by these frameworks.

7. Continual Learning Opportunities

The open-source nature of Hugging Face and the high-level API allows for exploration and continual learning. Users can look at the source code, experiment with different models and tasks, and contribute to the growing body of knowledge and resources.
In summary, the benefits of the Hugging Face API extend beyond providing access to top-tier Transformer models. It simplifies NLP workflows, encourages community participation, integrates with established machine learning frameworks, and promotes a culture of open source and continual learning. Whether you're a student, researcher, or professional, the Hugging Face API has much to offer in the field of NLP.

2. Getting Started with Hugging Face API

2.1 Setting up the Development Environment
2.2 Installation of necessary libraries (transformers, torch)

Start by installing the necessary libraries, namely transformers and torch. You can do this using pip:

pip install transformers torch

2.3 Accessing the Hugging Face API

Once the necessary libraries are installed, you can start using the Hugging Face API. Here's a Python program that reads a text file line by line and uses the Hugging Face API to perform sentiment analysis on each line:

from transformers import pipeline

# Initialize the Hugging Face sentiment analysis pipeline once and reuse it
classifier = pipeline('sentiment-analysis')

# Define a function to perform sentiment analysis on a list of texts
def analyze_sentiment(texts):
    # truncation=True shortens any input longer than the model's maximum length
    return classifier(texts, truncation=True)

# Read the text file, keeping each non-empty line as a separate input
with open('your_file.txt', 'r', encoding='utf-8') as file:
    lines = [line.strip() for line in file if line.strip()]

# Analyze the sentiment of each line
sentiment_result = analyze_sentiment(lines)
for line, result in zip(lines, sentiment_result):
    print(f"{result['label']} ({result['score']:.4f}): {line}")

Just replace 'your_file.txt' with the path to your text file. The script prints one result per non-empty line of the file. Each result is a dictionary with two keys:
label: This can be either 'POSITIVE' or 'NEGATIVE', indicating the sentiment of the line.
score: This is a float between 0 and 1, representing the confidence of the prediction.
Please note that more complex applications may require further installation and setup steps, such as setting up a GPU for model training and inference, configuring cloud storage for model checkpoints, or setting up a web server for model deployment.

3. Understanding Key Concepts in Hugging Face API

3.1 Overview of Pre-trained Models
3.2 Understanding Tokenizers
3.3 Understanding Pipelines
3.4 Understanding Model Fine-tuning

4. Practical Applications of Hugging Face API

4.1 Text Classification
Concept of Text Classification
Implementation using Hugging Face API
Example Use Case
4.2 Named Entity Recognition
Concept of Named Entity Recognition
Implementation using Hugging Face API
Example Use Case
4.3 Language Translation
Concept of Language Translation
Implementation using Hugging Face API
Example Use Case
4.4 Text Generation
Concept of Text Generation
Implementation using Hugging Face API
Example Use Case

5. Building a Real-World AI Product with Hugging Face API

5.1 Ideation: Defining the Problem Statement for the AI Product
5.2 Data Collection and Preparation
5.3 Choosing the Appropriate Pre-Trained Model
5.4 Fine-Tuning the Model as per Requirements
5.5 Implementing the Model using Hugging Face API
5.6 Evaluating and Improving the Model
5.7 Deploying the Model
5.8 Scaling and Maintenance of the AI Product

6. Conclusion

6.1 Recap of Hugging Face API and Its Application
6.2 Potential Challenges and Solutions
6.3 Future Trends and Advancements in Hugging Face and NLP
6.4 Encouragement for Exploration and Innovation using Hugging Face API
Each section will consist of theoretical explanations, code snippets, practical examples, and recommended best practices to facilitate a deep understanding of the Hugging Face API and its applications in building AI products.

The Purpose of Hugging Face

Hugging Face aims to democratize AI by making cutting-edge machine learning models accessible and understandable to both data scientists and developers.
This is achieved through the Hugging Face Transformers library, which provides thousands of pre-trained models for a wide range of NLP tasks.
This means that instead of spending countless hours training your own models from scratch, you can leverage the power of pre-trained models that have already been optimized for performance.

Hugging Face Transformers: Key Concepts

Hugging Face Transformers is a Python-based library that provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, etc.) for Natural Language Understanding (NLU) and Natural Language Generation (NLG). Some key concepts associated with Hugging Face Transformers include:

1. Pre-Trained Models:

These are models that have already been trained on large amounts of data. They can be fine-tuned for specific tasks with a relatively small amount of data, saving a lot of time and resources.
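A minimal fine-tuning sketch in plain PyTorch, assuming the distilbert-base-uncased checkpoint and a tiny, hypothetical set of labelled business messages. It only shows the mechanics (tokenize, compute the loss, update the weights); a real project needs far more data, a held-out evaluation set, and usually the library's Trainer class or a fuller training loop.

```python
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical domain data: replace with your own labelled examples
texts = [
    "The invoice was paid on time.",
    "Great service, very quick response.",
    "The shipment arrived damaged.",
    "Support never answered my ticket.",
]
labels = torch.tensor([1, 1, 0, 0])  # assumed label scheme: 1 = positive, 0 = negative

model_name = "distilbert-base-uncased"  # pre-trained model without a task head
tokenizer = AutoTokenizer.from_pretrained(model_name)
# num_labels=2 adds a freshly initialized 2-way classification head
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=5e-5)

model.train()
for epoch in range(3):
    optimizer.zero_grad()
    # Passing labels makes the model compute the classification loss itself
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    print(f"epoch {epoch + 1}: loss {outputs.loss.item():.4f}")

final_loss = outputs.loss.item()
```

Expect a warning that some classifier weights were newly initialized; that new head is exactly what the training steps learn.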

2. Tokenizers:

Tokenizers are used to convert input text into a format that can be understood by the model. This includes splitting the input into tokens (small units of text), and mapping these tokens to their respective IDs in the model vocabulary.
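A short sketch of those two steps, assuming the distilbert-base-uncased tokenizer: tokenize() splits the text into sub-word tokens, convert_tokens_to_ids() maps them to vocabulary IDs, and calling the tokenizer directly does both while also adding the model's special tokens.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

text = "Tokenizers turn text into numbers."

# Step 1: split the text into tokens from the model's vocabulary
tokens = tokenizer.tokenize(text)

# Step 2: map each token to its integer ID in that vocabulary
ids = tokenizer.convert_tokens_to_ids(tokens)

# Calling the tokenizer does both steps and adds special tokens ([CLS] ... [SEP])
encoded = tokenizer(text)

print(tokens)
print(ids)
print(encoded["input_ids"])
print(tokenizer.decode(encoded["input_ids"]))
```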

3. Pipelines:

Pipelines are a high-level abstraction that make it easy to use a model for a specific task, such as text classification or named entity recognition. They take care of all the necessary steps, from tokenization to outputting the final result.

Use Cases of Hugging Face

Let's look at three practical applications of Hugging Face technologies:

1. Text Classification:

This is one of the most common NLP tasks. You can use Hugging Face Transformers to classify text into predefined categories. For example, you could classify movie reviews as positive or negative.

2. Named Entity Recognition (NER):

NER involves identifying and categorizing named entities in a text into predefined categories such as persons, organizations, locations, etc. Hugging Face provides pre-trained models that are excellent at this task.

3. Language Translation:

With Hugging Face Transformers, you can translate text from one language to another. This opens up a wide range of possibilities, from creating multilingual chatbots to translating documents.

Python Coding Lab Notebook: Developing AI Applications with Hugging Face

This notebook will guide you through three practical tasks using the Hugging Face Transformers library: Text Classification, Named Entity Recognition (NER), and Language Translation.
Note: Before proceeding, ensure you have installed the transformers and torch libraries by running pip install transformers torch.

1. Text Classification

We will use the pipeline function to create a sentiment-analysis pipeline. This pipeline will use the distilbert-base-uncased-finetuned-sst-2-english pre-trained model.
from transformers import pipeline

# Create a pipeline
nlp = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

# Classify the text
result = nlp("I really enjoy studying AI and ML.")[0]

print(f"label: {result['label']}, with score: {result['score']:.4f}")

2. Named Entity Recognition (NER)

We'll use the ner pipeline and the dbmdz/bert-large-cased-finetuned-conll03-english pre-trained model.
from transformers import pipeline

# Create a pipeline
nlp = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

# Recognize named entities in the text
result = nlp("The Hugging Face team is based in New York City.")

# Print each named entity
for entity in result:
    print(f"{entity['entity']}: {entity['word']}")

3. Language Translation

We'll use the translation_en_to_de pipeline and the t5-base pre-trained model.
from transformers import pipeline

# Create a pipeline
translator = pipeline("translation_en_to_de", model="t5-base")

# Translate the text
result = translator("The quick brown fox jumps over the lazy dog.")[0]

print(f"Translated text: {result['translation_text']}")

These are just a few examples of what you can do with Hugging Face Transformers. By providing easy access to pre-trained models, Hugging Face empowers you to build advanced AI applications with ease.

Python Coding Lab Notebook

To create a Python coding lab notebook, you can use Jupyter Notebook, which allows you to combine code, text, and visualizations in a single document.
Here's a brief outline of a Jupyter Notebook for a lesson on Hugging Face application development:
Introduction: Provide an overview of Hugging Face and its ecosystem, including the Transformers, Datasets, Tokenizers, and Accelerate libraries.
Installation: Guide students through installing the necessary libraries, such as Transformers and Datasets, using !pip install transformers datasets.
Loading Pre-trained Models: Teach students how to load pre-trained models using the AutoModel and AutoTokenizer classes.
Fine-tuning Models: Explain the process of fine-tuning pre-trained models on specific tasks, such as text classification, NER, or multilingual translation.
Inference: Show students how to use the pipeline() function for inference, which is the easiest and fastest way to use a pre-trained model for various tasks.
Building AI Applications: Provide examples of building AI applications using Hugging Face techniques, such as the text classification, NER, and multilingual translation use cases mentioned earlier.
Conclusion: Summarize the key concepts covered in the lesson and encourage students to explore further applications of Hugging Face techniques.
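A minimal sketch of the "Loading Pre-trained Models" step, assuming the distilbert-base-uncased checkpoint: AutoTokenizer and AutoModel pick the right classes from the model name, and because AutoModel loads the base network without a task-specific head, it returns one hidden-state vector per token rather than a label.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # illustrative choice of checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("Hugging Face makes NLP accessible.", return_tensors="pt")

# Inference only, so gradient tracking is switched off
with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch size, number of tokens, hidden size)
hidden = outputs.last_hidden_state
print(hidden.shape)
```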
For more detailed tutorials and examples, you can refer to the Hugging Face documentation and the Hugging Face NLP Course.