Objective: This lab will guide you through the process of creating a basic word embedding using TensorFlow and Keras within Jupyter Notebook.
1. Install the Required Packages
Before you begin, ensure you have the required packages installed. Run the following in a notebook cell (or drop the leading `!` to run it in your terminal):
!pip install tensorflow jupyter numpy
2. Starting Jupyter Notebook
Open your terminal or command prompt, navigate to the directory where you'd like your notebook to reside, and type:
jupyter notebook
Once the Jupyter dashboard appears in your browser, create a new Python notebook.
3. Import Necessary Libraries
In the first cell of your Jupyter notebook, import the necessary libraries:
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense
4. Prepare the Data
For this simple lab, let's use a small dataset of sentences:
sentences = [
    'I love machine learning',
    'I love coding in Python',
    'Deep learning is fun',
    'Python is great for machine learning',
    'I prefer Python over Java',
]

# Tokenize the sentences
tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)  # build the vocabulary from the sentences
word_index = tokenizer.word_index
5. Sequence the Sentences
sequences = tokenizer.texts_to_sequences(sentences)
padded_sequences = pad_sequences(sequences, padding='post')
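To see what the tokenizer and padding produce, here is a short self-contained check (the exact word indices depend on word frequency, so treat the printed mapping as illustrative):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
    'I love machine learning',
    'I love coding in Python',
    'Deep learning is fun',
    'Python is great for machine learning',
    'I prefer Python over Java',
]

tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)

sequences = tokenizer.texts_to_sequences(sentences)
padded_sequences = pad_sequences(sequences, padding='post')

print(tokenizer.word_index)    # the '<OOV>' token is always assigned index 1
print(padded_sequences.shape)  # (5, 6): 5 sentences, padded to the longest (6 tokens)
```

Index 0 is never assigned to a word; Keras reserves it for padding, which is why the vocabulary size passed to the embedding layer later is `len(word_index) + 1`.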
6. Create a Simple Embedding Model
Now, let's design a basic model with an embedding layer:
embedding_dim = 16
model = Sequential([
    Embedding(input_dim=len(word_index) + 1, output_dim=embedding_dim,
              input_length=padded_sequences.shape[1]),
    Flatten(),
    Dense(1, activation='sigmoid')  # This dense layer is just for demonstration purposes.
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Note: In a real-world scenario, you'd have a labeled dataset and could train the model. Here, we're focusing on the embedding layer, so we won't train the model.
7. Retrieve the Embeddings
Even without training, the embedding layer already holds a weight matrix (randomly initialized here); after training on real data, these weights would encode learned word relationships. You can extract them like this:
embedding_layer = model.layers[0]
weights = embedding_layer.get_weights()[0]  # shape: (vocab_size, embedding_dim)
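If you want to confirm the weight shape without building the full model, an isolated embedding layer behaves the same way (a sketch, using the sizes assumed above):

```python
from tensorflow.keras.layers import Embedding

vocab_size, embedding_dim = 17, 16   # assumed: 16 word_index entries + padding index 0
layer = Embedding(input_dim=vocab_size, output_dim=embedding_dim)
layer.build((None,))                 # create the weights without passing data through
weights = layer.get_weights()[0]
print(weights.shape)                 # (17, 16): one row per vocabulary index
```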
8. Visualize the Embeddings
Embeddings in high-dimensional space can be visualized using techniques like t-SNE or PCA. However, for this simple lab, we'll just inspect them manually.
for word, i in word_index.items():
    embedding = weights[i]
    print(f'{word}: {embedding}')
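If you later want the 2-D view mentioned above, PCA can be done with plain NumPy via the SVD. This is a hedged sketch: `weights` and `word_index` here are random stand-ins for the real embedding matrix and vocabulary from the previous steps.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(17, 16))                  # stand-in embedding matrix
word_index = {f'word{i}': i for i in range(1, 17)}   # stand-in vocabulary

# PCA via SVD: centre the vectors, then project onto the top two components.
centered = weights - weights.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T                         # (17, 2) projection

for word, i in list(word_index.items())[:3]:
    print(f'{word}: {coords[i]}')
```

With real trained embeddings, words that appear in similar contexts tend to land near each other in this projection; with random weights, the scatter is meaningless.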
9. Wrapping Up
You've now successfully created a simple embedding using Keras in Jupyter Notebook! This is a foundational step in Natural Language Processing tasks. With a larger dataset and a more complex model, you can capture richer semantic meanings in the embeddings.
Remember, this lab focused on the mechanics of setting up and inspecting an embedding layer. In practice, you'd often use pre-trained embeddings or train your embedding layer on a large dataset to capture meaningful word relationships.