
Lab: Creating a Simple Word Embedding using Jupyter Notebook (AML 3304 assignment)


Objective: This lab will guide you through the process of creating a basic word embedding using TensorFlow and Keras within Jupyter Notebook.

1. Setup

Before you begin, ensure you have the required packages installed. Run the following in a notebook cell (the leading ! executes it as a shell command; omit the ! if you install from a terminal instead):
!pip install tensorflow jupyter numpy
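
To confirm the installation, you can check the TensorFlow version in a new cell (any recent 2.x release should work for this lab):

import tensorflow as tf
print(tf.__version__)   # e.g. 2.15.0; any 2.x release should be fine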

2. Starting Jupyter Notebook

Open your terminal or command prompt and navigate to the directory where you'd like your notebook to reside. Type:
jupyter notebook

Once the Jupyter dashboard appears in your browser, create a new Python notebook.

3. Import Necessary Libraries

In the first cell of your Jupyter notebook, import the necessary libraries:
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense

4. Prepare the Data

For this simple lab, let's use a small dataset of sentences:
sentences = [
    'I love machine learning',
    'I love coding in Python',
    'Deep learning is fun',
    'Python is great for machine learning',
    'I prefer Python over Java'
]

# Tokenize the sentences; oov_token stands in for any word not seen during fitting
tokenizer = Tokenizer(oov_token='<OOV>')
tokenizer.fit_on_texts(sentences)
word_index = tokenizer.word_index   # maps each word to a unique integer index
print(word_index)
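
Each word receives an integer index, assigned by descending frequency; the OOV token takes index 1 and index 0 is reserved for padding. The exact ordering of equal-frequency words can vary between Keras versions, but the output should look roughly like this:

{'<OOV>': 1, 'i': 2, 'learning': 3, 'python': 4, 'love': 5, 'machine': 6, 'is': 7, ...}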

5. Sequence the Sentences

Convert each sentence into a sequence of word indices, then pad the sequences so they all share the same length:
sequences = tokenizer.texts_to_sequences(sentences)
padded_sequences = pad_sequences(sequences, padding='post')   # zero-pad at the end, to the longest sequence
print(padded_sequences)
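
As a quick check of the OOV handling, try converting a sentence that contains a word the tokenizer has never seen (this test sentence is made up for illustration):

# 'rust' was never seen during fitting, so it maps to the <OOV> index
print(tokenizer.texts_to_sequences(['I love Rust']))   # something like [[2, 5, 1]]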

6. Create a Simple Embedding Model

Now, let's design a basic model with an embedding layer:
embedding_dim = 16

model = Sequential([
    Embedding(input_dim=len(word_index) + 1,   # +1 because index 0 is reserved for padding
              output_dim=embedding_dim,
              input_length=padded_sequences.shape[1]),
    Flatten(),
    Dense(6, activation='relu'),
    Dense(1, activation='sigmoid')  # This dense layer is just for demonstration purposes.
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()

Note: In a real-world scenario, you'd have a labeled dataset and could train the model. Here, we're focusing on the embedding layer, so we won't train the model.
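
(Optional) If you want to watch the weights change anyway, you could attach toy binary labels; the labels below are invented purely for illustration and carry no real meaning:

labels = np.array([1, 1, 0, 1, 0])   # arbitrary made-up labels, one per sentence
model.fit(padded_sequences, labels, epochs=10, verbose=0)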

7. Retrieve the Embeddings

Even though we haven't trained the model, the embedding layer already holds a weight matrix you can extract. Without training, the vectors are just random initializations; training on real data is what shapes them into meaningful representations.
embedding_layer = model.layers[0]
weights = embedding_layer.get_weights()[0]   # shape: (vocab_size, embedding_dim), i.e. (17, 16) here
print(weights.shape)
print(weights)
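
You can also look up the vector for a single word via its index ('python' here is just an example; any word from word_index works):

idx = word_index['python']
print(weights[idx])   # the 16-dimensional embedding vector for 'python'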

8. Visualize the Embeddings

Embeddings in high-dimensional space can be visualized using techniques like t-SNE or PCA. For this simple lab, we'll first inspect them manually, then try a quick PCA projection (see the sketch below).
for word, i in word_index.items():
    embedding = weights[i]
    print(word, embedding)
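
For a rough two-dimensional picture, here is a minimal sketch using PCA from scikit-learn (this assumes scikit-learn and matplotlib are installed; with untrained, randomly initialized weights the plot will not show meaningful clusters):

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the 16-dimensional embeddings down to 2 dimensions
coords = PCA(n_components=2).fit_transform(weights)

plt.figure(figsize=(8, 6))
for word, i in word_index.items():
    x, y = coords[i]
    plt.scatter(x, y)
    plt.annotate(word, (x, y))
plt.title('Word embeddings projected to 2-D with PCA')
plt.show()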

9. Conclusion

You've now successfully created a simple embedding using Keras in Jupyter Notebook! This is a foundational step in Natural Language Processing tasks. With a larger dataset and a more complex model, you can capture richer semantic meanings in the embeddings.
Remember, this lab focused on the mechanics of setting up and inspecting an embedding layer. In practice, you'd often use pre-trained embeddings or train your embedding layer on a large dataset to capture meaningful word relationships.
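
As a pointer toward that next step, here is a sketch of the classic pattern for loading pre-trained GloVe vectors into a frozen Embedding layer. The filename glove.6B.50d.txt refers to one of the standard GloVe downloads from the Stanford NLP group; the local path is an assumption for illustration, and this block is not meant to run as part of the lab:

# Sketch: build an embedding matrix from pre-trained GloVe vectors (file path is illustrative)
glove_dim = 50
embedding_matrix = np.zeros((len(word_index) + 1, glove_dim))
with open('glove.6B.50d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.split()
        word, vector = parts[0], np.asarray(parts[1:], dtype='float32')
        if word in word_index:
            embedding_matrix[word_index[word]] = vector

pretrained = Embedding(input_dim=len(word_index) + 1,
                       output_dim=glove_dim,
                       weights=[embedding_matrix],
                       trainable=False)   # freeze the pre-trained vectors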