
Lab Workbook: Introduction to JSON and Big Data for AI Conversational Memory

Preamble:
Welcome to this foundational lab on JSON and Big Data in the context of AI conversational memory!
In this lab, we'll explore how JSON (JavaScript Object Notation) is used to structure and store conversational data for AI language models.
We'll also touch on Big Data concepts and how they relate to training and operating large language models.
Understanding how to work with JSON and manage large datasets is crucial for developing AI applications, especially chatbots and conversational AI.
This lab will prepare you for future work with more complex AI models and data-processing tasks. Another goal of the course is to learn the tooling for AI data analytics with R and, later, Power BI.

Let's get started!

Part 1: Setting Up the Environment (15 minutes)

We'll be using Python in Google Colab for this lab. Later we will introduce a more sophisticated and feature-rich tool called RStudio.

Python is widely used in AI and data science, and Colab provides a free, easy-to-use environment.
Open Google Colab:
Create a new notebook
Rename it to "JSON_BigData_AI_ConversationalMemory_Lab"

In the first cell, let's install and import the necessary libraries:

!pip install pandas nltk

import json
import pandas as pd
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')
Run this cell.
You should see output indicating successful installation and imports.

Part 2: Introduction to JSON (30 minutes)

JSON is a lightweight, text-based data interchange format that's easy for humans to read and write and easy for machines to parse and generate.
It's commonly used for storing and transmitting data in web applications, including AI models.
One of the great wins for application development is that we can change the shape of our data containers programmatically at runtime.
In a traditional MVC web application, your business rules and algorithms live in the controller, encoded as programming structures (loops, if/then statements).
With JSON, we can instead store our business processes in the database, because we can describe them in declarative languages such as BPEL (Business Process Execution Language) and BPMN (Business Process Model and Notation).
JSON is the datastore for Service-Oriented Architectures (SOA), and SOA is the "front end" point of interaction between humans and AI LLMs:

Note: There are two architectures in the world right now for constructing enterprise systems.

(1) SOA: What is a service? Just a method call on an object. When we build an SOA, we are calling methods on objects which live inside operating systems that we connect to via TCP/IP. In SOA there is one block (one container) per algorithm. Each block talks to the others via TCP/IP, and each has its own port number to be addressed by. Each block also has its very own database, usually a JSON data store. SOA is phenomenally useful for building distributed BPEL applications, which is becoming a big thing now with the Internet of Things and distributed edge computing applications.
(2) MVC: MVC is a very narrow subcase of SOA. MVC is still SOA, but everything runs within one runtime environment: all of the algorithms are trapped inside a monolithic cement block called the controller.
Watch my video on how SOA applications are built:
BPEL and BPMN are XML-based markup languages which, like HTML and JSON, describe data declaratively. They describe and control your business processes, can be stored in JSON databases like MongoDB, and can be changed programmatically under AI control at runtime to adapt to changing market, business, and environmental conditions.
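The runtime-changeable-business-rules idea can be sketched in miniature. The rule structure below is purely illustrative (it is not a real BPEL or BPMN document); it shows a business rule stored as JSON-shaped data, evaluated as data, and then modified at runtime rather than being hard-coded in a controller:

```python
import json

# A hypothetical discount rule stored as data rather than code.
# The field names here are illustrative, not a BPEL/BPMN standard.
rule = {
    "process": "order_discount",
    "condition": {"field": "order_total", "op": ">=", "value": 100},
    "action": {"set_discount": 0.10},
}

def apply_rule(rule, record):
    """Evaluate a JSON-described condition against a record."""
    cond = rule["condition"]
    if cond["op"] == ">=" and record[cond["field"]] >= cond["value"]:
        return rule["action"]["set_discount"]
    return 0.0

order = {"order_total": 150}
print(apply_rule(rule, order))  # 0.1

# Because the rule is data, we can change it at runtime
# (no redeploy of the controller needed):
rule["condition"]["value"] = 200
print(apply_rule(rule, order))  # 0.0
```

Hard-coded in a controller, changing that threshold would require editing and redeploying code; stored as data, it is just an update to a document.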

IBM Watson was ChatGPT before there was ChatGPT.


Let's create a simple JSON structure to represent a conversation:
import json
conversation = {
    "conversation_id": "12345",
    "participants": ["user", "ai"],
    "messages": [
        {
            "sender": "user",
            "content": "Hello, how are you?",
            "timestamp": "2023-11-20T10:00:00Z"
        },
        {
            "sender": "ai",
            "content": "Hello! I'm functioning well, thank you. How can I assist you today?",
            "timestamp": "2023-11-20T10:00:05Z"
        }
    ]
}

print(json.dumps(conversation, indent=2))
Run this cell. You'll see a nicely formatted JSON output representing a simple conversation.
Now, let's write this conversation to a file:

with open('conversation.json', 'w') as f:
    json.dump(conversation, f)

print("Conversation saved to file.")
And read it back:

with open('conversation.json', 'r') as f:
    loaded_conversation = json.load(f)

print("Loaded conversation:")
print(json.dumps(loaded_conversation, indent=2))
You should see the same conversation structure printed out.
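Real-world files are not always well-formed. When loading JSON from disk, it is good practice to catch json.JSONDecodeError, which reports where parsing failed. A small sketch with deliberately malformed input:

```python
import json

# A deliberately malformed JSON string (unclosed array).
raw = '{"conversation_id": "12345", "messages": [}'

try:
    data = json.loads(raw)
except json.JSONDecodeError as e:
    # JSONDecodeError carries the message and the position of the failure.
    print(f"Could not parse JSON: {e.msg} at line {e.lineno}, column {e.colno}")
```

In a production pipeline you would typically log the bad record and move on rather than crash.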

Part 3: Working with Conversational Data

Now, let's simulate a larger dataset of conversations. We'll create multiple conversations and store them in a list:
import random
import datetime

def generate_random_conversation(conv_id):
    user_messages = [
        "Hello, how are you?",
        "What's the weather like today?",
        "Can you tell me a joke?",
        "What's the capital of France?",
        "How do I bake a cake?"
    ]
    ai_responses = [
        "Hello! I'm doing well. How can I assist you?",
        "I'm sorry, I don't have real-time weather information. You might want to check a weather website or app for the most current data.",
        "Sure! Why don't scientists trust atoms? Because they make up everything!",
        "The capital of France is Paris.",
        "To bake a cake, you'll need ingredients like flour, sugar, eggs, and butter. Start by preheating your oven, then mix your dry ingredients..."
    ]
    messages = []
    for _ in range(random.randint(2, 5)):
        user_msg = random.choice(user_messages)
        ai_msg = ai_responses[user_messages.index(user_msg)]
        messages.append({
            "sender": "user",
            "content": user_msg,
            "timestamp": datetime.datetime.now().isoformat()
        })
        messages.append({
            "sender": "ai",
            "content": ai_msg,
            "timestamp": (datetime.datetime.now() + datetime.timedelta(seconds=5)).isoformat()
        })
    return {
        "conversation_id": str(conv_id),
        "participants": ["user", "ai"],
        "messages": messages
    }

conversations = [generate_random_conversation(i) for i in range(1000)]

print(f"Generated {len(conversations)} conversations.")
print("\nSample conversation:")
print(json.dumps(conversations[0], indent=2))
This script generates 1000 simulated conversations. Run it and examine the output to see a sample conversation.
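For genuinely large datasets, a single JSON array becomes unwieldy because it must be parsed all at once. A common big-data-friendly alternative is JSON Lines (one JSON object per line), which can be written and read in a streaming fashion. A minimal sketch, using a two-conversation stand-in for our generated list:

```python
import json

# A couple of minimal conversations standing in for the generated dataset.
conversations = [
    {"conversation_id": "0", "messages": [{"sender": "user", "content": "Hi"}]},
    {"conversation_id": "1", "messages": [{"sender": "ai", "content": "Hello"}]},
]

# Write one JSON object per line (JSON Lines / NDJSON).
with open("conversations.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps(conv) + "\n")

# Read it back line by line -- no need to hold the whole file in memory.
loaded = []
with open("conversations.jsonl") as f:
    for line in f:
        loaded.append(json.loads(line))

print(f"Loaded {len(loaded)} conversations.")
```

Because each line is independent, JSON Lines files can be appended to, split, and processed in parallel, which is why the format is common in big-data tooling.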

Part 4: Analyzing Conversational Data

Now that we have a dataset, let's perform some basic analysis:
import pandas as pd
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')

# Convert to DataFrame for easier analysis
df = pd.json_normalize(conversations, record_path='messages', meta=['conversation_id'])

print("Total number of messages:", len(df))
print("\nMessages per sender:")
print(df['sender'].value_counts())

print("\nUnique conversations:", df['conversation_id'].nunique())

print("\nAverage message length:")
df['message_length'] = df['content'].apply(lambda x: len(word_tokenize(x)))
print(df.groupby('sender')['message_length'].mean())
This code converts our JSON data to a pandas DataFrame and performs some basic analysis. Run it and examine the output.
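Beyond the counts above, pandas makes it easy to slice the data further, for example counting messages per conversation or finding the most frequent user prompts. The tiny DataFrame below is a stand-in for the one built from our generated conversations:

```python
import pandas as pd

# Minimal stand-in for the message DataFrame built above.
df = pd.DataFrame({
    "conversation_id": ["0", "0", "1", "1", "1", "1"],
    "sender": ["user", "ai", "user", "ai", "user", "ai"],
    "content": ["Hi", "Hello!", "Joke?", "Sure!", "Thanks", "Welcome"],
})

# Messages per conversation
msgs_per_conv = df.groupby("conversation_id").size()
print(msgs_per_conv)

# Which user prompts appear most often?
top_user = df[df["sender"] == "user"]["content"].value_counts().head(3)
print(top_user)
```

Run the same two snippets against the full DataFrame from the step above to profile the generated dataset.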

Part 5: Storing and Retrieving Conversational Memory

In a real AI system, we'd typically store this data in a database. For this lab, we'll simulate database operations using JSON files.
Let's create a simple system to store and retrieve conversational memory:
class ConversationMemory:
    def __init__(self, filename='conversation_memory.json'):
        self.filename = filename
        try:
            with open(self.filename, 'r') as f:
                self.memory = json.load(f)
        except FileNotFoundError:
            self.memory = {}

    def save_conversation(self, conversation):
        self.memory[conversation['conversation_id']] = conversation
        with open(self.filename, 'w') as f:
            json.dump(self.memory, f)

    def get_conversation(self, conversation_id):
        return self.memory.get(conversation_id, None)

    def get_last_message(self, conversation_id):
        conversation = self.get_conversation(conversation_id)
        if conversation and conversation['messages']:
            return conversation['messages'][-1]['content']
        return None

# Initialize ConversationMemory
memory = ConversationMemory()

# Save a few conversations
for conv in conversations[:5]:
    memory.save_conversation(conv)

print("Saved 5 conversations to memory.")

# Retrieve a conversation
retrieved_conv = memory.get_conversation('2')
print("\nRetrieved conversation 2:")
print(json.dumps(retrieved_conv, indent=2))

# Get last message from a conversation
last_message = memory.get_last_message('3')
print("\nLast message from conversation 3:")
print(last_message)
This code demonstrates a simple system for storing and retrieving conversational memory.
Run it and examine the output.
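A natural extension, not covered in the lab code, is appending new messages to a stored conversation as a chat continues. The sketch below uses a trimmed copy of the class and adds a hypothetical add_message helper:

```python
import json

class ConversationMemory:
    # Trimmed copy of the lab's class, plus a hypothetical add_message helper.
    def __init__(self, filename='conversation_memory.json'):
        self.filename = filename
        try:
            with open(self.filename, 'r') as f:
                self.memory = json.load(f)
        except FileNotFoundError:
            self.memory = {}

    def save_conversation(self, conversation):
        self.memory[conversation['conversation_id']] = conversation
        with open(self.filename, 'w') as f:
            json.dump(self.memory, f)

    def add_message(self, conversation_id, sender, content):
        """Append a message to an existing conversation and persist it."""
        conv = self.memory.get(conversation_id)
        if conv is None:
            return False
        conv['messages'].append({"sender": sender, "content": content})
        self.save_conversation(conv)
        return True

memory = ConversationMemory('memory_demo.json')
memory.save_conversation({
    "conversation_id": "42",
    "participants": ["user", "ai"],
    "messages": []
})
memory.add_message("42", "user", "Remember me?")
print(memory.memory["42"]["messages"][-1]["content"])  # Remember me?
```

Because every call rewrites the whole file, this only scales so far; the database technologies in the lecture notes below handle incremental updates properly.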

Lecture Notes:

This lecture note surveys the database technologies typically used for storing conversational memory in AI systems, along with sample code and workflows.

It expands on the JSON file-based approach we used in the lab and introduces more scalable solutions.

Lecture Note: Database Technologies for AI Conversational Memory
In production AI systems, especially those dealing with large-scale conversational data, we typically use more robust and scalable database solutions than simple JSON files.
Here are some common database technologies used in AI applications:
1. NoSQL Databases (JSON)
2. Relational Databases
3. Vector Databases

1. NoSQL Databases: NoSQL databases are often preferred for storing conversational data due to their flexibility with unstructured data and their scalability.
Example: MongoDB
MongoDB is a popular document-oriented NoSQL database that stores data in BSON (Binary JSON) format, making it ideal for JSON-like conversational data.
Sample code using PyMongo (MongoDB's Python driver):
```python
from pymongo import MongoClient

# Connect to MongoDB
client = MongoClient('mongodb://localhost:27017/')
db = client['conversational_ai_db']
conversations = db['conversations']

# Store a conversation
conversation = {
    "conversation_id": "12345",
    "participants": ["user", "ai"],
    "messages": [
        {
            "sender": "user",
            "content": "Hello, how are you?",
            "timestamp": "2023-11-20T10:00:00Z"
        },
        {
            "sender": "ai",
            "content": "Hello! I'm functioning well, thank you. How can I assist you today?",
            "timestamp": "2023-11-20T10:00:05Z"
        }
    ]
}

result = conversations.insert_one(conversation)
print(f"Conversation inserted with id: {result.inserted_id}")

# Retrieve a conversation
retrieved_conv = conversations.find_one({"conversation_id": "12345"})
print(retrieved_conv)
```
2. Relational Databases:
While less common for storing raw conversational data, relational databases are often used in conjunction with NoSQL databases for structured metadata or analytics.
Example: PostgreSQL
Sample code using psycopg2 (PostgreSQL adapter for Python):
```python
import psycopg2
import json

# Connect to PostgreSQL
conn = psycopg2.connect("dbname=conversational_ai user=postgres password=password")
cur = conn.cursor()

# Create tables
cur.execute("""
    CREATE TABLE IF NOT EXISTS conversations (
        id SERIAL PRIMARY KEY,
        conversation_id VARCHAR(255) UNIQUE,
        participants JSONB,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
""")

cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id SERIAL PRIMARY KEY,
        conversation_id VARCHAR(255) REFERENCES conversations(conversation_id),
        sender VARCHAR(50),
        content TEXT,
        timestamp TIMESTAMP
    )
""")

# Insert a conversation
conversation = {
    "conversation_id": "12345",
    "participants": ["user", "ai"],
    "messages": [
        {
            "sender": "user",
            "content": "Hello, how are you?",
            "timestamp": "2023-11-20T10:00:00Z"
        },
        {
            "sender": "ai",
            "content": "Hello! I'm functioning well, thank you. How can I assist you today?",
            "timestamp": "2023-11-20T10:00:05Z"
        }
    ]
}

cur.execute(
    "INSERT INTO conversations (conversation_id, participants) VALUES (%s, %s)",
    (conversation['conversation_id'], json.dumps(conversation['participants']))
)

for message in conversation['messages']:
    cur.execute(
        "INSERT INTO messages (conversation_id, sender, content, timestamp) VALUES (%s, %s, %s, %s)",
        (conversation['conversation_id'], message['sender'], message['content'], message['timestamp'])
    )

conn.commit()

# Retrieve a conversation
cur.execute("""
    SELECT c.conversation_id, c.participants,
           json_agg(json_build_object('sender', m.sender, 'content', m.content, 'timestamp', m.timestamp)) AS messages
    FROM conversations c
    JOIN messages m ON c.conversation_id = m.conversation_id
    WHERE c.conversation_id = %s
    GROUP BY c.conversation_id, c.participants
""", ("12345",))

retrieved_conv = cur.fetchone()
print(json.dumps(retrieved_conv, indent=2))

cur.close()
conn.close()
```
3. Vector Databases: For advanced AI applications, especially those involving semantic search or similarity matching, vector databases are becoming increasingly popular.
Example: Pinecone
Sample code using Pinecone:
```python
import pinecone
from sentence_transformers import SentenceTransformer

# Initialize Pinecone
pinecone.init(api_key="your-api-key", environment="your-environment")
index_name = "conversational-ai"

# Create index if it doesn't exist
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=384, metric="cosine")

# Connect to the index
index = pinecone.Index(index_name)

# Load a sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Function to upsert a conversation
def upsert_conversation(conversation):
    for i, message in enumerate(conversation['messages']):
        vector = model.encode(message['content']).tolist()
        unique_id = f"{conversation['conversation_id']}_{i}"
        metadata = {
            "conversation_id": conversation['conversation_id'],
            "sender": message['sender'],
            "timestamp": message['timestamp']
        }
        index.upsert([(unique_id, vector, metadata)])

# Upsert a conversation
conversation = {
    "conversation_id": "12345",
    "messages": [
        {
            "sender": "user",
            "content": "Hello, how are you?",
            "timestamp": "2023-11-20T10:00:00Z"
        },
        {
            "sender": "ai",
            "content": "Hello! I'm functioning well, thank you. How can I assist you today?",
            "timestamp": "2023-11-20T10:00:05Z"
        }
    ]
}

upsert_conversation(conversation)

# Query similar messages
query_vector = model.encode("How are you doing?").tolist()
results = index.query(query_vector, top_k=5, include_metadata=True)

for result in results['matches']:
    print(f"Score: {result['score']}, Content: {result['metadata']}")
```
Workflow Considerations:
1. Data Ingestion: Set up a pipeline to continuously ingest conversational data into your chosen database.
2. Data Processing: Implement pre-processing steps (e.g., tokenization, embedding generation) before storage.
3. Indexing: For faster retrieval, create appropriate indexes on frequently queried fields.
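Step 2 above (pre-processing before storage) can be sketched with a minimal, dependency-free tokenizer; a production pipeline would typically use nltk or an embedding model instead:

```python
import re

def preprocess(message: str) -> list[str]:
    """Lowercase, strip punctuation, and split a message into tokens
    before it is stored or embedded."""
    cleaned = re.sub(r"[^\w\s]", "", message.lower())
    return cleaned.split()

tokens = preprocess("Hello, how are you?")
print(tokens)  # ['hello', 'how', 'are', 'you']
```

Normalizing messages this way before storage keeps downstream analytics (message lengths, word counts, embeddings) consistent across the dataset.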