
An R programming lab in Google Colab that demonstrates AI model building and AI-assisted data analysis.


This connects to the Course Outline topic “Tools of Data Science”.


Learning outcomes:

- Build and evaluate an AI text classification model in R using Google Colab.
- Perform AI-assisted data analysis, including sentiment analysis and visualization.

This lab focuses on text classification using machine learning techniques, which aligns well with the course objectives related to NLP and ML applications.

Preamble: Setting up R in Google Colab

Before we begin our R programming lab, we need to set up our Google Colab environment to work with R. Follow these steps:
Open Google Colab:
- Sign in with your Google account if prompted
Create a new notebook:
- Click "New Notebook" or File > New Notebook
Rename your notebook:
- Click "Untitled0.ipynb" at the top and rename it to "R_AI_Data_Analysis_Lab"
Change the runtime to R:
- Go to Runtime > Change runtime type
- In the pop-up, select "R" from the dropdown menu
- Click "Save"
Install the necessary R packages:
- In the first code cell, paste the following:
```r
install.packages(c("tidyverse", "tidytext", "caret", "e1071", "syuzhet", "wordcloud"))
```
Run this cell by clicking the play button or pressing Shift+Enter
This may take a few minutes to complete
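Because Colab sessions reset, reinstalling everything on every restart can be slow. As an optional time-saver, you can install only the packages that are missing; a small base-R sketch, assuming the same package list as above:

```r
# Install only the packages that are not already present (optional time-saver)
pkgs <- c("tidyverse", "tidytext", "caret", "e1071", "syuzhet", "wordcloud")
missing <- pkgs[!pkgs %in% rownames(installed.packages())]
if (length(missing) > 0) install.packages(missing)
```

On a fresh runtime this behaves exactly like the full `install.packages()` call; on a warm runtime it skips the packages that are already installed.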
Load the installed packages:
- In a new code cell, paste:

```r
library(tidyverse)
library(tidytext)
library(caret)
library(e1071)
library(syuzhet)
library(wordcloud)
```
Run this cell
Verify R is working:
- In a new cell, type and run:

```r
print("R is ready for our AI and Data Analysis lab!")
```
Now that we have our R environment set up in Google Colab, we're ready to begin our lab on AI model building and data analysis. Each subsequent part of the lab can be pasted into new code cells and run sequentially.
Remember:
You can add new code cells by clicking the "+ Code" button or using the keyboard shortcut Ctrl+M B
To run a cell, click the play button next to it or use Shift+Enter
You can add text cells for notes by clicking "+ Text" or using Ctrl+M T
Let's begin our exploration of AI and data analysis with R!
[Proceed with the main lab content]
This preamble provides a clear, step-by-step guide for students to set up their R environment in Google Colab, ensuring everyone starts on the same page and can follow along with the lab smoothly.
```r
# Title: R Programming Lab for AI Model Building and Data Analysis
# Objective: Demonstrate text classification and sentiment analysis using R in Google Colab
# Part 1: Setup and Data Preparation
# Install and load necessary packages
install.packages(c("tidyverse", "tidytext", "caret", "e1071", "syuzhet"))
library(tidyverse)
library(tidytext)
library(caret)
library(e1071)
library(syuzhet)
# Load dataset (IMDB Movie Reviews)
url <- "https://raw.githubusercontent.com/jbrownlee/Datasets/master/review_polarity.tar.gz"
download.file(url, destfile = "review_polarity.tar.gz")
untar("review_polarity.tar.gz")
# Read positive and negative reviews
pos_files <- list.files("txt_sentoken/pos", full.names = TRUE)
neg_files <- list.files("txt_sentoken/neg", full.names = TRUE)
# Read each review file as a single string (one element per review, not per line)
pos_reviews <- sapply(pos_files, function(f) paste(read_lines(f), collapse = " "))
neg_reviews <- sapply(neg_files, function(f) paste(read_lines(f), collapse = " "))
# Create a dataframe
reviews <- data.frame(
  text = c(pos_reviews, neg_reviews),
  sentiment = factor(c(rep("positive", length(pos_reviews)),
                       rep("negative", length(neg_reviews))))
)
# Part 2: Text Preprocessing
reviews_clean <- reviews %>%
  mutate(text = str_to_lower(text),
         text = str_replace_all(text, "[^[:alnum:]\\s]", ""),
         text = str_replace_all(text, "\\s+", " "),
         text = str_trim(text))
# Part 3: Feature Extraction
reviews_tokens <- reviews_clean %>%
  unnest_tokens(word, text)

word_counts <- reviews_tokens %>%
  count(sentiment, word, sort = TRUE)

total_words <- word_counts %>%
  group_by(sentiment) %>%
  summarize(total = sum(n))

word_counts <- left_join(word_counts, total_words, by = "sentiment")

word_counts <- word_counts %>%
  bind_tf_idf(word, sentiment, n)
# Select top features
top_features <- word_counts %>%
  arrange(desc(tf_idf)) %>%
  group_by(sentiment) %>%
  top_n(100, tf_idf) %>%
  ungroup() %>%
  select(word) %>%
  unique()
# Create document-term matrix: one row per review (not per sentiment class),
# so the classifier sees individual documents
reviews_clean <- reviews_clean %>% mutate(id = row_number())

reviews_dtm <- reviews_clean %>%
  unnest_tokens(word, text) %>%
  count(id, word) %>%
  filter(word %in% top_features$word) %>%
  cast_dtm(id, word, n)

# Some reviews may contain none of the selected features and are dropped,
# so recover the matching labels from the DTM row names
doc_ids <- as.integer(rownames(reviews_dtm))
labels <- reviews_clean$sentiment[doc_ids]

# Part 4: Model Training and Evaluation
set.seed(123)
train_index <- createDataPartition(labels, p = 0.8, list = FALSE)
train_data <- as.matrix(reviews_dtm[train_index, ])
test_data <- as.matrix(reviews_dtm[-train_index, ])

train_labels <- labels[train_index]
test_labels <- labels[-train_index]
# Train SVM model
svm_model <- svm(x = train_data, y = train_labels, kernel = "linear")
# Make predictions
predictions <- predict(svm_model, newdata = test_data)
# Evaluate model
confusion_matrix <- confusionMatrix(predictions, test_labels)
print(confusion_matrix)
# Part 5: Sentiment Analysis
# Function to get aggregate NRC sentiment scores for a character vector
get_sentiment_scores <- function(text) {
  scores <- get_nrc_sentiment(text)
  colSums(scores)
}
# Apply sentiment analysis to the review text, aggregated by class
sentiment_scores <- reviews_clean %>%
  group_by(sentiment) %>%
  summarize(scores = list(get_sentiment_scores(text))) %>%
  unnest_wider(scores)

# Visualize sentiment scores
sentiment_scores %>%
  pivot_longer(-sentiment, names_to = "emotion", values_to = "score") %>%
  ggplot(aes(x = emotion, y = score, fill = sentiment)) +
  geom_col(position = "dodge") +
  theme_minimal() +
  labs(title = "Sentiment Analysis of Movie Reviews",
       x = "Emotion", y = "Score") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Part 6: Word Clouds for Visualization
library(wordcloud)
# Function to create a word cloud from a character vector of reviews
create_wordcloud <- function(text, title) {
  words <- tibble(text = text) %>%
    unnest_tokens(word, text) %>%
    count(word, sort = TRUE) %>%
    filter(!word %in% stop_words$word)
  wordcloud(words = words$word, freq = words$n, max.words = 100,
            random.order = FALSE, rot.per = 0.35,
            colors = brewer.pal(8, "Dark2"))
  title(title)
}
# Create word clouds for positive and negative reviews
par(mfrow = c(1, 2))
create_wordcloud(reviews_clean$text[reviews_clean$sentiment == "positive"], "Positive Reviews")
create_wordcloud(reviews_clean$text[reviews_clean$sentiment == "negative"], "Negative Reviews")
# Part 7: Conclusion and Next Steps
cat("
Lab Conclusion:
1. We've built a text classification model using SVM to predict sentiment.
2. We've performed sentiment analysis to understand emotional content.
3. We've visualized the results using ggplot2 and word clouds.

Next Steps:
1. Experiment with other ML algorithms (e.g., Random Forest, Naive Bayes).
2. Try more advanced NLP techniques like word embeddings.
3. Scale up to larger datasets using distributed computing frameworks.
")
```
This lab covers several key aspects of AI model building and data analysis using R:
1. Data preparation and preprocessing
2. Feature extraction using TF-IDF
3. Machine learning model (SVM) for text classification
4. Model evaluation using confusion matrix
5. Sentiment analysis using the syuzhet package
6. Data visualization with ggplot2 and word clouds
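To make the TF-IDF step concrete, here is a tiny worked example in base R on a hypothetical two-document corpus (the documents and terms below are made up for illustration only):

```r
# Two toy "documents" as word vectors (hypothetical data)
docs <- list(d1 = c("good", "movie", "good"), d2 = c("bad", "movie"))

# Term frequency: occurrences of a term / total words in the document
tf <- function(term, doc) sum(doc == term) / length(doc)

# Inverse document frequency: log(N documents / documents containing the term)
idf <- function(term) log(length(docs) / sum(sapply(docs, function(d) term %in% d)))

# "good" appears only in d1, so it gets a positive tf-idf there;
# "movie" appears in both documents, so its idf (and hence tf-idf) is 0
tfidf_good <- tf("good", docs$d1) * idf("good")    # (2/3) * log(2)
tfidf_movie <- tf("movie", docs$d1) * idf("movie") # (1/3) * log(1) = 0
```

This is the same weighting that `bind_tf_idf()` applies in the lab, just spelled out by hand: class-distinctive words get high scores, while words shared by every document score zero.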
The lab uses the IMDB movie review dataset, which is a common benchmark for sentiment analysis tasks. It demonstrates how to build a sentiment classifier and perform more in-depth sentiment analysis, aligning with the course objectives related to NLP and machine learning.
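Since the classifier is evaluated with a confusion matrix, it may help to see that computation on a handful of made-up predictions (the labels below are illustrative, not drawn from the real dataset):

```r
# Made-up predicted vs. actual labels for four reviews
predicted <- factor(c("positive", "negative", "positive", "positive"),
                    levels = c("negative", "positive"))
actual <- factor(c("positive", "negative", "negative", "positive"),
                 levels = c("negative", "positive"))

# Cross-tabulate: rows are predictions, columns are the truth
cm <- table(Predicted = predicted, Actual = actual)

# Accuracy = correct predictions (the diagonal) / total predictions
accuracy <- sum(diag(cm)) / sum(cm)  # 3 of 4 correct
```

`caret::confusionMatrix()` in the lab produces this same table plus additional statistics (sensitivity, specificity, kappa) automatically.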
To run this lab in Google Colab:
1. Create a new notebook and switch the runtime to R, as described in the preamble (no `%%R` magic is needed when the notebook itself uses the R runtime)
2. Copy each part of the code into its own code cell
3. Run the cells in order
Note that the first run might take some time as it needs to install the necessary packages. Also, you might need to adjust the code if you encounter any memory constraints in Colab.
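If you do hit Colab's memory limits, one simple workaround is to run the lab on a random subset of the reviews first. A minimal base-R helper for this; the function name and sample size are illustrative choices, not part of the lab code:

```r
# Return a random sample of at most n rows from a data frame
sample_rows <- function(df, n, seed = 42) {
  set.seed(seed)
  df[sample(seq_len(nrow(df)), size = min(n, nrow(df))), , drop = FALSE]
}

# Example with a toy data frame standing in for the full `reviews` table
toy <- data.frame(text = letters[1:10], stringsAsFactors = FALSE)
small <- sample_rows(toy, 4)  # 4 randomly chosen rows
```

In the lab you would apply it right after building `reviews`, e.g. `reviews <- sample_rows(reviews, 500)`, and everything downstream works unchanged on the smaller table.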
This lab provides a hands-on experience with R programming for AI and data analysis tasks, covering several topics mentioned in the course outline, particularly sections 7 (Big Data) and 8 (Data Science Tools for AI and ML).