This document details what we want to achieve during Phase 2 of our Hunch Recommender Systems Engine (RecSys Engine), which determines the User Feed of the Hunch App.
Phase 1: Recap
We arrived at a combination of the three AWS Personalize model recipes listed below through iterative model development, guided both by AWS Personalize metrics and by our own statistical analysis of the poll recommendations actually served to our users:
User-Personalization Model
The User-Personalization recipe (aws-user-personalization) is optimised for all personalized recommendation scenarios. It predicts the polls a user will interact with based on their prior interaction history and on our Items (polls) and Users datasets. When recommending polls, it uses automatic item exploration.
Trending-Now Model
The Trending-Now recipe (aws-trending-now) generates recommendations for polls that are rapidly becoming more popular with our users. It lets us recommend polls whose growing popularity makes them more relevant to our users, since users are likely to value what other users are currently interacting with. For our use case, this means recommending viral polls in the user feed.
Popularity-Count Model
The Popularity-Count recipe (aws-popularity-count) recommends the most popular polls based on our user interactions data, i.e., the polls that the largest number of unique users have interacted with. The recipe returns the same popular polls for all users.
We included this recipe alongside the other two because, on its own, User-Personalization generally recommends polls with FEWER hunches and comments for the user feed.
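As a minimal sketch of how the three recipes map onto the AWS Personalize API, the Python below builds one `create_solution` request per recipe. It assumes a boto3 Personalize client is passed in; the recipe ARNs follow the `arn:aws:personalize:::recipe/<recipe-name>` pattern, while the solution names, the `hunch` prefix, and the helper functions are hypothetical.

```python
# Recipe ARNs for the three recipes described above.
RECIPE_ARNS = {
    "user-personalization": "arn:aws:personalize:::recipe/aws-user-personalization",
    "trending-now": "arn:aws:personalize:::recipe/aws-trending-now",
    "popularity-count": "arn:aws:personalize:::recipe/aws-popularity-count",
}


def solution_requests(dataset_group_arn, prefix="hunch"):
    """Build one create_solution request per recipe (solution names are hypothetical)."""
    return [
        {"name": f"{prefix}-{key}", "datasetGroupArn": dataset_group_arn, "recipeArn": arn}
        for key, arn in RECIPE_ARNS.items()
    ]


def create_solutions(personalize, dataset_group_arn):
    """Create the three solutions; `personalize` is assumed to be boto3.client("personalize")."""
    solution_arns = {}
    for request in solution_requests(dataset_group_arn):
        response = personalize.create_solution(**request)
        solution_arns[request["name"]] = response["solutionArn"]
    return solution_arns
```

Each solution still needs a trained solution version (`create_solution_version`) and a campaign before it can serve recommendations.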
Mixing Recommendations From 3 Model Recipes: Strategy
We used the following strategy to build the user feed from the polls recommended by the three recipes:
1. Pick the top-3 recommended polls from the User-Personalization recipe.
2. Pick the top-1 recommended poll from the Trending-Now recipe.
3. Pick the top-1 recommended poll from the Popularity-Count recipe.
4. Concatenate the results of steps 1-3 to get a top-5 list of recommended polls.
5. Remove duplicates from this top-5 list.
6. Repeat from step 1 to get the next top-5 polls to recommend.
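The steps above can be sketched in Python as follows. This is a minimal illustration, assuming each recipe's recommendations are already available as ranked lists of poll IDs; `mix_feed` is a hypothetical helper, and it also drops polls already shown in an earlier round, which goes slightly beyond the per-batch de-duplication described above.

```python
def mix_feed(personalized, trending, popular, rounds, per_round=(3, 1, 1)):
    """Interleave ranked poll lists in 3/1/1 batches, dropping duplicates.

    personalized, trending, popular: ranked poll IDs from User-Personalization,
    Trending-Now and Popularity-Count respectively.
    """
    n_pers, n_trend, n_pop = per_round
    feed, seen = [], set()
    for r in range(rounds):
        # Steps 1-4: take the next slice from each recipe and concatenate.
        batch = (
            personalized[r * n_pers:(r + 1) * n_pers]
            + trending[r * n_trend:(r + 1) * n_trend]
            + popular[r * n_pop:(r + 1) * n_pop]
        )
        # Step 5: skip polls already placed in this or an earlier batch.
        for poll in batch:
            if poll not in seen:
                seen.add(poll)
                feed.append(poll)
    return feed
```

Because duplicates are dropped rather than back-filled, a round can yield fewer than five polls; back-filling from the next User-Personalization poll would be one alternative.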
Phase 2: Stages
We have further divided Phase 2 into stages. Each stage lists the RecSys Engine services, features, and solutions we plan to develop, engineer, and deploy, stage by stage, to serve Hunch's use cases.
The implementation of Phase 2 will happen from to
Stage 1: Implementing AWS Impressions Feed
For the User-Personalization recipe, Amazon Personalize can model impressions data that we upload to an Interactions dataset. Impressions are lists of polls that were visible to a user when they interacted with (for example, hunched or commented on) a particular poll.
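To make the expected data shape concrete, here is a minimal sketch assuming a bulk CSV import: impressions go in an `IMPRESSION` string field of the Interactions schema, with the visible item IDs separated by a vertical bar (`|`). The `hunch` event type and the `interactions_csv` helper are hypothetical.

```python
import csv
import io
import json

# Interactions schema with an IMPRESSION field (pipe-separated poll IDs).
INTERACTIONS_SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "IMPRESSION", "type": "string"},
    ],
    "version": "1.0",
})


def interactions_csv(events):
    """events: iterable of (user_id, poll_id, event_type, unix_ts, visible_poll_ids)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "EVENT_TYPE", "TIMESTAMP", "IMPRESSION"])
    for user_id, poll_id, event_type, ts, visible in events:
        # The polls on screen at interaction time, joined with "|".
        writer.writerow([user_id, poll_id, event_type, ts, "|".join(visible)])
    return buf.getvalue()
```

The schema string would be registered with the Personalize client's `create_schema(name=..., schema=INTERACTIONS_SCHEMA_JSON)` before importing the CSV.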
Amazon Personalize uses impressions data to decide which polls to include in exploration. Exploration is when recommendations include new polls that have few interactions or lower relevance. The more frequently a poll appears in impressions data, the less likely Amazon Personalize is to include it in exploration.
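The amount of exploration itself is configured on the campaign. The sketch below assumes a trained User-Personalization solution version already exists; `campaign_request` is a hypothetical helper that builds arguments for the Personalize client's `create_campaign` call. In `itemExplorationConfig`, `explorationWeight` (0 to 1) sets how much exploration is mixed in and `explorationItemAgeCutOff` (in days) bounds which items count as new. The values 0.3 and 30 are what we understand the defaults to be and should be checked against the documentation.

```python
def campaign_request(name, solution_version_arn, exploration_weight=0.3, item_age_cutoff_days=30):
    """Build create_campaign kwargs with item exploration settings (values are assumptions)."""
    if not 0.0 <= exploration_weight <= 1.0:
        raise ValueError("exploration_weight must be between 0 and 1")
    return {
        "name": name,
        "solutionVersionArn": solution_version_arn,
        "minProvisionedTPS": 1,
        "campaignConfig": {
            # The API expects these hyperparameter values as strings.
            "itemExplorationConfig": {
                "explorationWeight": str(exploration_weight),
                "explorationItemAgeCutOff": str(item_age_cutoff_days),
            }
        },
    }
```

Usage would be `personalize.create_campaign(**campaign_request("hunch-feed", solution_version_arn))`, with `personalize` a boto3 Personalize client.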
Stage 2: Promoting Items in Recommendations
We can specify a promotion when requesting recommendations for a user. A promotion defines additional business rules that apply to a configurable subset of the recommended items. For example, during a promotional event we might want to promote a subset of polls while still recommending relevant ones. We could use a promotion to specify that a certain percentage of recommended polls must come from the category PROMOTIONAL_EVENT. The remaining recommendations would continue to be relevant polls based on our recipe and any request filters.
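A minimal sketch of the example above, assuming the Items dataset has a categorical `CATEGORY` field (hypothetical name) and that the campaign and filter ARNs already exist. A promotion references a filter; here the filter expression uses a `$CATEGORY` placeholder whose value is supplied, in escaped quotes, at request time. The helper names and the 20% share are illustrative only.

```python
# Filter selecting polls in a given category; $CATEGORY is filled per request.
PROMOTION_FILTER_EXPRESSION = "INCLUDE ItemID WHERE Items.CATEGORY IN ($CATEGORY)"


def promotion_filter_request(dataset_group_arn):
    """Kwargs for the Personalize client's create_filter (filter name is hypothetical)."""
    return {
        "name": "promotional-event-polls",
        "datasetGroupArn": dataset_group_arn,
        "filterExpression": PROMOTION_FILTER_EXPRESSION,
    }


def recommendation_request(campaign_arn, user_id, promo_filter_arn, num_results=25, percent_promoted=20):
    """Kwargs for the Personalize Runtime client's get_recommendations with one promotion."""
    return {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
        "promotions": [
            {
                "name": "promotional-event",
                "percentPromotedItems": percent_promoted,
                "filterArn": promo_filter_arn,
                # Filter values are passed as quoted strings.
                "filterValues": {"CATEGORY": '"PROMOTIONAL_EVENT"'},
            }
        ],
    }
```

The remaining `100 - percent_promoted` percent of slots are filled by the recipe's regular, relevance-ranked recommendations.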
Stage 3: Impact of Follower-Following on Hunch Feed
Clustering users based on follower-following data can be approached as a network analysis problem. In this case, each user can be considered as a node in a network, and the follower-following relationships between users can be represented as edges connecting the nodes.
Here's a general approach to clustering users based on follower-following data:
1. Data Preparation: Obtain the follower-following data for the users we want to cluster, i.e., who follows whom.
2. Network Construction: Build a network graph from the follower-following data. Each user is a node, and each follow relationship is a directed edge from the follower to the followed user. Libraries like NetworkX (in Python) or Gephi (a standalone network analysis tool) can be used to create and analyze the network.
3. Feature Extraction: Extract features from the network that characterize each user. Common features include degree centrality (number of connections), betweenness centrality (importance as a bridge between other users), and clustering coefficient (how densely a user's connections are connected to one another).
4. Choose Clustering Algorithm: Select a clustering algorithm suited to network data. Popular choices include k-means clustering (applied to the extracted features), spectral clustering, and community detection algorithms such as Louvain or Girvan-Newman.
5. Apply Clustering Algorithm: Apply the chosen algorithm to the extracted features or directly to the network. It groups users into clusters based on their similarity or connectivity.
6. Evaluate and Interpret Clusters: Evaluate the quality of the clusters using appropriate metrics, such as the silhouette score for feature-based clustering or modularity for community detection. Interpret the clusters by examining the characteristics of the users in each cluster, such as their interests, engagement patterns, or other relevant attributes.
7. Refinement and Iteration: Depending on the results, refine the feature extraction or clustering process, try different algorithms, or adjust parameters to improve the clusters.
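Steps 2 and 3 can be sketched without any dependencies as follows. This is an illustration, not the planned implementation: it stores the directed graph as adjacency sets and computes in/out degree, a reciprocity feature (our hypothetical addition), and the local clustering coefficient on the undirected view. Betweenness is omitted because it needs all-pairs shortest paths; with NetworkX one would instead build a `networkx.DiGraph` and call `networkx.in_degree_centrality`, `networkx.betweenness_centrality`, and `networkx.clustering` (note NetworkX normalizes degree centrality, while this sketch uses raw counts).

```python
from collections import defaultdict


def build_follow_graph(follow_edges):
    """Directed graph as adjacency sets; each edge is (follower, followee)."""
    following = defaultdict(set)  # user -> accounts they follow
    followers = defaultdict(set)  # user -> accounts following them
    for follower, followee in follow_edges:
        if follower != followee:  # ignore self-follows
            following[follower].add(followee)
            followers[followee].add(follower)
    users = set(following) | set(followers)
    return users, following, followers


def user_features(follow_edges):
    """Return user -> [in_degree, out_degree, reciprocity, clustering_coefficient]."""
    users, following, followers = build_follow_graph(follow_edges)
    features = {}
    for user in sorted(users):
        out_deg = len(following[user])
        in_deg = len(followers[user])
        # Share of followed accounts that follow back (hypothetical extra feature).
        mutual = len(following[user] & followers[user])
        reciprocity = mutual / out_deg if out_deg else 0.0
        # Local clustering coefficient on the undirected view:
        # fraction of neighbour pairs that are themselves connected.
        neighbours = following[user] | followers[user]
        k = len(neighbours)
        if k < 2:
            clustering = 0.0
        else:
            links = sum(
                1
                for a in neighbours
                for b in neighbours
                if a < b and (b in following[a] or a in following[b])
            )
            clustering = 2 * links / (k * (k - 1))
        features[user] = [float(in_deg), float(out_deg), reciprocity, clustering]
    return features
```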
We want to incorporate information about which users follow which other users as user metadata in our AWS Personalize model recipe. Since AWS Personalize does not provide a built-in way to incorporate this information, we will use K-Means clustering to represent each user by the number of the cluster they are assigned to. We will then pass this cluster assignment to AWS Personalize as a CATEGORICAL field in the Users dataset.
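A minimal sketch of this step, assuming each user already has a numeric feature vector (for example, network features as in step 3 above, ideally scaled to comparable ranges first, which is not shown). It runs a plain Lloyd's-algorithm K-Means in pure Python (scikit-learn's `KMeans` would be the usual production choice) and writes a Users CSV whose `FOLLOW_CLUSTER` field (hypothetical name) is declared as a categorical string in the schema.

```python
import csv
import io
import math
import random

USERS_SCHEMA = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        # Categorical metadata must be a string field.
        {"name": "FOLLOW_CLUSTER", "type": "string", "categorical": True},
    ],
    "version": "1.0",
}


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def users_csv(features, k, seed=0):
    """features: user_id -> feature vector. Returns a Users dataset CSV."""
    user_ids = sorted(features)
    labels = kmeans([features[u] for u in user_ids], k, seed=seed)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["USER_ID", "FOLLOW_CLUSTER"])
    for user_id, label in zip(user_ids, labels):
        writer.writerow([user_id, f"cluster_{label}"])
    return buf.getvalue()
```

Prefixing the label (`cluster_0`) keeps it clearly a category rather than a number Personalize might treat as ordered.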
Lastly, we will measure the impact of incorporating follower-following information on the User Feed through statistical analysis, as a proof of concept that our implementation does increase the probability that a user is recommended polls from the accounts they follow.
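One possible way to run this check, sketched under assumptions: we have recommended feeds from a baseline model and from the model with follow clusters, a poll-to-author mapping, and each user's followed accounts. The sketch counts how many recommended slots are polls by followed accounts and compares the two shares with a two-proportion z-test. The helper names are hypothetical, and the test treats every slot as independent, which is optimistic because slots in the same feed are correlated; a per-user comparison or a bootstrap over users would be more conservative.

```python
import math


def followee_share(feeds, poll_author, following):
    """Count recommended polls authored by accounts the user follows.

    feeds: user -> list of recommended poll IDs
    poll_author: poll ID -> author user ID
    following: user -> set of followed user IDs
    """
    hits = total = 0
    for user, polls in feeds.items():
        followees = following.get(user, set())
        for poll in polls:
            total += 1
            if poll_author.get(poll) in followees:
                hits += 1
    return hits, total


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for H0: share A == share B. Returns (z, p_value)."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

A positive z with a small p-value for (baseline, follow-cluster model) would support the claim that the new metadata raises the share of polls from followed accounts.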
Stage 4: Incorporating Unstructured Textual Data as Polls Metadata
With the following modeling recipes, Amazon Personalize can extract meaningful information from unstructured text metadata, such as the Poll Question, Poll Options, Poll Description, Option Descriptions (for image options), and Comments on the poll.
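A minimal sketch of preparing this text for the Items dataset, assuming we concatenate all of a poll's text into a single textual field (`POLL_TEXT`, a hypothetical name) marked with `"textual": true` in the schema. `MAX_TEXT_CHARS` assumes a 20,000-character limit per textual value; confirm it against current Personalize quotas. Because comments keep arriving, this field would need periodic refreshes of the Items data.

```python
import csv
import io

MAX_TEXT_CHARS = 20_000  # assumed per-value limit for textual fields

ITEMS_SCHEMA = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "POLL_TEXT", "type": ["null", "string"], "textual": True},
    ],
    "version": "1.0",
}


def poll_text(question, options, description="", option_descriptions=(), comments=()):
    """Join a poll's unstructured text into one string, skipping empty parts."""
    parts = [question, *options, description, *option_descriptions, *comments]
    text = " ".join(p.strip() for p in parts if p and p.strip())
    return text[:MAX_TEXT_CHARS]


def items_csv(polls):
    """polls: iterable of (poll_id, keyword arguments for poll_text)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["ITEM_ID", "POLL_TEXT"])
    for poll_id, fields in polls:
        writer.writerow([poll_id, poll_text(**fields)])
    return buf.getvalue()
```

Keeping question, options, and descriptions in separate textual fields would be the alternative design if the service allows several per schema.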
Stage 5: Hyperparameter Tuning: Trending-Now & Popularity-Count Model Recipes
For the Trending-Now recipe, we can specify the trend discovery frequency, i.e., how often AWS Personalize evaluates our interactions data and identifies trending polls. For example, if we specify 30 minutes as the trend discovery frequency, every 30 minutes AWS Personalize identifies the polls with the greatest rate of increase in interactions over 30-minute intervals.
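To reason about what a given frequency surfaces, here is a rough local approximation, not Personalize's actual algorithm: it compares each poll's interaction count in the latest interval with the preceding interval and ranks polls by the increase. `trending_polls` and its parameters are hypothetical; it could be run over our own interaction logs to preview candidates for each frequency.

```python
from collections import Counter


def trending_polls(events, interval_s=1800, now=None, top_n=3):
    """Rank polls by growth in interactions between the last two intervals.

    events: non-empty iterable of (poll_id, unix_ts); interval_s=1800 is 30 minutes.
    """
    events = list(events)
    if now is None:
        now = max(ts for _, ts in events)
    cur_start, prev_start = now - interval_s, now - 2 * interval_s
    current = Counter(p for p, ts in events if cur_start < ts <= now)
    previous = Counter(p for p, ts in events if prev_start < ts <= cur_start)
    # Absolute increase per interval; only polls active in the latest interval qualify.
    growth = {p: current[p] - previous[p] for p in current}
    ranked = sorted(growth, key=lambda p: (-growth[p], p))
    return [p for p in ranked if growth[p] > 0][:top_n]
```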
Available frequencies include 30 minutes, 1 hour, 3 hours, and 1 day. We should choose a frequency that aligns with the distribution of our interactions data, because missing data over the chosen interval can reduce recommendation accuracy.
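A simple way to compare the candidate frequencies against our own data, sketched as a hypothetical helper: bucket interaction timestamps by each interval length and report the fraction of intervals with no interactions at all. A frequency whose intervals are often empty is likely too fine for our traffic.

```python
# Candidate trend discovery frequencies, in seconds.
FREQUENCIES_S = {"30 minutes": 1800, "1 hour": 3600, "3 hours": 10800, "1 day": 86400}


def empty_interval_fraction(timestamps, interval_s):
    """Fraction of intervals between the first and last event with zero events."""
    ts = sorted(timestamps)
    if not ts:
        return 1.0
    start = ts[0]
    n_intervals = (ts[-1] - start) // interval_s + 1
    filled = {(t - start) // interval_s for t in ts}
    return 1 - len(filled) / n_intervals


def coverage_report(timestamps):
    """Empty-interval fraction for each candidate frequency."""
    return {name: empty_interval_fraction(timestamps, s) for name, s in FREQUENCIES_S.items()}
```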
There are no hyperparameters to tune for the Popularity-Count recipe.