Personal Project

Social Media Analytics

Helps you understand and analyze social media

What is this?

This project used Optical Character Recognition (OCR) and topic modeling to build a social media analytics (SMA) tool for text-based Instagram content.
In testing and usability studies, the tool provided relevant topic references and supported the ideation process for Instagram content, with good results.

Results

An SMA tool using OCR and topic modeling was developed in Python. It first collects content from the target account and then extracts the text from that content. The tool consists of three classes, each covering a process that can be run separately: Instascrapper, Clean, and Topic Modeling.
Unlike topic modeling in other studies, topic modeling on Instagram content requires manual cleaning, because the content often contains words that degrade topic modeling performance, such as brand names or words that OCR cannot extract correctly.
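The manual cleaning step can be illustrated with a minimal sketch. It is not the project's actual Clean class: the function name `clean_tokens`, the blocklist entries, and the minimum token length are all hypothetical choices made for illustration, assuming the disruptive words are collected by hand after inspecting the OCR output.

```python
import re

# Hypothetical, hand-curated blocklist: brand names and OCR debris that would be
# identified by inspecting the extracted text of a target account.
BLOCKLIST = {"acme", "brandx"}


def clean_tokens(text, blocklist=BLOCKLIST, min_length=3):
    """Lowercase the text, keep ASCII alphabetic tokens, drop blocklisted or very short ones."""
    # [a-z]+ only matches unaccented Latin letters; other alphabets need a wider pattern.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_length and t not in blocklist]


sample = "ACME tips: 5 ways to grow your audience xq"
cleaned = clean_tokens(sample)
print(cleaned)  # ['tips', 'ways', 'grow', 'your', 'audience']
```

The length filter is a crude stand-in for catching meaningless character combinations; a real cleaning pass would likely still need review by a person.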
OCR performance testing gave a good result, with a score of 83%: OCR extracted 210 of 255 words. It does not yet reliably extract brand names, and it sometimes produces character combinations that have no meaning.
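As a rough check on the OCR score, the word-level ratio can be computed directly from the two counts above. This sketch assumes the score is simply extracted words divided by total words; the function name `word_accuracy` is hypothetical. Under that assumption the ratio comes out slightly below the reported 83%, so the reported figure may reflect rounding or a somewhat different scoring rule.

```python
def word_accuracy(extracted_words, total_words):
    """Share of reference words that OCR extracted, as a percentage."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * extracted_words / total_words


# Counts taken from the OCR test reported above.
score = word_accuracy(210, 255)
print(f"{score:.1f}%")  # 82.4%
```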
Three participants were tested on how well they understood the topic modeling results and visualizations and on how useful they found the SMA. The result was good, with a score of 93%, and the participants reached the right conclusions about the topics of the target set. This indicates that the SMA succeeded in providing relevant topic references and supporting the ideation process for text-based Instagram content.
The acceptance test with three participants gave a satisfactory score of 85%. The lowest score was for the participants' confidence in using the SMA: they felt they would need help from others if problems arose, because none of them had a programming background.