
PRE-READ: Collaboration, Not Competition: ChatGPT & Caplena


This is a pre-read and for your eyes only, please do not distribute

Picture this: In the vibrant atmosphere at our recent conference, a question danced on everyone's lips:
How does Caplena compare to ChatGPT? Does Caplena use ChatGPT?
Our answer to this is a mix of "yes" and "no." At Caplena, we primarily use our custom AI model for core tasks and ChatGPT for our summarization feature. Drawing a parallel between Caplena and ChatGPT in text analysis might seem like a riddle wrapped in a mystery, but it's more like contrasting a hammer with an entire workshop. Each comes with a unique prowess and purpose, aptly suiting distinct scenarios. But are they identical twins in the AI family? A resounding 'no' rings clear. Let's delve into the reasons why👇.
ChatGPT, including its GPT-4 version, is a powerful tool that provides human-like responses to a broad array of questions. Conversely, Caplena, a feedback analysis platform, uses an AI model to categorize large amounts of text data into distinct topics. Naturally, the question arises: which tool more effectively analyzes customer feedback by accurately categorizing sentences into relevant topics?
To provide an honest answer, we embarked on an internal study spearheaded by our machine learning engineers and natural language processing (NLP) researchers. Initially, our focus for this study was solely on categorization, intentionally disregarding the obvious differentiating factors of Caplena, such as its user interface (UI) and the various input-output options it provides. Cutting through the marketing speak, what follows is a breakdown of the topic.

Task Definition: Caplena vs. ChatGPT in Topic Assignment

To effectively compare Caplena and ChatGPT, we first need to define the task we're evaluating. Our focus here is the process of topic assignment. Given a predefined set of topics, our goal is to determine their frequency of appearance in user feedback. To accomplish this, the AI needs to categorize each text into one or more relevant topics.
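To make the task concrete, here is a minimal, hypothetical sketch in Python of what topic assignment looks like when each comment can receive one or more topics from a predefined list; the comments and topic names are invented for illustration and are not data from the study.

# Minimal sketch of the topic-assignment task: every comment can
# receive one or more topics from a predefined list. All names and
# texts below are invented for illustration only.
from collections import Counter

TOPICS = ["Delivery", "Price", "Customer Service", "Product Quality"]

feedback = [
    "The parcel arrived two days late and the box was damaged.",
    "Great value for money, I'll order again.",
    "Support never answered my email.",
]

# Human-annotated "ground truth": one list of topics per comment.
gold_labels = [
    ["Delivery", "Product Quality"],
    ["Price"],
    ["Customer Service"],
]

assert all(t in TOPICS for labels in gold_labels for t in labels)

# The quantity we ultimately care about: how often each topic appears.
frequency = Counter(t for labels in gold_labels for t in labels)
print(frequency)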

Fundamental Differences

Caplena the Topic Maestro

The fundamental difference between Caplena and ChatGPT in topic assignment is that Caplena takes a direct approach: topic assignment is its primary task. It uses dedicated algorithms and techniques to categorize sentences into relevant topics from a predefined set.

ChatGPT the Chitchat Enthusiast

On the other hand, ChatGPT operates in a chat-like format, designed for engaging and interactive conversations. While ChatGPT can understand and respond to various topics, its primary objective is not that of topic assignment. Instead, ChatGPT aims to generate human-like responses while indirectly addressing topics within the conversation.
To provide further clarity, we can utilize the analogy of a gardener versus a florist.

The Plot and Petals Parable

Caplena is like a skilled gardener who possesses an intimate understanding of their garden's ecosystem. They attentively cater to the specific needs of each flower, knowing precisely the quantity and variety of blooms present. Furthermore, they recognize that the vitality of the entire ecosystem hinges upon a small pond nestled within the garden.
In contrast, ChatGPT resembles a proficient florist who handpicks the most beautiful flowers to give you an overall impression of the garden's ambiance. However, the florist would not include every flower and lacks knowledge of the garden's underlying intricacies. Although they can inquire with the gardener to obtain specific details, they themselves do not possess that information.

Caplena UI for Topic Assignment

[Screenshot: the Caplena application]

ChatGPT UI for Creating Conversational Responses

[Screenshot: the ChatGPT chat interface compared with Caplena]
As evident from the above, when comparing the two user interfaces, ChatGPT operates by receiving and generating conversational responses through a chat interface. Consequently, categorizing topics within sentences requires constructing a prompt in the form of a conversation. It is therefore essential to craft a comprehensive prompt rather than directly inputting the sentence for topic assignment (as one would do in Caplena). Let's take a look at an example of how such a prompt might appear:

Providing a Prompt to ChatGPT

[Screenshot: a ChatGPT prompt with topics assigned to each review]
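Since the screenshot may not render everywhere, here is a hypothetical sketch in Python of how such a prompt could be assembled; the topic list, reviews, and wording are invented and are not the exact prompt used in our study.

# Hypothetical sketch of a topic-assignment prompt for ChatGPT.
# The topics, reviews, and wording are invented for illustration;
# the exact prompt used in the study differed.
topics = ["Delivery", "Price", "Customer Service", "Product Quality"]
reviews = [
    "1. The parcel arrived two days late.",
    "2. Great value for money.",
]

prompt = (
    "You are a text analyst. Assign each review below to one or more "
    f"of these topics: {', '.join(topics)}. "
    "Answer with the review number followed by the matching topics.\n\n"
    + "\n".join(reviews)
)
print(prompt)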
Now that we've established how both tools operate, let's delve into a more specific comparison.

Topic Assignment Accuracy: Caplena vs. ChatGPT

In our experiment, ChatGPT and Caplena went head-to-head on a test dataset of 19 English surveys. We used the F1 score as our primary evaluation metric, gauging each model's accuracy from 0% to 100% based on its agreement with a human-annotated dataset. The F1 score combines precision and recall, which helps account for uneven class distributions.
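As a brief refresher, the F1 score is the harmonic mean of precision and recall, and averaging it per topic, weighted by how often each topic occurs, compensates for unevenly sized topics. Here is a minimal sketch using scikit-learn, with invented labels rather than the study data:

# Weighted F1 over multi-label topic assignments, sketched with
# scikit-learn. The labels below are invented for illustration.
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import f1_score

gold = [["Delivery"], ["Price", "Delivery"], ["Customer Service"]]
pred = [["Delivery"], ["Price"], ["Price"]]

mlb = MultiLabelBinarizer()
y_true = mlb.fit_transform(gold)
y_pred = mlb.transform(pred)

# average="weighted" weights each topic's F1 by how often it occurs,
# which handles the uneven class distribution.
print(f1_score(y_true, y_pred, average="weighted", zero_division=0))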
Our evaluation revealed that, overall, Caplena's AI delivered superior accuracy. However, there were five instances where ChatGPT took the lead (illustrated by the five stars underneath the green line). Averaging the F1 score across all 19 surveys, ChatGPT clocked in at 35%, while Caplena's AI scored a higher 47%. This difference is noteworthy despite a high degree of variance within the dataset. However, this is only the beginning of our comparison. Although the numbers may seem similar at first glance, Caplena takes an additional step that pushes the score well beyond 47%.

Caplena vs ChatGPT Performance

[Chart: F1 scores for the 19 surveys, Caplena AI vs. ChatGPT]


Plot Caption: The chart maps out the F1 scores for each of the 19 surveys, with ChatGPT scores plotted on the x-axis and Caplena AI scores on the y-axis. Surveys positioned above the green line indicate a superior analysis by Caplena AI, while those below the line fared better with ChatGPT.

Fine-Tuning: The Caplena Advantage

Caplena's standout feature is its fine-tuning capability, which enables a significant improvement from an initial F1 score of 47% to a target of 70% or higher. Achieving a score of 70% places Caplena's analysis on par with, or above, human-level performance. Notably, a study by Ishita, Oard, and Fleischmann found that fully manual human analysis only achieved a weighted F1 score of 62.7%, showing that humans are not infallible coders. Any score surpassing that threshold indicates a more accurate analysis than what humans typically achieve. While an initial 47% score may suffice for simple label assignments, it faces challenges in handling complex scenarios like lengthy reviews or topics with semantic overlaps.
To overcome these challenges, we have implemented a streamlined process that actively involves users in two crucial steps:
Model Fine-Tuning: Users play a pivotal role in fine-tuning Caplena's understanding of how topics should be assigned to their specific texts. In this step, users manually review whether Caplena's categorization is correct or incorrect for a small subset of the dataset. Working with the AI in this way usually takes around 5 to 10 minutes. This lightweight validation improves the model's accuracy and ensures it aligns with the user's requirements.
Performance Evaluation using F1 Score: To maintain transparency in our performance metrics, a portion of the human-validated data from the step above is used as a test set for estimating the F1 score (a minimal sketch of this split follows below). This lets users gauge the accuracy of the Caplena analysis and monitor the evolution of their F1 score in real time. We recommend treating an F1 score of 70% as an indication of satisfactory accuracy.
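To illustrate the idea behind these two steps, here is a minimal, hypothetical sketch of splitting the human-validated rows into a fine-tuning set and a held-out test set for the F1 estimate; the data and names are invented and do not reflect Caplena's internal implementation.

# Hypothetical sketch: split human-validated rows into a fine-tuning
# set and a held-out test set used to estimate the F1 score.
# Data and names are illustrative, not Caplena's internal API.
import random

validated = [  # (comment, human-confirmed topics)
    ("Arrived late", ["Delivery"]),
    ("Too expensive", ["Price"]),
    ("Friendly support", ["Customer Service"]),
    ("Broken on arrival", ["Product Quality", "Delivery"]),
]

random.shuffle(validated)
split = int(0.75 * len(validated))
finetune_set, test_set = validated[:split], validated[split:]

# finetune_set -> used to adapt the model to this project's topics
# test_set     -> kept aside to estimate the F1 score transparently
print(len(finetune_set), "rows for fine-tuning,", len(test_set), "for evaluation")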

Post F1 Score Outcomes

In this specific study, Caplena achieved significant performance improvements through the interactive fine-tuning process:
Caplena's F1 score increased from 47% to 59% with just a few human inputs per survey. To ensure a fair comparison, we applied a similar fine-tuning process to ChatGPT using few-shot learning (a sketch of what such a prompt might look like follows after this list).
ChatGPT's F1 score decreased from 35% to 32% despite using the same data for fine-tuning, indicating that few-shot fine-tuning actually degrades ChatGPT's performance on this task.
The decision to halt the fine-tuning of Caplena at a 59% F1 score was made considering the constraints of the study. Refer to the next section for clarification.
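For context, few-shot learning here means prepending a handful of already-solved examples to the prompt. A hypothetical sketch, with invented examples and topics rather than the study data:

# Hypothetical few-shot prompt: a handful of human-labelled examples
# are prepended before the reviews that ChatGPT should classify.
# Examples, topics, and wording are invented for illustration.
examples = [
    ("The parcel arrived two days late.", ["Delivery"]),
    ("Way too expensive for what you get.", ["Price"]),
]
new_reviews = ["Support was friendly but slow to reply."]

shots = "\n".join(f"Review: {text}\nTopics: {', '.join(t)}" for text, t in examples)
prompt = (
    "Assign each review to one or more topics "
    "(Delivery, Price, Customer Service, Product Quality).\n\n"
    f"{shots}\n\n"
    + "\n".join(f"Review: {r}\nTopics:" for r in new_reviews)
)
print(prompt)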

Caplena AI After Fine-Tuning vs. ChatGPT After Fine-Tuning

Plot Caption: The following chart compares the performance of Caplena AI after fine-tuning with that of ChatGPT. The performance gap between the two is now more pronounced.
[Chart: F1 scores for the 19 surveys after fine-tuning, Caplena AI vs. ChatGPT]

Caplena's Fine-Tuning Process

Video Caption: While fine-tuning in Caplena involves some manual effort, the process has been streamlined and can be completed within a few minutes, compared to the arduous task of individually examining each text comment. Moreover, it lets you push the accuracy of your analysis beyond what fully manual coding can achieve.


Study Limitations and Conclusion

Our study encountered hurdles due to ChatGPT's prompt size constraint, which limited the number of manually verified samples we could use to push the F1 score higher. As a result, we had to halt at a 59% F1 score for Caplena (vs. 32% for ChatGPT) to maintain fairness in the experiment, even though Caplena can reach scores of 70% and beyond in real-life applications. A key issue with ChatGPT is prompt length: although GPT-4 can in principle accommodate prompts of up to 20,000 words, it faltered when faced with the intricacy of this task, as in practice the combined length of the prompt and response could only total a maximum of around 3,000 words. Either way, fine-tuning appears to have a negative effect on ChatGPT, while in Caplena it clearly has a positive one.
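Prompt limits of this kind are usually expressed in tokens rather than words. As a rough, hypothetical illustration (assuming the tiktoken library and a GPT-4-style tokenizer; the budget below is illustrative, not the exact limit that applied in the study), one can check how much of the budget a prompt consumes:

# Rough illustration of checking prompt length in tokens.
# Assumes the tiktoken library; the budget is illustrative, not the
# exact limit that applied in the study.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

prompt = "Assign each of the following reviews to one or more topics: ..."
n_tokens = len(encoding.encode(prompt))

TOKEN_BUDGET = 4096  # illustrative combined budget for prompt + response
print(f"{n_tokens} tokens used, {TOKEN_BUDGET - n_tokens} left for the response")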
Despite the limitations mentioned above, Caplena's capabilities shine through, even with a limited number of fine-tuning reviews. Caplena places practically no limit on the number of reviews you can validate, allowing you to keep fine-tuning until you are satisfied with the AI score. Although our study yielded a score of 59%, Caplena's real-life performance surpasses 70% when presented with a more extensive set of meticulously labeled samples. In addition, the fine-tuning process itself is quick and intuitive, as shown in the video above.
Also, there were instances where ChatGPT struggled to interpret our prompts correctly, resulting in responses like:
“Cannot classify the 35th review as it seems to be incomplete or unrelated to the topic list.”
“Little confused by this review, it doesn't seem to provide enough information to assign a topic. Can you provide more context or information to help me understand the review better?”
“1,2,3 all have [TOPIC]” (when multiple reviews are included in a prompt).
In sum: Caplena outperforms ChatGPT in precise text analysis tasks, giving it a clear advantage. Now, let's unpack other elements contributing to Caplena's unrivaled performance for the task of assigning topics to text.

A Quick Run-Through of Caplena's Workflow

Video Caption: Caplena's workflow has been designed specifically for text analysis: collaboratively creating a codebook with the AI, fine-tuning and topic assignment, creating personalized visualizations, building interactive dashboards, and summarizing outcomes using ChatGPT.

Latest Feature: Summary Generation with ChatGPT

While ChatGPT's strength lies in text summarization, it's important to recognize its limitations when it comes to quantitatively analyzing customer feedback topics. Although ChatGPT's summarization is impressive, it should not be considered a substitute for comprehensive and accurate textual analysis.
Recognizing the strengths of both tools, we've incorporated ChatGPT into Caplena's latest feature: Summary Generation.
CTA: Unleash the power of data with our latest Summary Feature. Caplena + a sprinkle of ChatGPT = Summary Superpowers 🪄! Contact us now and experience a bite-sized analysis that packs a punch. Get the juiciest insights with just one click! 🚀 📊
During this ongoing beta testing phase, we're effectively combining the best of both worlds. Caplena carries out the precise analysis, and ChatGPT provides a concise summary of the analyzed data. With this combination, you can get an overview of your data in just a few sentences - all at the click of a button, ensuring an efficient and user-friendly experience.
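Conceptually, the division of labour can be sketched as follows: Caplena's analysis yields topic counts, and those counts are handed to a ChatGPT-style model for a short natural-language summary. The function, prompt, and numbers below are invented for illustration and do not reflect Caplena's internal implementation.

# Hypothetical sketch of the division of labour: Caplena-style topic
# counts go in, and a ChatGPT-style model turns them into a summary.
# build_summary_prompt and the counts are invented for illustration.
def build_summary_prompt(topic_counts: dict[str, int]) -> str:
    lines = [f"- {topic}: {count} mentions" for topic, count in topic_counts.items()]
    return (
        "Summarize the following customer-feedback topic counts in two "
        "or three sentences, highlighting the most frequent themes:\n"
        + "\n".join(lines)
    )

counts = {"Delivery": 412, "Price": 268, "Customer Service": 147}
print(build_summary_prompt(counts))
# The resulting prompt would then be sent to the summarization model.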

Caplena's New Summary Feature (in Beta Testing)

Conclusion

To wrap up, Caplena stands out as the superior choice for customer feedback analysis over ChatGPT for several key reasons:
Zero-shot capability: Even before fine-tuning, Caplena outperforms ChatGPT thanks to its custom training data.
Fine-tuning & quality assurance: With Caplena, you can fine-tune the model and push performance above average human-level analysis. ChatGPT does not offer this option, and in our study a comparable few-shot approach actually degraded its results.
User-friendly UI: Caplena's user interface, which is optimized for textual analysis workflows, provides a more user-friendly experience than ChatGPT's chat-based interface.
Cost-effectiveness: For large volumes of data and multiple runs with different themes, ChatGPT could become considerably costly, making Caplena a more economical choice.
Privacy/Compliance: For our European customers, it's currently not an option to send data to the US, making Caplena a safer choice from a data privacy perspective.
Although ChatGPT is a groundbreaking AI tool with remarkable capabilities in generating human-like responses and providing contextual information, it's crucial to understand its strengths and limitations.
In the specific task of determining topics and their frequency of appearance in user feedback, Caplena clearly excelled, achieving an overall F1 score of 59% and surpassing ChatGPT's modest 32%. Without the limitations imposed by this study, Caplena can reach scores well beyond 70%, exceeding average human-level accuracy. This is not surprising, as Caplena has been purposefully designed for this specific task, while ChatGPT serves a different use case.
Having said that, Caplena and ChatGPT are not in opposition; they complement each other. By integrating ChatGPT's summarization feature into Caplena's dashboard, Caplena users can now harness the strengths of both tools to gain valuable insights from their data through powerful text analytics and visualization tools.
By capitalizing on the strengths of both ChatGPT and Caplena, we can achieve new levels of efficiency and accuracy in text analytics. This collaboration bridges the gap between sophisticated language models and advanced text analysis, unlocking new opportunities and expanding the boundaries of what can be achieved in this field.
So, is it really an "us vs them" question? No. It's an "us and them" opportunity for innovation and progress in the realm of text analytics.


