Hunch UGC: ML Services & Solutions

Polls Categorisation

Golden Test Set

A Golden Test Set is a set of gold-standard data: data of very high quality that is as close to the ground truth as is practical to obtain. Gold-standard data is valuable for machine learning tasks because its labels are known to be reliable, and it is generally created through manual review.
For the poll categorisation task, we will create the golden test set by introducing a new attribute (IS_GOLDEN in the snapshot below) in the polls dataset. The following snapshot shows how the polls dataset would look:
Polls Dataset

#  ITEM_ID                               CATEGORY        CREATED_BY  CREATION_TIMESTAMP  END_TIMESTAMP  LLAMA_OUTPUT                                            IS_GOLDEN
1  0026c476-8ebc-4178-9a56-fc78f594bac5  Lifestyle||All  -           1672578577          1673029800     {"category": "Music|All", "confidence_score": 0.4}      {"value": true, "category": "Lifestyle|All"}
2  0026c476-8ebc-4178-9a56-fc78f594bac5  Lifestyle||All  -           1672578577          1673029800     {"category": "Lifestyle|All", "confidence_score": 0.9}  {"value": false, "category": "Lifestyle|All"}
3  0026c476-8ebc-4178-9a56-fc78f594bac5  Lifestyle||All  -           1672578577          1673029800     {"category": "Lifestyle|All", "confidence_score": 0.3}  {"value": true, "category": "Lifestyle|All"}
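To make the role of the new attribute concrete, here is a minimal Python sketch that filters the snapshot rows down to the golden ones and compares the model output against the golden label. It rests on assumptions that the source does not state: that `IS_GOLDEN.value` marks whether a row belongs to the golden test set, and that `IS_GOLDEN.category` holds the manually verified category. The names `POLLS`, `golden_rows`, and `golden_accuracy` are hypothetical and used only for illustration.

```python
# Hypothetical in-memory copy of the three snapshot rows (only the columns used here).
POLLS = [
    {
        "ITEM_ID": "0026c476-8ebc-4178-9a56-fc78f594bac5",
        "LLAMA_OUTPUT": {"category": "Music|All", "confidence_score": 0.4},
        "IS_GOLDEN": {"value": True, "category": "Lifestyle|All"},
    },
    {
        "ITEM_ID": "0026c476-8ebc-4178-9a56-fc78f594bac5",
        "LLAMA_OUTPUT": {"category": "Lifestyle|All", "confidence_score": 0.9},
        "IS_GOLDEN": {"value": False, "category": "Lifestyle|All"},
    },
    {
        "ITEM_ID": "0026c476-8ebc-4178-9a56-fc78f594bac5",
        "LLAMA_OUTPUT": {"category": "Lifestyle|All", "confidence_score": 0.3},
        "IS_GOLDEN": {"value": True, "category": "Lifestyle|All"},
    },
]


def golden_rows(polls):
    """Return only the rows flagged as golden (assumption: IS_GOLDEN.value is the flag)."""
    return [row for row in polls if row["IS_GOLDEN"]["value"]]


def golden_accuracy(polls):
    """Fraction of golden rows where the model category equals the golden category.

    Returns None when there are no golden rows, so callers can tell
    "no data" apart from "0% accurate".
    """
    rows = golden_rows(polls)
    if not rows:
        return None
    hits = sum(
        1
        for row in rows
        if row["LLAMA_OUTPUT"]["category"] == row["IS_GOLDEN"]["category"]
    )
    return hits / len(rows)


if __name__ == "__main__":
    print(len(golden_rows(POLLS)), golden_accuracy(POLLS))
```

Under these assumptions, rows 1 and 3 are golden and only row 3 matches its golden category, so the sketch reports an accuracy of 0.5 on the snapshot.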

Creating Golden Test Set

The figure below shows the pipeline for creating Golden Test Sets and how we will use them to fine-tune our machine learning models, so that over time the models become increasingly accurate at assigning polls to the correct category.
