Polls Categorisation

Golden Test Set

A Golden Test Set is a collection of gold standard data: data of very high quality that is as close to the ground truth as is practical to obtain. Gold standard data is valuable for machine learning tasks because its quality is known and it is generally created or verified manually.

For the poll categorisation task, we create the golden test set by introducing a new attribute in the polls dataset. The following is a snapshot of the polls dataset:

Polls Dataset

ITEM_ID
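
As a rough illustration of adding the new attribute, the sketch below attaches a manually assigned label to each poll record. Only `ITEM_ID` comes from the dataset snapshot; the `QUESTION` field, the `GOLDEN_CATEGORY` attribute name, the category values, and the `add_golden_label` helper are all hypothetical and stand in for whatever the real dataset uses.

```python
# Hypothetical poll records; only ITEM_ID is taken from the polls dataset.
polls = [
    {"ITEM_ID": 101, "QUESTION": "Which team will win the final?"},
    {"ITEM_ID": 102, "QUESTION": "Is remote work here to stay?"},
    {"ITEM_ID": 103, "QUESTION": "Best phone under $500?"},
]

# Manually reviewed labels keyed by ITEM_ID (assumed values).
manual_labels = {101: "sports", 103: "technology"}


def add_golden_label(rows, labels, column="GOLDEN_CATEGORY"):
    """Return copies of rows with the golden attribute added.

    Rows without a manual label get None, meaning "not yet reviewed".
    """
    return [{**row, column: labels.get(row["ITEM_ID"])} for row in rows]


labelled = add_golden_label(polls, manual_labels)

# Only manually reviewed rows belong in the golden test set.
golden_test_set = [r for r in labelled if r["GOLDEN_CATEGORY"] is not None]
```

Keeping unreviewed rows as `None` rather than dropping them leaves the polls dataset intact, so the golden test set can grow as more polls are reviewed.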

Creating Golden Test Set

The figure below shows the pipeline for creating Golden Test Sets and how we use them to fine-tune our machine learning models, so that over time the models become highly accurate at assigning polls to the correct class.
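
One possible reading of the evaluation step in such a pipeline is sketched below: model predictions are compared against the golden labels, and the misclassified polls are collected as candidates for the next round of fine-tuning. This is an assumption about how the pipeline might work, not a description of the figure. The `QUESTION` and `GOLDEN_CATEGORY` fields, the `evaluate` helper, and the keyword-based `keyword_predict` stand-in for a real model are all hypothetical.

```python
# Hypothetical golden test set; field names other than ITEM_ID are assumed.
golden_rows = [
    {"ITEM_ID": 1, "QUESTION": "Which team will win the final?", "GOLDEN_CATEGORY": "sports"},
    {"ITEM_ID": 2, "QUESTION": "Best phone under $500?", "GOLDEN_CATEGORY": "technology"},
    {"ITEM_ID": 3, "QUESTION": "Who should win the election?", "GOLDEN_CATEGORY": "politics"},
    {"ITEM_ID": 4, "QUESTION": "Favourite pizza topping?", "GOLDEN_CATEGORY": "food"},
]


def keyword_predict(question):
    """Toy stand-in for a trained classifier (not a real model)."""
    text = question.lower()
    if "win" in text:
        return "sports"
    if "phone" in text:
        return "technology"
    return "other"


def evaluate(rows, predict):
    """Return (accuracy, misclassified rows) against the golden labels."""
    misclassified = []
    for row in rows:
        predicted = predict(row["QUESTION"])
        if predicted != row["GOLDEN_CATEGORY"]:
            # Keep the wrong prediction alongside the row for later review.
            misclassified.append({**row, "PREDICTED": predicted})
    accuracy = (len(rows) - len(misclassified)) / len(rows) if rows else 0.0
    return accuracy, misclassified


accuracy, misclassified = evaluate(golden_rows, keyword_predict)
print(f"accuracy={accuracy:.2f}, to review: {[r['ITEM_ID'] for r in misclassified]}")
```

Tracking accuracy on the same golden test set across model versions gives a consistent measure of whether fine-tuning is actually improving categorisation.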
