Golden Test Set
A Golden Test Set refers to gold standard data. This refers to data of very high quality, which is more or less as close as you can get to the ground truth. Gold standard data is great for machine learning tasks, since it is known to be of high quality and is generally created using manual intervention.
For our use-case for poll categorisation task, we would be creating golden test set by introducing a new attribute in the polls dataset. Following is a snapshot of how polls dataset would look like:
Creating Golden Test Set
The below figure shows the pipeline to create Golden Test Sets and how we are going to use it to fine-tune our machine learning models, such that over time our models become extremely accurate in categorising polls to the the correct class.