This process assesses the data quality of a dataset so that it can be compared with the requirements for reuse.
It is easy to assume that data received from another source is of high quality: that it is accurate, up to date, and so on. However, the source organisation may know that the data has issues. For example, it may be updated only once a year, or it may not contain a full set of records.
Data quality can be described in terms of the following dimensions:
Completeness - describes the degree to which records are present.
Uniqueness - describes the degree to which there is no duplication in records.
Consistency - describes the degree to which values in a dataset do not contradict other values.
Timeliness - describes the degree to which the data accurately reflects the period it represents, and to which the data and its values are up to date.
Validity - describes the degree to which the data is in the range and format expected.
Accuracy - describes the degree to which data matches reality.
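Several of these dimensions can be expressed as simple ratios. The following is a minimal sketch in Python, not a prescribed method: the sample records, the field names (id, age, updated), the expected record count, and the thresholds are all hypothetical. Consistency and accuracy are left out because they need cross-field rules and a trusted reference source respectively.

```python
from datetime import date

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 34, "updated": date(2024, 1, 10)},
    {"id": 2, "age": 51, "updated": date(2023, 12, 1)},
    {"id": 2, "age": 51, "updated": date(2023, 12, 1)},   # duplicate record
    {"id": 3, "age": 212, "updated": date(2021, 6, 30)},  # out-of-range age, stale
]

def completeness(rows, key, expected_count):
    """Share of the expected records that are present (distinct keys)."""
    return len({r[key] for r in rows}) / expected_count

def uniqueness(rows, key):
    """Share of rows that carry a distinct key value."""
    return len({r[key] for r in rows}) / len(rows)

def validity(rows, field, is_valid):
    """Share of rows whose value passes the range or format rule."""
    return sum(1 for r in rows if is_valid(r[field])) / len(rows)

def timeliness(rows, field, as_of, max_age_days):
    """Share of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)

# Assumed requirements: 5 records expected, ages 0-120, updated within a year.
scores = {
    "completeness": completeness(records, "id", expected_count=5),
    "uniqueness": uniqueness(records, "id"),
    "validity": validity(records, "age", lambda a: 0 <= a <= 120),
    "timeliness": timeliness(records, "updated", date(2024, 3, 1), 365),
}
print(scores)
```

Each score falls between 0 and 1, so it can be compared directly with a threshold taken from the requirements for reuse.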
The provenance of data can be traced from its originating source through to its current stage. This could include understanding:
how the data was collected
transformations that have been applied
data cleansing processes used to ensure that the data continues to meet its intended quality
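One way to make provenance traceable is to keep a dated log of each step alongside the dataset. The sketch below assumes a simple in-memory structure; the class names (ProvenanceStep, Provenance), the step kinds, and the example entries are hypothetical and not part of any standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ProvenanceStep:
    on: date          # when the step happened
    kind: str         # e.g. "collection", "transformation", "cleansing"
    description: str  # what was done

@dataclass
class Provenance:
    source: str                               # originating source of the data
    steps: list = field(default_factory=list)

    def record(self, on, kind, description):
        """Append one step to the history."""
        self.steps.append(ProvenanceStep(on, kind, description))

    def trace(self):
        """Return the history in date order, from source to current stage."""
        ordered = sorted(self.steps, key=lambda s: s.on)
        return [f"{s.on.isoformat()} {s.kind}: {s.description}" for s in ordered]

# Hypothetical history for an illustrative dataset.
history = Provenance(source="hypothetical annual survey")
history.record(date(2024, 2, 1), "transformation", "postcodes normalised to upper case")
history.record(date(2024, 1, 5), "collection", "responses gathered by online form")
history.record(date(2024, 2, 10), "cleansing", "duplicate respondent records removed")

for line in history.trace():
    print(line)
```

Sorting by date in trace() means steps can be recorded in any order and still be read from the originating source through to the current stage.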