Matching of personal data (i.e., combining, comparing or matching personal data obtained from multiple sources) is a sensitive activity covered by various provisions in both the UK GDPR and the Data Protection Act (DPA) 2018.
Automated processing of personal data to establish a person’s location falls within the definition of profiling under Article 4(4) of the UK GDPR. The ICO therefore regards data matching as being likely to result in high risk and requires completion of a Data Protection Impact Assessment (DPIA) for data matching activities. Public sector data matching is covered by the
“the data that is adequate, relevant and limited to what is necessary to undertake the matching exercise, to enable individuals to be identified accurately.”
To reduce the potential risks and develop matching algorithms that achieve accurate identification of individuals, the
will need a good understanding of the contents of shared datasets, i.e., the data quality and provenance.
Given the ambiguities that can occur in matching four-line addresses, and the fact that a single postcode can cover tens of properties, it is recommended that property address matching is based on Unique Property Reference Numbers (UPRNs) wherever practicable.
There is a good case to include UPRNs in back-office datasets that may contain
so that they can be matched in a
The matching of
is complicated by the manner in which names are used and stored. There can be significant differences between the names in official records, e.g., birth certificates and passports, and the names people are called or use on a day-to-day basis. For example, a person with the forename Elizabeth may be known as, or wish to be called, Liz, Lizzie, Lizzy, Betty, Beth or something completely different, and names can have alternative spellings, e.g., Elisabeth.
A variety of methods exist for matching the names of individuals, with varying levels of accuracy and therefore varying risk of incorrect matches.
has been published and
can be more generally applied.
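As an illustration of why everyday names and alternative spellings complicate matching, the following is a minimal sketch of one simple approach: folding forenames to a canonical form through a lookup table before comparing them. It is not the published method referred to above. The `NICKNAMES` table and the function names are hypothetical, and the table contains only the Elizabeth variants mentioned earlier; a real exercise would need a far larger, curated list.

```python
import unicodedata

# Hypothetical, deliberately tiny lookup from everyday forms and alternative
# spellings to a canonical forename. Only the examples from the text are listed.
NICKNAMES = {
    "liz": "elizabeth",
    "lizzie": "elizabeth",
    "lizzy": "elizabeth",
    "betty": "elizabeth",
    "beth": "elizabeth",
    "elisabeth": "elizabeth",
}


def canonical_forename(name: str) -> str:
    """Return a case-folded, accent-stripped, canonical form of a forename."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = stripped.strip().casefold()
    # Names not in the table are returned in their folded form, unchanged.
    return NICKNAMES.get(key, key)


def forenames_may_match(a: str, b: str) -> bool:
    """True when two forenames share a canonical form (a candidate match only)."""
    return canonical_forename(a) == canonical_forename(b)
```

A result of `True` here indicates only that two forenames could refer to the same person. Because a lookup of this kind can merge distinct people, it is better suited to flagging candidates for review than to confirming a match.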
It is recommended that data matching is based initially on
to identify potentially related personal records and that the matching is refined based on
by using a combination of title, surname, forename and date of birth. Where the address and these four fields are exact matches, the likelihood of error is very low. Where ambiguity remains across datasets for a @Residence, a manual review of the mismatch may be necessary before the records are processed further.
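The two-stage approach described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation. It assumes that the initial grouping key is a UPRN and that each record carries the four personal fields named in the text. The `PersonRecord` type, its field names and the `match_records` function are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical record layout; the field names are assumptions."""
    uprn: str
    title: str
    surname: str
    forename: str
    date_of_birth: date


def _norm(text: str) -> str:
    # Ignore differences in case and surrounding whitespace only.
    return text.strip().casefold()


def person_key(rec: PersonRecord) -> tuple:
    """The four personal fields used to refine an address-level match."""
    return (_norm(rec.title), _norm(rec.surname), _norm(rec.forename),
            rec.date_of_birth)


def match_records(left, right):
    """Return (exact, review) lists for two iterables of PersonRecord."""
    # Stage 1: group the right-hand dataset by UPRN (assumed initial key).
    by_uprn = defaultdict(list)
    for rec in right:
        by_uprn[rec.uprn].append(rec)

    exact, review = [], []
    for rec in left:
        candidates = by_uprn.get(rec.uprn, [])
        # Stage 2: refine on title, surname, forename and date of birth.
        hits = [c for c in candidates if person_key(c) == person_key(rec)]
        if len(hits) == 1:
            exact.append((rec, hits[0]))
        elif candidates:
            # Same property but no unique exact match: refer for manual review.
            review.append((rec, candidates))
    return exact, review
```

Records that share a UPRN but do not agree exactly on all four fields are returned in `review` rather than discarded, reflecting the recommendation that remaining ambiguity is resolved manually before further processing.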