The Person_ID Handbook
Summary and outline
The Person_ID is a unique patient identifier used by NHS England to standardise patient-level data linkage across different data sets.
This handbook aims to provide users of the Person_ID in the Hospital Episode Statistics (HES) databases with supporting documentation on what the Person_ID is, how it is derived via the Master Person Service (MPS), how the data flows between services (Data Processing Services (DPS) and Spine), and how to interpret the output information associated with the Person_ID.
Person_IDs are provided in many data sets available in NHS England, including HES, and are derived from the outputs of MPS. For security and privacy reasons, many users might see only the tokenised version of the Person_ID, which provides an extra level of patient confidentiality.
MPS takes certain demographic information contained in a person’s health and care records and matches it to their unique NHS number to confirm their identity. The collection of all NHS numbers and patients’ demographic information is contained in the Personal Demographics Service (PDS) data set.
Like any data linkage method, MPS does not match perfectly. There are two risks: failing to match a record that should have matched (false negative) and matching a record to the wrong person (false positive). The performance of MPS is determined by both the algorithm itself and the quality of incoming data.
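The two kinds of error can be made concrete with a small sketch. It assumes a reference set in which the true identity of each record is known. The function name `linkage_errors` and the sample data are hypothetical and are not part of MPS.

```python
from typing import Optional

# Each pair is (true_id, matched_id); None means "no match".
# Hypothetical helper for illustration only, not part of MPS.
def linkage_errors(pairs: list[tuple[Optional[str], Optional[str]]]) -> dict[str, int]:
    counts = {"correct": 0, "false_negative": 0, "false_positive": 0}
    for true_id, matched_id in pairs:
        if matched_id == true_id:
            counts["correct"] += 1          # right match, or correctly left unmatched
        elif matched_id is None:
            counts["false_negative"] += 1   # a match existed but was not found
        else:
            counts["false_positive"] += 1   # linked to the wrong record
    return counts


sample = [("A", "A"), ("B", None), ("C", "D"), (None, None)]
print(linkage_errors(sample))
```

In this sample, one record is missed (false negative) and one is linked to the wrong person (false positive).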
MPS operates in the same way for all data sets and is not tuned to any particular use case. For example, where records reliably have accurate NHS numbers attached, MPS will provide a correct match with high confidence. Where matching relies solely on other personal identifiers (such as name, postcode, gender or date of birth), which may be incomplete, inconsistently recorded or duplicated across the population, the algorithm is less likely to return a correct match in every case.
Mature health data sets, where identity is typically validated in a healthcare setting at the point of recording (such as HES), have higher levels of matching accuracy through MPS for most records. Performance for other data sets may be variable.
Where a perfect match of NHS number and date of birth cannot be found between a record of interest and any of the PDS records, more complex algorithms are used to compare partial demographic information and identify the PDS record most likely to correspond to the query record. These algorithms are referred to as alphanumeric trace and algorithmic trace; in HES only the latter is used. In the algorithmic trace step, the single queried record is compared to all records in PDS. The comparisons use demographic information (date of birth, name, gender and postcode) and are scored based on similarity. If the similarity is deemed acceptable, the matched record is returned. Otherwise, the algorithm looks for similarities between the record of interest and previously unmatched records, which are stored in a separate data set called the MPS record bucket.
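The idea of scoring a query record against candidate records and accepting the best one only above a threshold can be sketched as follows. This is a minimal illustration, not the MPS algorithm: the equal field weights, the 0.9 threshold, the use of `difflib.SequenceMatcher` for names and all class and function names are assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Demographics:
    record_id: str
    name: str
    date_of_birth: str  # ISO format, for example "1980-01-31"
    gender: str
    postcode: str


def _normalise_postcode(postcode: str) -> str:
    return postcode.replace(" ", "").upper()


def similarity(query: Demographics, candidate: Demographics) -> float:
    """Return a score in [0, 1]; the equal weighting is an assumption."""
    name_score = SequenceMatcher(
        None, query.name.lower(), candidate.name.lower()
    ).ratio()
    exact_fields = [
        query.date_of_birth == candidate.date_of_birth,
        query.gender == candidate.gender,
        _normalise_postcode(query.postcode) == _normalise_postcode(candidate.postcode),
    ]
    return (name_score + sum(exact_fields)) / 4


def best_match(
    query: Demographics,
    candidates: list[Demographics],
    threshold: float = 0.9,  # hypothetical acceptance threshold
) -> Optional[Demographics]:
    """Compare the query to every candidate; return the best one if acceptable."""
    best = max(candidates, key=lambda c: similarity(query, c), default=None)
    if best is not None and similarity(query, best) >= threshold:
        return best
    return None


pds = [
    Demographics("a", "John Smith", "1980-01-31", "M", "LS1 4AP"),
    Demographics("b", "Alice Brown", "1975-06-02", "F", "M1 1AE"),
]
query = Demographics("q", "Jon Smith", "1980-01-31", "M", "ls14ap")
match = best_match(query, pds)
print(match.record_id if match else "no acceptable match")
```

Here the misspelt name and the differently formatted postcode still lead to candidate `a`, because the other fields agree exactly. A query that agrees on none of the exact fields scores at most 0.25 and is rejected.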
The Person_ID is therefore one of the following, depending on whether and where a match was found: the NHS number from PDS, the MPS_ID from the MPS record bucket, or a one-time-use ID.
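The fallback order described above can be sketched as a short cascade. The two lookup callables stand in for the PDS trace and the record bucket trace. Their names, the returned labels and the use of a random UUID for the one-time-use ID are illustrative assumptions, not the actual MPS implementation or ID format.

```python
import uuid
from typing import Callable, Optional

# A lookup takes a record and returns an identifier, or None if no match.
Lookup = Callable[[dict], Optional[str]]


def assign_person_id(
    record: dict,
    trace_pds: Lookup,      # hypothetical: returns an NHS number on a PDS match
    trace_bucket: Lookup,   # hypothetical: returns an MPS_ID on a bucket match
) -> tuple[str, str]:
    """Return (person_id, source) following the PDS -> bucket -> one-time order."""
    nhs_number = trace_pds(record)
    if nhs_number is not None:
        return nhs_number, "PDS"

    mps_id = trace_bucket(record)
    if mps_id is not None:
        return mps_id, "MPS record bucket"

    # No match anywhere: issue a one-time-use ID (placeholder format).
    return uuid.uuid4().hex, "one-time-use"


# Usage with stub lookups: PDS misses, the record bucket hits.
person_id, source = assign_person_id(
    {"name": "Jon Smith"},
    trace_pds=lambda record: None,
    trace_bucket=lambda record: "mps-0001",
)
print(person_id, source)
```

Passing the lookups in as functions keeps the cascade logic separate from how each trace is carried out.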
The rest of the document is structured as follows:
- chapter 1 explains what the Person_ID is and provides details on the scope of this document
- chapter 2 explains how the Person_ID is generated and how it is used in the context of the HES data set
- chapter 3 provides a more detailed technical explanation of the algorithms behind the matching logic
- chapter 4 shows specific empirical examples of how a Person_ID is matched
- chapter 5 contains additional information helpful to Person_ID users
Last edited: 27 February 2024 3:54 pm