Part of Health and social care Cloud Risk Framework
Dimensions that affect risk
The impact of risk is considered along three dimensions:
- Data type
- Data scale
- Data persistence
Data type
The type of data being processed affects risk. The extremes range from managing reports intended for public distribution to maintaining extremely sensitive PKI secrets. To support the range of potential types of health and social care data, we describe a health and social care classification scheme. To take advantage of cross-governmental principles around data classification, we also provide a mapping to the Government Security Classifications Policy.
The Government Security Classifications Policy came into force in April 2014. It describes the classification of information assets into one of three high-level types and provides a baseline set of security controls for each. It is intended to be used for all information assets across government departments, agencies, public sector delivery partners and the wider supply chain. The three high-level types are OFFICIAL, SECRET and TOP SECRET, with OFFICIAL having an additional handling caveat of OFFICIAL-SENSITIVE. Additional descriptors are possible to further classify assets. A few descriptors are provided as core (of which the most relevant to us are COMMERCIAL and PERSONAL), although it is permissible to introduce others, supported by local policies and business processes.
In addition, existing classification schemes are mostly concerned with securing various information assets, whereas we also need to consider distinctions along different axes: for example, how data can and should be shared and the legal bases for its processing. Public perception and potential concern are also heightened in the health sector, which needs to be taken into account when defining the approach to, and controls applied to, the handling of health data assets. Statements as to how healthcare information is processed also exist, either as department policy (such as DH offshore processing policy), NHS England processes and practices (such as how Spine 2 is operated), or as part of existing commercial arrangements (both national, such as GP IT Futures, or local, such as trusts’ supplier contracts).
The new scheme described in this paper provides a health and social care sector-specific framework upon which an appropriate and proportionate set of security controls can be applied, dependent on the specific needs of different kinds of health and social care data. It is required because existing data classification schemes do not achieve the level of granularity required to cover the variety of different data types that are processed across the health and care system, and there are specific needs and complexities in the processing of health data.
Note that additional controls can be added to any data type in order to reduce any associated risks. For example, to address concerns regarding the confidentiality and integrity of data, such data may be separately encrypted before transfer to the cloud, using strong cryptography as defined by the current version of NIST SP800-57 and where the encryption keys are not stored with the cloud provider. In such circumstances the risk profile associated with the data being processed on public cloud is significantly mitigated.
The proposed data classification scheme is illustrated in the table below.
Type | Sub-type | Description | Example |
---|---|---|---|
Publicly available information | | Statistical material that is intended for public distribution. Identification from these materials, with or without any other materials, is not feasible. | The number of diabetics in Sheffield, or location information for health-care providers. |
Synthetic (test) data | | Synthetic (test) data is fictional data, engineered to be representative of real data, that is created in order to avoid the need to use real data when developing and testing IT systems. Synthetic data must pose zero risk of contributing to the revealing of any personal data. | Fabricated dummy Hospital Episode Statistics (HES) data set, used for testing purposes, risk assessed to ensure that there is no risk of the data contributing to the access to any personal data. |
Aggregate data | | Summarised and anonymised data, but which is not suitable for public distribution, for example due to the risk that it may be used with other material to contribute to the re-identification of individuals. The risk of such re-identification is not necessarily significant but does exist (especially in the presence of a sustained and skilled attack). | Summarised records of activity of a particular hospital. |
Already encrypted materials | | Materials that are already encrypted before they touch the cloud, using strong cryptography as defined by the current version of NIST SP800-57 and where the encryption keys are not stored with the cloud provider. | Scanned hospital patient notes which are encrypted by an application before being uploaded to the cloud for archive purposes. |
Personal data (PID) | | Information about an identified individual. | |
Personal data (PID) | Demographic data | Information about the individual rather than their clinical details. | A person’s address details and NHS Number. |
Personal data (PID) | High risk demographic data | Demographic data where, in the event of a breach, there is a high risk of significant harm. | The address details of a person under the care of the UK Protected Persons Service, likely to be reflected in an S-flag applied to their PDS details. |
Personal confidential data (PCD) | | PCD is based on the ICO definition of sensitive personal data, extended within health and social care to include deceased persons and information that is given in confidence and is owed a duty of care, such as: | |
Personal confidential data (PCD) | Legally-restricted PCD | Sensitive personal data that are subject to additional regulations or statute, under either the: | Details of a person’s previous gender. |
Personal confidential data (PCD) | Extra-delicate PCD | Sensitive personal data that are sometimes seen to be additionally delicate, but for which there are no legal restrictions. This determination is often not consistent, but is commonly held, and is often related to conditions that attract, or are considered to attract, stigma. For example, HIV status, mental health conditions, other conditions contained within the SCR 'sensitive code' list. Whilst many patients see information on these kinds of condition to be particularly private and not to be shared under any circumstances, others see them as important to share, and for any stigmas to be removed. Note that there is no legal distinction between PCD and extra-delicate PCD. | Details that a person has asked not to be shared. |
Anonymised data | | Sensitive personal data that has been subject to de-identification and/or other privacy-enhancing techniques, in line with the ICO Anonymisation Code of Practice. Risk of re-identification is remote (and would be based on activities that are illegal and/or break contractual arrangements). No way of authorised linking with other data-sets. | Extract from a research database where all pseudonyms have been removed. |
Pseudonymised data | | Sensitive personal data that has been subject to de-identification and/or other privacy-enhancing techniques, in line with the ICO Anonymisation Code of Practice, containing a pseudonym that allows for linking with other data-sets where that is permitted through business justification and legal basis. Otherwise, risk of unauthorised re-identification is remote (and would be based on activities that are illegal and/or break contractual arrangements). | HES data set. |
Pseudonymised data | Reversibly pseudonymised data | Pseudonymised data where the pseudonym is also intended to be used to facilitate re-identification where that is supported by business purpose and legal basis. | Data dissemination to support risk stratification (where individuals may subsequently be usefully re-identified to support their direct care). |
Pseudonymised data | Irreversibly pseudonymised data | Pseudonymised data where re-identification is not intended. | Data dissemination to support a research project that never requires re-identification. |
Patient account data | | Account credentials (including any recovery materials) for citizen accounts for patient-facing online health tools. | A person’s account details for the NHS.UK website. |
Patient choices | | Statements/preferences made by patients regarding the use of their data. | A person’s expressions of their wishes recorded in their GP’s clinical system or on the Spine. |
Patient meta-data (identifiable) | | Information about how identified patients have used patient-facing online health tools. | History of an identified person’s use of the NHS.UK website's symptom information. |
Patient meta-data (linkable) | | Information about how patients have used patient-facing online health tools (not identified, but linkable across sessions). | History of an unknown (but linkable) person’s use of the NHS.UK website's symptom information. |
Professional user account data | | Account credentials (including any recovery materials) for professional user (such as a clinician) accounts that control access to any personal data (including PCD). | A clinical application logon. |
Professional account data (less-sensitive) | | Account credentials (including any recovery materials) for professional user (such as a clinician) accounts that control access to anonymised information. | Authentication details to portal providing access to anonymised data. |
Audit data | | Data that records the use of a system and the provenance of the data that system manages. | Clinical system audit trail. |
Audit data | Professional user meta-data | Information about how users have used clinical or administrative tools that process personal data. | History of a GP’s use of their clinical system, or of summary care record (SCR). |
Audit data | Audit data (personal) | Data describing the use of a clinical or administrative system that processes personal data, where that audit data itself includes or references PCD. | The audit trail of a GP system showing all users’ interactions and use of the system. |
Audit data | Audit data (non-personal) | Data describing the use of a clinical or administrative system, where that audit data itself does not include or reference PCD. | History of logins to a clinical system. |
Key materials | | Material that provides long-lived linkage between reversibly-pseudonymised data and personal data, or provides a similarly significant security function. | Look-up tables or decryption keys. |
Key materials | Very short-lived | One-time decryption keys. | A decryption key generated to support (and only usable within) a specific re-identification activity within an individual user session. |
Key materials | Rotatable | Material that provides linkage between reversibly-pseudonymised data and personal data, that persists over time and over user sessions but is generally rotatable. | An encryption key used by a DSCRO to re-identify pseudonyms included in many data disseminations. |
Key materials | Long-lived, persistent | Material that provides long-lived and persistent linkage between reversibly-pseudonymised data and personal data, or provides a significant security function. | A root certificate private key for a widespread PKI. |
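The distinction between pseudonymised data and the key materials that link it back to personal data can be illustrated with a short Python sketch. This is an illustration only, not a method prescribed by the framework: the use of HMAC-SHA256 to derive pseudonyms, the in-memory look-up table, and the names `pseudonymise` and `ReversiblePseudonymiser` are all assumptions introduced here.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from an identifier (assumed technique: HMAC-SHA256).

    The same identifier and key always give the same pseudonym, which is what
    allows authorised linking across data sets.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class ReversiblePseudonymiser:
    """Hypothetical helper that also keeps a look-up table for re-identification.

    The look-up table and the key are "key materials" in the sense of the
    table above and would need to be protected accordingly.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._lookup: dict[str, str] = {}  # pseudonym -> original identifier

    def pseudonymise(self, identifier: str) -> str:
        pseudonym = pseudonymise(identifier, self._key)
        self._lookup[pseudonym] = identifier
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Only possible because the look-up table was retained.
        return self._lookup[pseudonym]
```

In this sketch, calling `pseudonymise` alone and retaining no look-up table corresponds to the irreversible case, whereas `ReversiblePseudonymiser` corresponds to the reversible case, where the retained table is what makes re-identification possible.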
The table below provides a mapping between the health data types and the Government Security Classifications Policy, agreed by the Healthcare Cloud Working Group (including NHS England, the Department of Health and Social Care, and the Government Digital Service). This enables us to take advantage of cross-government policy statements and published principles (such as the 14 NCSC Cloud security principles) around that classification, whilst treating those statements as necessary but not necessarily sufficient in a health and social care context.
Type | Sub-type | Map to govt. security classification | Notes |
---|---|---|---|
Publicly available information | | No applicable mapping | The most obvious mapping is to something like UNCLASSIFIED but this is no longer part of the model. |
Synthetic (test) data | | OFFICIAL | |
Aggregate data | | OFFICIAL | |
Already encrypted materials | | OFFICIAL | |
Personal data (PID) | | OFFICIAL-SENSITIVE | |
Personal data (PID) | Demographic data | OFFICIAL-SENSITIVE | |
Personal data (PID) | High risk demographic data | OFFICIAL-SENSITIVE | |
Personal confidential data (PCD) | | OFFICIAL-SENSITIVE | |
Personal confidential data (PCD) | Legally-restricted PCD | OFFICIAL-SENSITIVE | |
Personal confidential data (PCD) | Extra-delicate PCD | OFFICIAL-SENSITIVE | |
Anonymised data | | OFFICIAL-SENSITIVE | |
Pseudonymised data | | Maximum of variants | |
Pseudonymised data | Reversibly pseudonymised data | OFFICIAL-SENSITIVE | |
Pseudonymised data | Irreversibly pseudonymised data | OFFICIAL-SENSITIVE | |
Patient account data | | OFFICIAL-SENSITIVE | |
Patient choices | | OFFICIAL-SENSITIVE | |
Patient meta-data (identifiable) | | OFFICIAL-SENSITIVE | |
Patient meta-data (linkable) | | OFFICIAL-SENSITIVE | |
Professional user account data | | OFFICIAL-SENSITIVE | |
Professional user account data (less-sensitive) | | OFFICIAL-SENSITIVE | |
Audit data | | Maximum of variants | |
Audit data | Professional user meta-data | OFFICIAL-SENSITIVE | |
Audit data | Audit data (personal) | OFFICIAL-SENSITIVE | |
Audit data | Audit data (non-personal) | OFFICIAL-SENSITIVE | |
Key materials | | Maximum of variants | |
Key materials | Very short-lived | OFFICIAL-SENSITIVE | |
Key materials | Rotatable | OFFICIAL-SENSITIVE | Whilst we need such data to be treated to the highest standards, they do not fit into the government policy criteria for SECRET or TOP SECRET. |
Key materials | Long-lived, persistent | OFFICIAL-SENSITIVE | Whilst we need such data to be treated to the highest standards, they do not fit into the government policy criteria for SECRET or TOP SECRET. |
Whilst we can (mostly) demonstrate an appropriate mapping from health data type to the Government Security Classifications Policy, some limitations emerge:
- Many data types map to OFFICIAL-SENSITIVE, but there are many kinds of data in this category that we will control, and disseminate, in different ways.
- We cannot, through the Government Security Classifications Policy, indicate that very highly sensitive NHS materials, such as PKI secrets, need any greater control than many other kinds of information.
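The "Maximum of variants" entries in the mapping table can be read as "take the strictest classification among the sub-types". The Python sketch below shows one way that rule might be expressed. The `Classification` enum, the ranking of OFFICIAL-SENSITIVE above OFFICIAL, and the function name `parent_classification` are assumptions made for illustration; the sub-type values are copied from the table.

```python
from enum import IntEnum


class Classification(IntEnum):
    # Assumption of this sketch: OFFICIAL-SENSITIVE is a handling caveat of
    # OFFICIAL rather than a separate tier, but it is ranked higher here so
    # that max() selects the stricter marking.
    OFFICIAL = 1
    OFFICIAL_SENSITIVE = 2


# Sub-type mappings for the three types marked "Maximum of variants".
SUB_TYPE_CLASSIFICATION = {
    "Pseudonymised data": {
        "Reversibly pseudonymised data": Classification.OFFICIAL_SENSITIVE,
        "Irreversibly pseudonymised data": Classification.OFFICIAL_SENSITIVE,
    },
    "Audit data": {
        "Professional user meta-data": Classification.OFFICIAL_SENSITIVE,
        "Audit data (personal)": Classification.OFFICIAL_SENSITIVE,
        "Audit data (non-personal)": Classification.OFFICIAL_SENSITIVE,
    },
    "Key materials": {
        "Very short-lived": Classification.OFFICIAL_SENSITIVE,
        "Rotatable": Classification.OFFICIAL_SENSITIVE,
        "Long-lived, persistent": Classification.OFFICIAL_SENSITIVE,
    },
}


def parent_classification(parent: str) -> Classification:
    """Return the strictest classification among a type's sub-types."""
    return max(SUB_TYPE_CLASSIFICATION[parent].values())


for parent in SUB_TYPE_CLASSIFICATION:
    print(parent, "->", parent_classification(parent).name)
```

Every parent type resolves to the same marking, which illustrates the limitation noted above: a one-time decryption key and a root certificate private key end up indistinguishable under the government scheme alone.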
Data scale
There are two dimensions to consider for scale: depth (the scope of data held for any one individual) and breadth (how many individuals are included).
For depth, data should be treated the same whether there is a single data item that causes a particular classification to apply, or whether there are many.
For breadth, the scale is:
Scale | Description | Example scale |
---|---|---|
Extra small (XS) | Very low volume | Less than 10,000 records or events |
Small (S) | Local scale, such as an individual trust | Between 10,000 and 1m records or events |
Medium (M) | Regional scale, such as county or ACO | Between 1m and 5m records or events |
Large (L) | National scale | Over 5m records or events |
Note that the scale in question here is the number of patients/events, and how that is analogous to geographic indicators of scale, not the specific geographic spread of a particular data set: for example, a data set covering 1000 individuals spread across the country represents less risk than a data set covering 1 million individuals in a specific city.
This approach recognises the difference in potential harm given the scale of breach; it provides a wider recognition of very large datasets that are commonly processed across the health system (both inside and outside of NHS England). However, it is recognised that this banding is still somewhat artificial, requiring a degree of judgement.
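The breadth bands can be expressed as a simple threshold function. The following Python sketch is illustrative only: the function name `breadth_band` is hypothetical, and because the table does not say which band the exact boundary values of 1m and 5m fall into, this sketch assumes they belong to the lower band (consistent with Large being described as "over 5m").

```python
def breadth_band(records: int) -> str:
    """Map a number of records or events to a breadth band (XS, S, M or L).

    Assumption: counts of exactly 1,000,000 and 5,000,000 fall into the
    lower of the two adjacent bands.
    """
    if records < 0:
        raise ValueError("records must be non-negative")
    if records < 10_000:
        return "XS"  # very low volume
    if records <= 1_000_000:
        return "S"   # local scale, such as an individual trust
    if records <= 5_000_000:
        return "M"   # regional scale
    return "L"       # national scale


# A data set of 1,000 individuals spread nationally is still extra small:
print(breadth_band(1_000))      # XS
print(breadth_band(250_000))    # S
print(breadth_band(60_000_000)) # L
```

Because the banding is acknowledged to be somewhat artificial, a function like this would be a starting point for judgement rather than a replacement for it.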
Data persistence
A public cloud facility can be used to process data in many ways, ranging from processing that requires long-term persistence of data at one extreme, to processing where data is purely transient (never persisted) at the other. The levels used are:
Persistency | Description | Example |
---|---|---|
Persistent | Data is deliberately placed into persistent physical storage (for example using databases or file stores) for long term/indefinite use. | Clinical System holding long-lived patient clinical information |
Temporary | Data is deliberately placed into persistent physical storage (for example using databases or file stores) for a short, defined period, typically for a specific project. | Dissemination environment providing access to national pseudonymised data to support a specific research project. |
Cached | Data may be written to persistent physical storage as part of the required processing, but it is kept only to support time-bound transactions rather than for the long term. | Message queue. |
Transient | Data transits the facility but is never intentionally persisted out-of-memory. | Web interface capturing data that is immediately transferred outside of public cloud. |
Transient data is not risk-free: rather, different risks exist depending on the level of data persistence. In general, the level of overall risk reduces from persistent to transient.
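As a rough illustration of how a data set might be described along all three dimensions, the Python sketch below models the persistence levels as an ordered enumeration and groups them with a data type and breadth band. The class names `Persistence` and `DataSetProfile`, the numeric ordering, and the example profiles are assumptions made for this sketch; the framework does not define a numeric risk score and none is computed here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Persistence(IntEnum):
    # Ordered so that a higher value means longer persistence, reflecting the
    # general statement that overall risk reduces from persistent to transient.
    TRANSIENT = 1
    CACHED = 2
    TEMPORARY = 3
    PERSISTENT = 4


@dataclass(frozen=True)
class DataSetProfile:
    """Hypothetical description of a data set along the three dimensions."""
    data_type: str          # a type from the classification scheme
    breadth: str            # breadth band: "XS", "S", "M" or "L"
    persistence: Persistence


# Made-up example profiles, for illustration only.
profiles = [
    DataSetProfile("Pseudonymised data", "L", Persistence.TEMPORARY),
    DataSetProfile("Personal confidential data (PCD)", "S", Persistence.PERSISTENT),
    DataSetProfile("Patient meta-data (linkable)", "M", Persistence.TRANSIENT),
]

# Review the most persistent uses first.
most_persistent_first = sorted(profiles, key=lambda p: p.persistence, reverse=True)
for profile in most_persistent_first:
    print(profile.persistence.name, "|", profile.data_type, "|", profile.breadth)
```

Keeping the three dimensions as separate fields, rather than collapsing them into a single number, matches the point that transient data carries different risks and not zero risk.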
Last edited: 8 January 2025 11:37 am