Dimensions that affect risk

The impact of risk is considered along three dimensions:

Data type
Data scale
Data persistence

Data type

The type of data being processed affects risk. The extremes range from managing reports intended for public distribution to maintaining extremely sensitive PKI secrets. To support the range of potential types of health and social care data, we describe a health and social care classification scheme. To take advantage of cross-governmental principles around data classification, we also provide a mapping to the Government Security Classifications policy.

The Government Security Classifications Policy came into force in April 2014. It describes the classification of information assets into one of three high-level types and provides a baseline set of security controls for each. It is intended to be used for all information assets across government departments, agencies, public sector delivery partners and the wider supply chain. The three high-level types are OFFICIAL, SECRET and TOP-SECRET, with OFFICIAL having an additional handling caveat of OFFICIAL-SENSITIVE. Additional descriptors can be used to classify assets further. A few descriptors are provided as core (of which the most relevant to us are COMMERCIAL and PERSONAL), although it is permissible to introduce others, supported by local policies and business processes.

In addition, existing classification schemes are mostly concerned with securing various information assets, whereas we also need to consider distinctions along different axes: for example, how data can and should be shared and the legal bases for its processing. Public perception and potential concern are also heightened in the health sector, which needs to be taken into account when defining the approach to, and the controls applied to, the handling of health data assets. Statements as to how healthcare information is processed also exist, either as department policy (such as the DH offshore processing policy), NHS England processes and practices (such as how Spine 2 is operated), or as part of existing commercial arrangements (both national, such as GP IT Futures, and local, such as trusts’ supplier contracts).

The new scheme described in this paper provides a health and social care sector-specific framework upon which an appropriate and proportionate set of security controls can be applied, dependent on the specific needs of different kinds of health and social care data. It is required because existing data classification schemes do not achieve the level of granularity required to cover the variety of different data types that are processed across the health and care system, and there are specific needs and complexities in the processing of health data.

Note that additional controls can be added to any data type in order to reduce any associated risks. For example, to address concerns regarding the confidentiality and integrity of data, such data may be separately encrypted before transfer to the cloud, using strong cryptography as defined by the current version of NIST SP800-57, with the encryption keys not stored with the cloud provider. In such circumstances the risk profile associated with the data being processed on public cloud is significantly mitigated.

The proposed data classification scheme is illustrated in the table below.

Type	Sub-type	Description	Example
Publicly available information		Statistical material that is intended for public distribution. Identification from these materials, with or without any other materials, is not feasible.	The number of diabetics in Sheffield, or location information for health-care providers.
Synthetic (test) data		Synthetic (test) data is fictional data, engineered to be representative of real data, that is created in order to avoid the need to use real data when developing and testing IT systems. Synthetic data must pose zero risk of contributing to the revealing of any personal data.	A fabricated dummy Hospital Episode Statistics (HES) data set, used for testing purposes and risk assessed to ensure that the data cannot contribute to access to any personal data.
Aggregate data		Summarised and anonymised data, but which is not suitable for public distribution, for example due to the risk that it may be used with other material to contribute to the re-identification of individuals. The risk of such re-identification is not necessarily significant but does exist (especially in the presence of a sustained and skilled attack).	Summarised records of activity of a particular hospital.
Already encrypted materials		Materials that are already encrypted before they touch the cloud, using strong cryptography as defined by the current version of NIST SP800-57 and where the encryption keys are not stored with the cloud provider.	Scanned hospital patient notes which are encrypted by an application before being uploaded to the cloud for archive purposes.
Personal data (PID)		Information about an identified individual
	Demographic data	Information about the individual rather than their clinical details.	A person’s address details and NHS Number.
	High risk demographic data	Demographic data where, in the event of a breach, there is a high risk of significant harm.	The address details of a person under the care of the UK Protected Persons Service, likely to be reflected in an S-flag applied to their PDS details.
	Personal confidential data (PCD)	PCD is based on the ICO definition of sensitive personal data, extended within health and social care to include deceased persons and information that is given in confidence and is owed a duty of care, such as: social care records/child protection/housing assessments; DNA/fingerprints; bank/financial/credit card details; National Insurance number/tax, benefit or pension records; travel details (for example at immigration control, or Oyster records); passport number/information on immigration status/travel records; work record or place of work/school attendance/records.
	Legally-restricted PCD	Sensitive personal data that are subject to additional regulations or statute, under either the Gender Recognition Act 2004 or the Human Fertilisation & Embryology Act 2008.	Details of a person’s previous gender.
	Extra-delicate PCD	Sensitive personal data that are sometimes seen to be additionally delicate, but for which there are no legal restrictions. This determination is often not consistent, but is commonly held, and is often related to conditions that attract, or are considered to attract, stigma: for example, HIV status, mental health conditions, and other conditions contained within the SCR 'sensitive code' list. Whilst many patients regard information on these kinds of condition as particularly private and not to be shared under any circumstances, others see it as important to share, and for any stigmas to be removed. Note that there is no legal distinction between PCD and extra-delicate PCD.	Details that a person has asked not to be shared.
Anonymised data		Sensitive personal data that has been subject to de-identification and/or other privacy-enhancing techniques, in line with the ICO Anonymisation Code of Practice. Risk of re-identification is remote (and would be based on activities that are illegal and/or break contractual arrangements). There is no authorised way of linking with other data-sets.	Extract from a research database where all pseudonyms have been removed.
Pseudonymised data		Sensitive personal data that has been subject to de-identification and/or other privacy-enhancing techniques, in line with the ICO Anonymisation Code of Practice, containing a pseudonym that allows for linking with other data-sets where that is permitted through business justification and legal basis. Otherwise, risk of unauthorised re-identification is remote (and would be based on activities that are illegal and/or break contractual arrangements).	HES data set.
	Reversibly pseudonymised data	Pseudonymised data where the pseudonym is also intended to be used to facilitate re-identification where that is supported by business purpose and legal basis.	Data dissemination to support risk stratification (where individuals may subsequently be usefully re-identified to support their direct care).
	Irreversibly pseudonymised data	Pseudonymised data where re-identification is not intended.	Data dissemination to support a research project that never requires re-identification.
Patient account data		Account credentials (including any recovery materials) for citizen accounts for patient-facing online health tools.	A person’s account details for the NHS.UK website.
Patient choices		Statements/preferences made by patients regarding the use of their data.	A person’s expressions of their wishes recorded in their GP’s clinical system or on the Spine.
Patient meta-data (identifiable)		Information about how identified patients have used patient-facing online health tools.	History of an identified person’s use of the NHS.UK website's symptom information.
Patient meta-data (linkable)		Information about how patients have used patient-facing online health tools (not identified, but linkable across sessions).	History of an unknown (but linkable) person’s use of the NHS.UK website's symptom information.
Professional user account data		Account credentials (including any recovery materials) for professional user (such as a clinician) accounts that control access to any personal data (including PCD).	A clinical application logon.
Professional account data (less-sensitive)		Account credentials (including any recovery materials) for professional user (such as a clinician) accounts that control access to anonymised information.	Authentication details to portal providing access to anonymised data.
Audit data		Data that records the use of a system and the provenance of the data that system manages	Clinical system audit trail
	Professional user meta-data	Information about how users have used clinical or administrative tools that process personal data.	History of a GP’s use of their clinical system, or of summary care record (SCR)
	Audit data (personal)	Data describing the use of a clinical or administrative system that processes personal data, where that audit data itself includes or references PCD.	The audit trail of a GP system showing all users’ interactions and use of the system.
	Audit data (non-personal)	Data describing the use of a clinical or administrative system, where that audit data itself does not include or reference PCD.	History of logins to a clinical system.
Key materials		Material that provides long-lived linkage between reversibly-pseudonymised data and personal data, or provides a similarly significant security function.	Look-up tables or decryption keys.
	Very short-lived	One-time decryption keys	A decryption key generated to support (and only usable within) a specific re-identification activity within an individual user session.
	Rotatable	Material that provides linkage between reversibly-pseudonymised data and personal data, that persists over time and over user sessions but is generally rotatable.	An encryption key used by a DSCRO to re-identify pseudonyms included in many data disseminations.
	Long-lived, persistent	Material that provides long-lived and persistent linkage between reversibly-pseudonymised data and personal data, or provides a significant security function.	A root certificate private key for a widespread PKI.
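
The distinction between reversibly and irreversibly pseudonymised data, and the role of a look-up table as key material, can be illustrated with a minimal sketch. This is not the method used by any NHS system: the use of HMAC-SHA256, the function and class names, and the made-up identifier are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import secrets


def irreversible_pseudonym(identifier, key):
    """Keyed one-way pseudonym (assumed technique: HMAC-SHA256).

    The same identifier always yields the same pseudonym, so data sets
    can be linked, but no table mapping pseudonyms back to identifiers
    is kept. The key must still be protected: anyone holding it could
    recompute pseudonyms from candidate identifiers.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class ReversiblePseudonymiser:
    """Random pseudonyms plus a look-up table that permits re-identification.

    The look-up table is the kind of material the table above lists under
    key materials, and needs correspondingly strong protection.
    """

    def __init__(self):
        self._forward = {}  # identifier -> pseudonym
        self._reverse = {}  # pseudonym -> identifier

    def pseudonymise(self, identifier):
        if identifier not in self._forward:
            pseudonym = secrets.token_hex(16)
            self._forward[identifier] = pseudonym
            self._reverse[pseudonym] = identifier
        return self._forward[identifier]

    def reidentify(self, pseudonym):
        # Only to be called where business purpose and legal basis allow.
        return self._reverse[pseudonym]


if __name__ == "__main__":
    key = secrets.token_bytes(32)    # hypothetical secret key
    made_up_id = "example-person-1"  # not a real identifier
    print(irreversible_pseudonym(made_up_id, key))

    pseudonymiser = ReversiblePseudonymiser()
    p = pseudonymiser.pseudonymise(made_up_id)
    print(p, "->", pseudonymiser.reidentify(p))
```

In this sketch, rotating the look-up table or the key would correspond to the rotatable sub-type of key materials, whereas a table held only for one user session would be closer to the very short-lived sub-type.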

The table below provides an agreed mapping between the health data types and the Government Security Classification Policy. This enables us to take advantage of cross-government policy statements and published principles (such as the 14 NCSC Cloud security principles) around that classification, whilst treating those statements as necessary but not necessarily sufficient in a health and social care context.

Type	Sub-type	Map to govt. security classification	Notes
Publicly available information		No applicable mapping	The most obvious mapping is to something like UNCLASSIFIED but this is no longer part of the model.
Synthetic (test) data		OFFICIAL
Aggregate data		OFFICIAL
Already encrypted materials		OFFICIAL
Personal data (PID)		OFFICIAL-SENSITIVE
	Demographic data	OFFICIAL-SENSITIVE
	High risk demographic data	OFFICIAL-SENSITIVE
	Personal confidential data (PCD)	OFFICIAL-SENSITIVE
	Legally-restricted PCD	OFFICIAL-SENSITIVE
	Extra-delicate PCD	OFFICIAL-SENSITIVE
Anonymised data		OFFICIAL-SENSITIVE
Pseudonymised data		Maximum of variants
	Reversibly pseudonymised data	OFFICIAL-SENSITIVE
	Irreversibly pseudonymised data	OFFICIAL-SENSITIVE
Patient account data		OFFICIAL-SENSITIVE
Patient choices		OFFICIAL-SENSITIVE
Patient meta-data (identifiable)		OFFICIAL-SENSITIVE
Patient meta-data (linkable)		OFFICIAL-SENSITIVE
Professional user account data		OFFICIAL-SENSITIVE
Professional account data (less-sensitive)		OFFICIAL-SENSITIVE
Audit data		Maximum of variants
	Professional user meta-data	OFFICIAL-SENSITIVE
	Audit data (personal)	OFFICIAL-SENSITIVE
	Audit data (non-personal)	OFFICIAL-SENSITIVE
Key materials		Maximum of variants
	Very short-lived	OFFICIAL-SENSITIVE
	Rotatable	OFFICIAL-SENSITIVE	Whilst we need such data to be treated to the highest standards, they do not fit into the government policy criteria for SECRET or TOP-SECRET.
	Long-lived, persistent	OFFICIAL-SENSITIVE	Whilst we need such data to be treated to the highest standards, they do not fit into the government policy criteria for SECRET or TOP-SECRET.
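
As a rough illustration, the mapping above could be held as a simple look-up, with the "Maximum of variants" entries computed from their sub-types. This is a sketch only: the names `GovClassification`, `DIRECT`, `VARIANTS` and `classify` are hypothetical, type names follow the classification table, and the numeric ordering of the two markings is an assumption made so that `max()` selects the more restrictive one.

```python
from enum import IntEnum
from typing import Optional


class GovClassification(IntEnum):
    # Ordering is an assumption for this sketch; OFFICIAL-SENSITIVE is
    # a handling caveat on OFFICIAL, treated here as more restrictive.
    OFFICIAL = 1
    OFFICIAL_SENSITIVE = 2


O = GovClassification.OFFICIAL
OS = GovClassification.OFFICIAL_SENSITIVE

# Types with a direct entry in the table. None means no mapping applies.
# Sub-types of personal data are omitted because the parent row is direct.
DIRECT = {
    "Publicly available information": None,
    "Synthetic (test) data": O,
    "Aggregate data": O,
    "Already encrypted materials": O,
    "Personal data (PID)": OS,
    "Anonymised data": OS,
    "Patient account data": OS,
    "Patient choices": OS,
    "Patient meta-data (identifiable)": OS,
    "Patient meta-data (linkable)": OS,
    "Professional user account data": OS,
    "Professional account data (less-sensitive)": OS,
}

# Types whose table entry is "Maximum of variants".
VARIANTS = {
    "Pseudonymised data": {
        "Reversibly pseudonymised data": OS,
        "Irreversibly pseudonymised data": OS,
    },
    "Audit data": {
        "Professional user meta-data": OS,
        "Audit data (personal)": OS,
        "Audit data (non-personal)": OS,
    },
    "Key materials": {
        "Very short-lived": OS,
        "Rotatable": OS,
        "Long-lived, persistent": OS,
    },
}


def classify(data_type: str) -> Optional[GovClassification]:
    """Return the mapped marking, or None where no mapping applies."""
    if data_type in VARIANTS:
        return max(VARIANTS[data_type].values())
    return DIRECT[data_type]  # raises KeyError for a type not in the table


if __name__ == "__main__":
    for name in ("Aggregate data", "Key materials", "Publicly available information"):
        print(name, "->", classify(name))
```

Because every sub-type currently maps to OFFICIAL-SENSITIVE, the maximum is always OFFICIAL-SENSITIVE; the look-up makes that visible, but cannot express any finer distinction within the category.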

Whilst we can (mostly) demonstrate an appropriate mapping from health data type to the Government Security Classification Policy, there are some limitations that emerge.

Many data types map to OFFICIAL-SENSITIVE, but there are many kinds of data in this category that we will control, and disseminate, in different ways.

We cannot, through the Government Security Classification Policy, indicate the very highly sensitive NHS materials such as PKI secrets as needing any greater control than many other kinds of information.

Data scale

Scale has two dimensions: depth (such as the scope of data held for any one individual) and breadth (how many individuals are included).

For depth, data should be treated the same whether there is a single data item that causes a particular classification to apply, or whether there are many.

For breadth, the scale is:

Scale	Description	Example scale
Extra small (XS)	Very low volume	Less than 10,000 records or events
Small (S)	Local scale, such as an individual trust	Between 10,000 and 1m records or events
Medium (M)	Regional scale, such as county or ACO	Between 1m and 5m records or events
Large (L)	National scale	Over 5m records or events
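
The breadth bands can be expressed as a small function. This is a sketch under stated assumptions: the function name `breadth_band` is hypothetical, and because the table does not say which band applies at exactly 1m records, that boundary is assigned to Medium here. Exactly 5m is treated as Medium because Large is described as "over 5m".

```python
def breadth_band(record_count: int) -> str:
    """Return the breadth band (XS, S, M or L) for a record or event count."""
    if record_count < 0:
        raise ValueError("record_count must not be negative")
    if record_count < 10_000:
        return "XS"  # less than 10,000
    if record_count < 1_000_000:
        return "S"   # 10,000 up to, but not including, 1m (assumption)
    if record_count <= 5_000_000:
        return "M"   # 1m to 5m inclusive (assumption at the 1m boundary)
    return "L"       # over 5m


if __name__ == "__main__":
    for count in (500, 250_000, 3_000_000, 60_000_000):
        print(count, breadth_band(count))
```

As the text notes, the banding still requires judgement, so a function like this would be a starting point for an assessment and not a substitute for it.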

This approach recognises the difference in potential harm given the scale of breach; it provides a wider recognition of very large datasets that are commonly processed across the health system (both inside and outside of NHS England). However, it is recognised that this banding is still somewhat artificial, requiring a degree of judgement.

Data persistence

A public cloud facility can be used to process data in many ways, ranging from processing that requires long-term persistence of data at one extreme, to processing where data is purely transient (never persisted) at the other. The levels used are:

Persistency	Description	Example
Persistent	Data is deliberately placed into persistent physical storage (for example using databases or file stores) for long term/indefinite use.	Clinical System holding long-lived patient clinical information
Temporary	Data is deliberately placed into persistent physical storage (for example using databases or file stores) for a short, defined period, typically for a specific project.	Dissemination environment providing access to national pseudonymised data to support a specific research project.
Cached	Data may be persisted into persistent physical storage as part of the required processing but it is kept only to support time-bound transactions, rather than long-term.	Message queue.
Transient	Data transits the facility but is never intentionally persisted out-of-memory.	Web interface capturing data that is immediately transferred outside of public cloud.
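
Taken together, the three dimensions describe a workload. A minimal sketch of how such a description might be recorded is shown below; the class names, the `describe` method and the example values are hypothetical, and the scale chosen for the example is an assumption, not something the framework states.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    XS = "XS"
    S = "S"
    M = "M"
    L = "L"


class Persistence(Enum):
    PERSISTENT = "Persistent"
    TEMPORARY = "Temporary"
    CACHED = "Cached"
    TRANSIENT = "Transient"


@dataclass(frozen=True)
class WorkloadProfile:
    """One workload described along the three dimensions."""
    data_type: str            # a type or sub-type from the classification table
    scale: Scale              # breadth band
    persistence: Persistence  # persistence level

    def describe(self) -> str:
        return f"{self.data_type} / {self.scale.value} / {self.persistence.value}"


if __name__ == "__main__":
    # Hypothetical example: a clinical system at a single trust holding
    # long-lived patient clinical information.
    profile = WorkloadProfile(
        data_type="Personal confidential data (PCD)",
        scale=Scale.S,
        persistence=Persistence.PERSISTENT,
    )
    print(profile.describe())
```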

Last edited: 28 May 2025 3:34 pm
