
Data sharing standard 3 - Data Minimisation

This standard is part of a series of guidance documents to support the various stages of a DARS application.

 

Standard description

The General Data Protection Regulation (GDPR) states in Article 5(1)(c) that:

Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed (data minimisation)

This means that the amount of data requested must be justified by the purpose stated within the application. In assessing this, the following questions will be considered. Additional justification is required where identifiable and/or sensitive data, and/or data for patients aged under 16, is requested.

Datasets

All datasets should be relevant to the purpose.

  • Is it possible to reduce the number of datasets requested? If not, why not?
  • Can the purpose be achieved in a less intrusive way? For example, can the purpose be achieved using anonymised or pseudonymised data?

Years

The number of years must be justified within the application.

  • Is it possible to reduce the number of years requested? If not, why not?

Filtering

Explain how the number of patients whose data is requested is supported by the purpose.

  • Can the data be narrowed by geography? If not, why not?
  • Can the data be narrowed by demographics (such as age)? If not, why not?
  • Can the data be narrowed by clinical factors (such as diagnosis/procedure)? If not, why not?
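
The three filtering questions above can be pictured as successive conditions applied to an extract. The following is a minimal sketch only, assuming a simplified episode record; the field names (`region`, `age`, `diagnosis_code`) and the sample values are hypothetical and do not reflect the layout of any real dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    patient_id: str
    region: str          # hypothetical geography field
    age: int
    diagnosis_code: str  # assumed to be an ICD-10 style code


def minimise(episodes, regions, min_age, max_age, diagnosis_prefixes):
    """Keep only episodes that satisfy every filter (geography, age, diagnosis)."""
    prefixes = tuple(diagnosis_prefixes)
    return [
        e for e in episodes
        if e.region in regions
        and min_age <= e.age <= max_age
        and e.diagnosis_code.startswith(prefixes)
    ]


# Illustrative records only; none of this is real data.
episodes = [
    Episode("p1", "North West", 67, "I21.9"),  # heart attack code, adult, in region
    Episode("p2", "North West", 45, "M17.1"),  # knee-related code, not relevant
    Episode("p3", "London", 70, "I21.0"),      # outside the requested geography
    Episode("p4", "North West", 15, "I21.4"),  # outside the requested age range
]

selected = minimise(episodes, {"North West"}, 18, 120, ["I21"])
print([e.patient_id for e in selected])
```

Each condition that cannot be applied corresponds to a "If not, why not?" justification in the application.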

Episodes

  • Are all the patients' episodes required to achieve the purpose? If so, why?
  • Are all elective episodes required to achieve the purpose? If so, why?
  • Are maternity episodes required to achieve the purpose? If yes, are the unborn child and neonatal records necessary? If so, why?
  • Is there a timeframe around the index event (such as procedure or diagnosis) required? If so, why?
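
A timeframe around an index event can be expressed as a date window. The sketch below is illustrative and assumes each episode carries a hypothetical `admission_date` field; the window lengths are arbitrary examples, not recommended values.

```python
from datetime import date, timedelta


def within_window(episodes, index_date, days_before, days_after):
    """Keep episodes admitted within the window around the index event (inclusive)."""
    start = index_date - timedelta(days=days_before)
    end = index_date + timedelta(days=days_after)
    return [e for e in episodes if start <= e["admission_date"] <= end]


# Illustrative records only.
patient_episodes = [
    {"episode_id": 1, "admission_date": date(2019, 1, 15)},
    {"episode_id": 2, "admission_date": date(2020, 5, 20)},
    {"episode_id": 3, "admission_date": date(2020, 8, 1)},
    {"episode_id": 4, "admission_date": date(2021, 1, 1)},
]

# Index event (for example a procedure) on 1 June 2020; keep 30 days before and 90 days after.
kept = within_window(patient_episodes, date(2020, 6, 1), days_before=30, days_after=90)
print([e["episode_id"] for e in kept])
```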

Fields

  • For the records requested, are all fields necessary to achieve the purpose? If so, why?
  • If identifiable/sensitive fields have been chosen, is it possible to reduce the risk of intrusion? For example, a flag for 30-day mortality rather than the full date of death; survival days, which give an exact period in days without DOB/DOD being supplied; or categories in place of specific diagnosis codes.
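
The field-level substitutions mentioned above can be sketched as a small transformation that drops the full date of death and the specific diagnosis code and returns derived values instead. This is a hedged illustration: the record layout, the `need_survival_days` switch, and the use of the first three characters of an ICD-10 style code as its category are all assumptions made for the example.

```python
from datetime import date


def minimise_fields(record, need_survival_days=False):
    """Return derived, less intrusive fields in place of dates and specific codes."""
    dod = record.get("date_of_death")
    survival_days = (dod - record["index_date"]).days if dod is not None else None

    out = {
        "patient_id": record["patient_id"],
        # Assumption: the three-character prefix serves as the diagnosis category.
        "diagnosis_category": record["diagnosis_code"][:3],
    }
    if need_survival_days:
        # Exact period in days, without supplying the date of death itself.
        out["survival_days"] = survival_days
    else:
        # Coarser still: a yes/no flag for death within 30 days of the index event.
        out["died_within_30_days"] = survival_days is not None and survival_days <= 30
    return out


# Illustrative record only.
record = {
    "patient_id": "p1",
    "index_date": date(2020, 1, 10),
    "date_of_death": date(2020, 2, 4),
    "diagnosis_code": "I21.9",
}

flagged = minimise_fields(record)
print(flagged)
```

The output contains neither the date of death nor the full diagnosis code; which derived field is appropriate depends on what the stated purpose needs.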

Cohorts/linkages

  • For a data linkage can additional filters be applied? For example, Hospital Episode Statistics (HES) data linked to mental health data but only requiring HES records where there is an associated mental health record.
  • Can a HES cohort be created to minimise the data being provided? For example, if a customer is interested in all episodes for patients with a specific diagnosis/procedure, we can find the HES IDs for these people and provide all episodes for these people (meaning HES is filtered by condition, and Office for National Statistics mortality records are only provided for patients who appear in the HES extract).
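
The cohort approach described above can be sketched in three steps: find the IDs of patients with the condition, keep all episodes for those IDs, and restrict the linked mortality records to the same IDs. The record layouts and the `hes_id` key are hypothetical, and this is not a description of how the production teams actually build extracts.

```python
def build_cohort_extract(hes_episodes, mortality_records, diagnosis_prefix):
    """Filter HES by condition, then limit linked mortality records to the cohort."""
    # Step 1: IDs of patients with at least one episode matching the condition.
    cohort_ids = {
        e["hes_id"] for e in hes_episodes
        if e["diagnosis_code"].startswith(diagnosis_prefix)
    }
    # Step 2: all episodes for those patients, not only the matching ones.
    hes_extract = [e for e in hes_episodes if e["hes_id"] in cohort_ids]
    # Step 3: mortality records only for patients who appear in the HES extract.
    mortality_extract = [m for m in mortality_records if m["hes_id"] in cohort_ids]
    return hes_extract, mortality_extract


# Illustrative records only.
hes_episodes = [
    {"hes_id": "A", "diagnosis_code": "I21.9"},
    {"hes_id": "A", "diagnosis_code": "M17.1"},
    {"hes_id": "B", "diagnosis_code": "M17.1"},
]
mortality_records = [
    {"hes_id": "A", "died": True},
    {"hes_id": "B", "died": True},
]

hes_extract, mortality_extract = build_cohort_extract(hes_episodes, mortality_records, "I21")
print(len(hes_extract), len(mortality_extract))
```

Patient B never had the condition of interest, so neither their episodes nor their mortality record appear in the extract.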

Some of the above considerations will require discussion between the relevant production team and the customer. Others can be answered straight away by the customer.


Video

 

View a transcript of the data minimisation guidance video

Slide 1

Hello, my name is Tracy and I work as a senior case officer in the Data Access Request Service team.

Slide 2

This video is one of a series of presentations designed to help you use our Data Access Request Service as effectively as possible. You can view the other videos in this series on our YouTube channel at the following address. NHS Digital has published a number of standards in relation to how we assess applications for data from NHS Digital. These are designed to be transparent and to help you in completing the relevant section of your online application. This presentation will provide detail on the agreed standard for completing the following section of the application: Data Minimisation.

Slide 3

When we refer to data minimisation, we're referring to the amount of data you are requesting. You should only request the amount of data that is specifically required so that you can complete your work. This is supported by GDPR Article 5(1)(c) which requires that data shall be:

“adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”.

In all cases the amount of data requested must be justified by the purpose stated within your application.

Slide 4

So how can you minimise the amount of data requested?

The datasets requested should be directly relevant to the purpose in your application. Some of the things to consider would be:

Firstly, can you reduce the number of datasets that you have requested to achieve the purpose?

Can you achieve the purpose in a less intrusive way? What we mean by this is, for example, if you've requested identifiable data, could the purpose be achieved by using either pseudonymised or anonymised data instead?

Something else to consider could be whether you need the number of years that you've requested. Could you reduce the number and still achieve the purpose? Again, this will need to be justified within the purpose section.

Slide 5

Other ways to minimise the data could be by geography or by demography, for example by age.

Can you minimise the data by clinical factors, either by diagnosis or by procedure? For example, why does a study on heart attacks need to know about knee cap replacements as well?

Slide 6

Depending on the dataset, you may be able to minimise the data by considering the episode or spell. Things to consider here are:

Are all the patient's episodes required to achieve the purpose, and if so, why?

Are all elective episodes required to achieve the purpose?

Are all the maternity episodes required to achieve the purpose? If they are, are the unborn child and neonatal records necessary, and if so, why? Is there a time frame around the index event, for example a procedure or diagnosis, and if so, why?

Slide 7

Explain how the fields in each record are supported by the purpose. For the records requested, are all fields necessary to achieve the purpose? If so, why?

If identifiable or sensitive fields have been chosen, is it possible to reduce the risk of intrusion? For example, could you use a flag for 30-day mortality rather than the full date of death, or survival days, which give an exact period in days but without the full DOB/DOD, or could you replace specific diagnosis codes with categories?

Additional justification will be required if you are selecting identifiable and/or sensitive data within your application.

Slide 8

For a data linkage, can additional filters be applied? For example, HES data may be linked to mental health data, but you might only require HES records where there is an associated mental health record.

Another way could be to create a HES cohort to minimise the amount of data provided. For example, if you're interested in all episodes for patients with a specific diagnosis or procedure, we can find those people in the dataset and then provide all related episodes. An example of this would be where we filter HES by a condition, and ONS mortality records are only provided for patients who appear in the HES extract.

Slide 9

When submitting your application via DARS online, after selecting each product you will also be asked how you are looking to minimise the data. This is a free text box for you to explain how the data is being minimised.

Slide 10

Thank you for listening. We would welcome your feedback on this presentation. If you'd like to provide feedback, then please email [email protected]

Last edited: 28 July 2021 4:47 pm