
Disclosure control methodology for Hospital Episode Statistics (HES) and Emergency Care Data Set (ECDS)

Timing and scope of change

The new methodology will be applied by the Secondary Care Analysis team in producing and disseminating outputs using Hospital Episode Statistics (HES) and Emergency Care Data Set (ECDS) data from September 2018.

In scope

The new methodology will be applied to annual publications from 2017-2018, specifically:

  • Hospital accident and emergency activity
  • Hospital admitted patient care activity
  • Hospital outpatient activity
  • NHS maternity statistics

Monthly publications released on or after 13 September 2018 including:

  • Provisional monthly Hospital Episode Statistics for Admitted Patient Care, Outpatient and Accident and Emergency data
  • Provisional Accident and Emergency Quality Indicators for England
  • Provisional data quality report for Emergency Care Data Set (ECDS)
  • tabulations provided through the Data Access Request Service (DARS) from 13 September 2018
  • responses to parliamentary questions, freedom of information requests and media queries from 13 September 2018

Out of immediate scope

Analysis of HES and ECDS data produced:

  • by other NHS Digital teams
  • by external users of this data

We expect these changes will be more widely adopted within and beyond NHS Digital in the longer term. 


Reason for change

When producing analysis, we need to balance accuracy and timeliness of publication with disclosure control to reduce the risk of identifying individuals from the outputs.

The current disclosure control rules for HES and ECDS require the use of secondary suppression after replacing values between 1 and 5 with ’*’, checking that other values within the data cannot be used to recalculate the original small numbers. This is a manual process for each output, which takes considerable time, has a risk of error, and can produce inconsistent outputs.

A disclosure control method that can be automated will be more efficient and provide greater security, particularly when it is used across separate statistical releases. It will also help us to provide timelier data to customers, particularly at sub-national geographies.


New methodology

The secondary care analysis team, working with other statistical colleagues within NHS Digital, has developed a methodology that:

  • can be fully automated 
  • enables zeroes to be shown
  • is at least as strong as current systems

Counts

National level data

No disclosure control required for small numbers. Restrictions on certain diagnoses and procedures as set out in the HES analysis guide still apply.

Breakdowns below national level

The following steps will be applied to reduce the risk of identifying individuals from small numbers.

  1. If the national total is between 1 and 7 (inclusive), no sub-national breakdown will be displayed.
  2. If the national total is greater than or equal to 8:
    1. Sub-national counts between 1 and 7 (inclusive) will be displayed as ’*’.
    2. Zeroes will be unchanged.
    3. All other counts will be rounded to the nearest 5.

Sub-national numbers will therefore appear as follows:

Before disclosure control 0 1 2 3 4 5 6 7 8 9 10 11 12 13
After disclosure control  0 * * * * * * * 10 10 10 10 10 15
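
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the team's actual implementation: the function names `control_count` and `control_breakdown` are hypothetical, counts are assumed to be non-negative integers, and "no sub-national breakdown will be displayed" is represented by returning `None`.

```python
from typing import List, Optional


def control_count(count: int) -> str:
    """Apply the sub-national rules to one count (hypothetical helper)."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    if count == 0:
        return "0"          # zeroes are unchanged
    if count <= 7:
        return "*"          # 1 to 7 inclusive is suppressed
    # Round to the nearest 5; integer counts never fall exactly halfway.
    return str(5 * ((count + 2) // 5))


def control_breakdown(counts: List[int], national_total: int) -> Optional[List[str]]:
    """Return the displayed breakdown, or None when none may be shown."""
    if 1 <= national_total <= 7:
        return None         # step 1: no sub-national breakdown displayed
    return [control_count(c) for c in counts]


print([control_count(n) for n in range(14)])
```

Applied to the values 0 to 13, the sketch reproduces the "After disclosure control" row shown above.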

Row or column totals will be calculated from the unrounded values and then rounded, which means that the total of rounded values may differ from the rounded total, as in the following example:

Before disclosure control

  M F U Total
A 5 10 4 19
B 12 17 11 40
C 8 8 16 32

After disclosure control

  M F U Total
A * 10 * 20
B 10 15 10 40
C 10 10 15 30
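
A minimal sketch of the example above, assuming the total is summed from the unrounded values before disclosure control is applied to it. The helper names `control_count` and `control_row` are hypothetical and used only for illustration.

```python
from typing import List


def control_count(count: int) -> str:
    """Suppress 1 to 7, keep zero, round everything else to the nearest 5."""
    if count == 0:
        return "0"
    if 1 <= count <= 7:
        return "*"
    return str(5 * ((count + 2) // 5))


def control_row(values: List[int]) -> List[str]:
    """Disclosure-control a row and append its separately controlled total."""
    total = sum(values)  # the total uses the unrounded values
    return [control_count(v) for v in values] + [control_count(total)]


rows = {"A": [5, 10, 4], "B": [12, 17, 11], "C": [8, 8, 16]}
for label, values in rows.items():
    print(label, *control_row(values))
```

Row C illustrates the point: the displayed values 10, 10 and 15 sum to 35, while the displayed total is 30 because it is rounded from the unrounded total of 32.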

Suppression and rounding are applied to figures at NHS Commissioning Region level, which were not subject to disclosure control under the old HES method.


Percentages

The most common calculations using counts of HES records are percentages or rates. Unless the denominator for a percentage is relatively large, it is often possible to work out the numerator and denominator because only one possible pair of values would give this percentage (especially if the percentage is displayed to a higher degree of precision such as 2 decimal places). Disclosure control will therefore also need to be applied to percentages.

National level data

No disclosure control required for small numerator or denominator. (Restrictions on certain diagnoses and procedures as set out in the HES Analysis Guide still apply).

Breakdowns below national level

The following should be applied when calculating percentages at sub-national level:

  1. Where the numerator or denominator is between 1 and 7 (inclusive), no percentage or rate is calculated, and a ’*’ will be displayed.
  2. Where the numerator is zero, the percentage will be 0%.
  3. Where the unrounded numerator and denominator are both greater than or equal to 8, a percentage or rate is calculated using the rounded numerator and rounded denominator.

The following example shows the application of disclosure control to the counts and the resulting percentages:

Unrounded counts

  M F Total
A 0 16 16
B 5 2 7
C 5 12 17
D 9 12 21
E 8 14 22

Rounded counts

  M F Total
A 0 15 15
B * * *
C * 10 15
D 10 10 20
E 10 15 20

Percentages

  M F
A 0% 100%
B * *
C * 67%
D 50% 50%
E 50% 75%

Note that because row/column totals will be calculated and then rounded, the total of rounded values may differ from the rounded total, and row/column percentages may sum to more than 100%, as in row E above.
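
The percentage rules can be sketched as below. This is an illustration only, with several assumptions that are not stated in the text above: the function name `control_percentage` is hypothetical, the rules are applied in the order listed (so suppression takes precedence over the zero-numerator rule), a zero denominator is rejected, and the result is shown as a whole number rounded half up.

```python
def round_to_5(n: int) -> int:
    """Round a non-negative integer to the nearest 5."""
    return 5 * ((n + 2) // 5)


def control_percentage(numerator: int, denominator: int) -> str:
    """Return the displayed sub-national percentage for unrounded inputs."""
    if numerator < 0 or denominator <= 0:
        # Assumption: the source does not cover a zero denominator.
        raise ValueError("numerator must be >= 0 and denominator > 0")
    if 1 <= numerator <= 7 or 1 <= denominator <= 7:
        return "*"                      # rule 1: suppress
    if numerator == 0:
        return "0%"                     # rule 2: zero numerator
    num, den = round_to_5(numerator), round_to_5(denominator)
    # Rule 3: use the rounded values; integer arithmetic rounds half up.
    pct = (200 * num + den) // (2 * den)
    return f"{pct}%"


unrounded = {"A": (0, 16), "B": (5, 2), "C": (5, 12), "D": (9, 12), "E": (8, 14)}
for label, (m, f) in unrounded.items():
    total = m + f
    print(label, control_percentage(m, total), control_percentage(f, total))
```

Run against the unrounded counts in the example, the sketch gives the same values as the percentages table, including the 50% and 75% in row E that sum to more than 100%.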

Users should be aware that a percentage calculated using the rounded numerator and denominator will differ from the percentage that would be calculated using unrounded numbers, and the difference can be large for smaller denominators.

The presentation of percentages section sets out the approach the secondary care analysis team will take in our outputs to allow users to assume a given degree of accuracy in the values displayed. (This presentational approach to percentages does not form part of the disclosure control method, and users making their own calculations are free to take a different presentational approach if they wish.)

Simple calculations

For calculations such as the mean, median or mode where the number of values used in the calculation represents a number of individuals:

  1. At national level, no disclosure control is required.
  2. At sub-national level, where the number of values used is between 1 and 7 inclusive, no calculation is made and a ‘*’ is displayed.
  3. At sub-national level, where the number of values used is greater than or equal to 8, the calculated value is displayed.
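
As a rough sketch of these rules for the mean, assuming each value corresponds to one individual: the function name `controlled_mean` and the `national` flag are hypothetical, and rejecting an empty input is an assumption, since the text above does not cover that case.

```python
from statistics import mean
from typing import Sequence, Union


def controlled_mean(values: Sequence[float], national: bool = False) -> Union[float, str]:
    """Return the mean, or '*' when a sub-national result must be suppressed."""
    n = len(values)
    if n == 0:
        # Assumption: an empty input is treated as an error.
        raise ValueError("no values supplied")
    if national:
        return mean(values)     # no disclosure control at national level
    if n <= 7:
        return "*"              # 1 to 7 values: no calculation is made
    return mean(values)         # 8 or more values: display the result


print(controlled_mean([4, 6, 9]))
print(controlled_mean([1, 2, 3, 4, 5, 6, 7, 8]))
```

The same pattern would apply to the median or mode by swapping in `statistics.median` or `statistics.mode`.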

Complex calculations

For complex calculations such as standardised rates, confidence intervals and regression models, no disclosure control is required.

Values not relating to individuals

Disclosure control only needs to be applied to values relating to individuals. No rounding or suppression is required for values not relating to individuals, such as a count of providers.


Presentation of percentages

Calculating a percentage using a rounded numerator and denominator will result in a different value from the ’true’ percentage that would have been calculated using unrounded values. This difference can be a number of percentage points if the denominator is relatively small, as in the following examples.

Unrounded numerator | Unrounded denominator | 'True' percentage | Rounded numerator | Rounded denominator | Displayed percentage | Difference
8 | 27 | 30% | 10 | 25 | 40% | +10%
12 | 23 | 52% | 10 | 25 | 40% | -12%

Where the rounded denominator is greater than or equal to 400, the percentage using a rounded numerator and denominator is within +/- 1 per cent of the ’true’ percentage.

HES and ECDS outputs produced by the secondary care analysis team will therefore only display a calculated percentage to the nearest whole number where the rounded denominator is greater than or equal to 400.

Users will be able to assume that this is within one percentage point of the ’true’ percentage. For example:

Rounded numerator | Rounded denominator | Displayed percentage | Range of unrounded numerator | Range of unrounded denominator | Range of 'true' percentage
20 | 400 | 5% | 18 - 22 | 398 - 402 | 4% - 6%

To display percentages rounded to 1 decimal place and for users to be able to assume that this is within +/- 0.1 per cent of the ’true’ percentage, the rounded denominator would need to be greater than or equal to 4,000.

This presentational approach to percentages does not form part of the disclosure control method, and users making their own calculations are free to take a different presentational approach if they wish.

If users wish to calculate percentages where the rounded denominator is below 400 and display these values, they can assess the maximum difference between the ’disclosure controlled’ percentage and the ’true’ percentage as the maximum of:

  • ’disclosure controlled’ percentage – ((rounded numerator-2) divided by (rounded denominator+2))
  • ((rounded numerator +2) divided by (rounded denominator-2)) – ’disclosure controlled’ percentage

Similarly, if users calculate a percentage with a much larger rounded denominator, they may be able to assess the ’disclosure controlled’ percentage to have a high degree of accuracy if the maximum difference is very small.
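
The maximum-difference calculation above can be sketched as follows. This is an illustration only: the function name `max_percentage_difference` is hypothetical, and the result is multiplied by 100 so that it is expressed in percentage points, which is an assumption about how users would want to read it.

```python
def max_percentage_difference(rounded_num: int, rounded_den: int) -> float:
    """Largest possible gap, in percentage points, between the
    'disclosure controlled' percentage and the 'true' percentage."""
    if rounded_den <= 2:
        raise ValueError("rounded denominator must be greater than 2")
    controlled = 100 * rounded_num / rounded_den
    # A rounded value can differ from the unrounded value by at most 2.
    lowest = 100 * (rounded_num - 2) / (rounded_den + 2)
    highest = 100 * (rounded_num + 2) / (rounded_den - 2)
    return max(controlled - lowest, highest - controlled)


print(round(max_percentage_difference(10, 25), 1))   # small denominator
print(round(max_percentage_difference(20, 400), 2))  # denominator of 400
```

For a rounded numerator of 10 and rounded denominator of 25, the maximum difference is about 12 percentage points, matching the second row of the earlier example; for 20 and 400 it is roughly half a percentage point.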


Other users of HES data

Timing of methodology change

We recognise that there are numerous internal and external users of HES data who produce analysis using the existing disclosure control methodology.

It is not expected that all users will immediately adopt the new disclosure control methodology that the secondary care analysis team will start using from September 2018.

However, it is intended that users will seek to switch over to the new methodology over a period when it makes business sense to do so, for example when an external user renews a data sharing agreement.

HES analysis guide

The secondary care analysis team will produce a revised version of the HES analysis guide to include the new methodology, which will refer to the transition period for other users as described above. 


Similar outputs in transition period

It is possible that until the new methodology has been adopted by all HES and ECDS data users, tabulations may be produced and released reporting the same activity but using different disclosure control methodologies.

The secondary care analysis team have investigated the likelihood of identification of individuals from two identical tables, one produced using each methodology, and assessed this likelihood to be small. The small risk has been judged to be within acceptable boundaries by the chief statistician within NHS Digital.


Contact us

We welcome any questions, comments or feedback relating to this disclosure control method. You can contact us by [email protected].  

 

Last edited: 16 November 2023 11:01 am