Skip to main content

Access HES with DAE

Guidance on accessing Hospital Episode Statistics (HES) data using the Data Access Environment (DAE).

Introduction

Data Access Environment (DAE) offers a secure way for users to remotely access better linked information and ensures the right person, with the right permissions, gets the right data in accordance with their Data Sharing Agreement (DSA).

Providing access to data through our secure environment reduces the need for it to leave NHS Digital. Users can take their insights to the next level by using a range of built-in tools to interrogate, analyse and visualise data for a more complete picture of health and care.

DAE forms the access point of our improved data processing services, a central system that will give us the power to process larger volumes of data – faster than we've ever done before.

DAE is an online analytical platform with tools to access data. This secure and remotely hosted access environment enables organisations to access Hospital Episode Statistics (HES) for a wide range of data analytical purposes.

DAE allows you to access NHS Digital data from any location with an internet connection. You should ensure that the environment you access DAE from complies with the agreed locations as described in your DSA.

 

DAE should not be accessed in any public area or location where data could be overseen by any party not explicitly authorised in your Data Sharing Agreement.

The data sets currently available in DAE are:

  • Admitted Patient Care (APC)
  • Outpatients (OP)
  • Accident and Emergency (AE)
  • Maternity (APC_MAT)
  • Critical Care (CC)

There are also several reference data tables, including diagnosis and procedure code descriptions, with more national datasets being added.

The system is hosted and audited by NHS Digital, meaning that large transfers of data to onsite servers are reduced. NHS Digital also has the ability to audit the use of and access to data.

Hadoop User Experience (HUE) is the primary interrogation tool currently available within DAE. Over time, it's expected that further tools will be added to assist users in making the deeper analysis and visualisation crucial to local public health, commissioning or research requirements.

We will use feedback from users to help inform what further tools are made available in future, based on functionality and capability requirements.

HUE user guide

DAE is accessed via a web browser. HUE, the primary analysis tool, allows users to query and retrieve data. HUE features a central query window that allows users to write Structured Query Language (SQL), to interrogate NHS Digital provided data. It also contains supporting windows designed to assist the process of retrieving data.

The aim is to promote self-service for users and offer a simple interface, so users can find, explore and interrogate data easily.

HUE uses the Apache Hive version of SQL. Information on the differences between SAS SQL and Hive SQL is described in an appendix to this document.

The Apache Hive language manual can also be found online on the Apache Hive Confluence page.

Accessing DAE

To access DAE, simply open a web browser and visit the DAE login page:

DAE login page showing username and password fields and a next button

You will need your provided username and password to log in. If this is your first logging into DAE, you will be asked to change your password to one you can safely remember.

You will then be asked to provide your Two-Factor Authentication (2FA) code.

Two-factor authentication code page with a field asking for a 6 digit code generated by your two-factor authentication application, along with a log in button

Visit NHS Digital’s DAE web page to download instructions on how to set up your authentication code.

Access HUE

Once you have logged in you will see a list of all your data agreements. From this list you should select the agreement for the data that you want to query (most users will see only one, but some may see more).

After selecting your agreement, you will see a range of applications, depending on your access rights. 

Click on the HUE icon to continue.

You will be asked to provide your username and password again to access HUE.

A web page with the HUE icon circled to help users know which button to click on

You can go back to the applications tab and open additional HUE tabs, but avoid opening more than three HUE tabs, as this can cause problems running queries.

Only one query can be run at any one time, so second or third queries in different tabs will be queued. Attempting to open more than three tabs may cause queries in other tabs to fail.

Logging out

If you select the wrong agreement, you can return to the selection screen by pressing ctrl-alt-shift together and selecting ‘logout’ from the drop down.

Image of logging out of system

Writing a query

The list of available databases, such as HES, can be found on the left-hand side of the screen. Click on the database to see the available tables. These data sets are the ones you can access in line with your DSA agreements.

List of available databases

You should write the query in the central query window, with the basic syntax as follows:

COPY
SELECT variable names (for example, a column name or a

function count)

FROM database.table_name (for example, hes.hes_apc_1718)

WHERE filter conditions (for example, startage_calc < 17)

GROUP BY fields used to group aggregates

plaintext

Example query

To find the number of Finished Consultant Episodes (FCEs) by gender in 2016/17 for a trust (such as RC1), you would type the following query:

COPY
SELECT fyear, sex, sum(fce) as fce

FROM hes.hes_apc_1617

WHERE procodet=’RC1’

GROUP BY fyear, sex;

plaintext

If you type SELECT then FROM, such as SELECT FROM hes.hes_apc_1617 then go back to SELECT, as you start typing your column names HUE will offer you the available options. It can only do this if you have told it in the FROM section which table you are wanting to use.

 

You can also drag and drop table names and column names into your query from the left-hand databases window. Once you have selected a table it will appear in the right-hand assistance window and you can drag column names from there into your query.

To run your query, click the triangle on the left of the query window. Your results will appear underneath the query window. To open a new query window, click on the three dots in the top right corner of the query window and select ‘New’.

If you have more than one query in the query window, you will need to highlight what you want to run or your query should end with a semi colon (;) which tells HUE that the SELECT statement has come to an end. To link to reference data, these tables can be found in the dss_corporate database.

An example of a query could be:

COPY
SELECT a.procodet,b.org_description,sum(a.fce)

FROM hes.hes_apc_1617 a

LEFT OUTER JOIN dss_corporate.hesf_org_1617 b

ON a.procodet=b.org_code;

plaintext

The LEFT section brings everything from table A and joins it to table B. The ON section joins procodet (table a) to org_code (table b). The OUTER tells the query to include all rows from the HES table even if they do not have a match in the org table.

For security purposes, your DAE session will timeout after 15 minutes of inactivity.

 

During query execution the background HUE session may remain open. You may be able to reconnect and see HUE as it was but if your session disconnects due to timeout or network issues you may lose work.

Saving a query

To save your query, click the ‘Save’ icon from the cluster of actions available at the top right of the Query Editor.

The Save Query dialogue window showing the name and description fieldsYou will be presented with the Save Query dialog window.

The Save Query dialogue window showing the empty name and description fields waiting to be populated

Enter a name and a description to save along with your query, then click the ‘Save’ button.

The Save Query dialogue window with the name and description fields populated with example content

To quickly open your saved queries, click the ‘Saved Queries’ tab at the bottom of the query window. You can also access saved queries by clicking on the Documents icon on the left hand of the screen.

Click on the saved query you want to open. Folder management and more flexible document management is available using the Document Browser.

You can paste into DAE, but security restrictions mean you cannot copy and paste out of your browser. This means that you will not be able to simply copy SQL code and paste it elsewhere. For some users they may wish to write their SQL locally and paste it in.

Sharing a query

To share your query, click the three dots in the top right of the query screen on a saved query or use the Share option in the document browser.

The sharing settings dialogue window

Select share.

On the dialogue box, start typing the name of the person you would like to share the query with, and the list will filter by person. When you see the name of that person, click to select and then select ‘Read’ or ‘Read and Modify’ from the drop-down list on the left. Clicking on the selection will move the person’s name to the appropriate list.

The intended recipient will then be able to view the query in their list of queries. They will only be able to run the query and see the results if they have the appropriate rights to access that data.

The document browser

When a query is saved the SQL is shown in a simple list under saved queries. This gives quick access to recently saved queries. For more document management options, use the document browser by selecting the menu icon (three horizontal bars) in the top left of the screen and selecting the Documents > Browser option.

The HUE menu icon circled to indicate what to click on to select the documents browserThe documents browser allows you to create folders and drag saved queries to those folders.

These folders are not visible in the Saved Queries quick access screen in the editor window. They are visible in the left navigation window if you select the folder icon.

HUE documents browser

The five icons on the top right of the screen, in order are Search, New Document, Share, More (copy, move to trash, export) and a Trash icon.

Search, New Document, Share, More (copy, move to trash, export) and a Trash icons

To create a new folder/directory click on the new document icon and select new folder. You can then drag and drop previously saved queries and notebooks into these folders.

Folders can be shared in the same way as files.  In the browser window, right click on the folder and select share.   You can share a folder with several users and individually grant permissions to read or modify. This enables file sharing with a group.

By sharing a folder, the users will be able to see all files currently in the folder and any that are added afterwards. The sharing of files for specific users will need to be done separately for the individual files outside a folder.

Using your own schema

All DAE users have their own database and access to the core databases subject to their data sharing agreement 

Your database name will be based on your username or a short code. For our example, we shall use JOBL101.

Image of the DAE user database

In your database, you can create and delete tables, temporary tables and views. To do so you need to specify the database name in front of your table.

For example, you can create a view that includes the pseudonymised provider spell number (PROVSPNO OR PROVSPNOPS depending on your access) and SUSSPELLID from hes_apc_otr tables.

CREATE VIEW JOBL101.HES_APC_ALL_1819
AS
SELECT T1.*,
               T2.PROVSPNO,
               T2.SUSSPELLID
FROM HES.HES_APC_1819 T1
    INNER JOIN HES.HES_APC_OTR_1819 T2
        ON T1.EPIKEY=T2.EPIKEY;

You can then reference JOBL101.HES_APC_ALL_1819 in other queries which may make it easier to write the SQL.

The advantage of views is that they will change with the data in the table, so as the HES data is refreshed a view against the current years HES will also update. 

Tables are static copies of the data and will not be updated. Creating a table that is a subset of the data from the parent tables may accelerate the running of a complex query against it.  Creating large complex tables does not achieve the same performance benefits because SQL does not automatically create tables in the most efficient manner.

Temporary tables will last for the duration of the user session. User sessions may last a few hours or even a couple of days, but they will eventually be dropped during a weekly release, if not before.  The advantage of temporary tables is that they do not need to be managed so if you create a lot of working tables for a query, you do not have to go back and delete them when you are finished.

Exporting results and queries

To export a result set or a query you download a file which DAE places in an “S3 bucket” on Amazon Web Services (AWS) in an account created with your DAE login credentials. You can then use the instructions in the Saving files locally from your AWS S3 Account section below to download a file to your local computer.

Downloads are only available if specified in your Data Sharing Agreement.

If your Data Sharing Agreement requires approval prior to download, you should continue to follow the steps below and once your query has been submitted it will be sent to NHS Digital for approval. NHS Digital will then notify you of the outcome and if approved, the file will appear in your AWS folder.

Download results method 1

The simplest and tidiest way to export the results from a query is to select the Export results button from the left of the results page. This will allow you to select CSV or Excel and create a file that looks similar to the results table on screen.

This will work for all queries with results less than 100,000 rows.

The results page with the 'Export results' button circled

The file will then be available in your AWS S3 account and can be downloaded using the instructions in the section Saving files locally from your AWS S3 Account below.

Download results method 2

Where your query results in more than 100,000 rows, first, ensure that your query executes correctly and produces the results you want.

The results page with 'Select' circled

Select your query and copy it. Now click the User Exports service from the DAE Homepage.

Screenshot of the Tools web page with buttons linking to Hue, JupyterHub and User Exports

From the User Exports service menu, select the “Run Hive Query, Export Results to S3” option.

The Run Hive Query, Export Results to S3 option

  1. Paste your query into the ‘Enter Query’ field.
  2. Enter a name for your saved file.
  3. Select the exported file type, either CSV or Excel.
  4. Then click Submit.

Some queries may return “Failed to run Query” response. If you receive this, please check:

  • you have not included a semicolon (;) at the end of the query as this is not required to download the file
  • you have included the database name
    - you can run queries from within the HES database without specifying the table, e.g. writing HES_APC_1819 instead of HES.HES_APC_1819. With this download method you must always specify the database
  • your output file is not already in your AWS S3 folder as sometimes the error is reported even though the query has worked.

For some complex queries it is difficult to successfully run them using this method.  The workaround is to save the results as a table in your database by adding CREATE TABLE <your database>.<tablename> AS to the top of the query then running SELECT * FROM  <your database>.<tablename> in data out.

Saving query results screen

Your query will start executing and saving the result set.

Once successfully completed, the status message will update.

Your file will now be verified and moved to an AWS S3 Account ready for downloading to your local computer. The instructions are below in Saving files locally from your AWS S3 account.

CSV files that need to be checked first do not currently include the header information, so please use the Excel file type for these outputs.

Download a query

To export the actual query rather than the results from the DAE Environment, you first must save your query as described above in Saving a query.

Now your query has been saved, you can view it in the Documents browser in HUE (see above).

Saved query in HUE documents browser

The documents browser should list your saved query.

Listed saved query in HUERight-click on the saved query to open the document context menu or select the More (three dots icon).

Opening the context menu by right clicking the saved queryClick the Download context menu item to download the saved query to your session.

Downloading the saved query to your session

Now that your saved query has been downloaded, it will automatically be copied over to your AWS S3 Account, ready to be downloaded to your local computer.

Saving files locally from your AWS S3 account

Go to the DAE AWS Console.

You will be presented with the DAE Login screen.

Enter your DAE credentials as requested and enter your Two-Factor Authentication code when requested, as described in the earlier Accessing DAE section.

The AWS Welcome Screen will be displayed. There is a lot of information here, but the only thing you need is to enter “S3” in the Find Services search field and press enter. This will open the S3 file browser.

Finding services in AWS

From the S3 file browser, click the NHSD-DSPP-DAE-Data-Out item.

S3 file browser screenshot with 'NHSD-DSPP-DAE-Data_Out' item circled

You will be presented with a list of folders.

Locate your folder and click it to open.

Opening a folder in AWS

The list of files in your folder and your new results file will be displayed.

List of files in a folder displayedClick your results file to bring up the File Properties dialog on the right.

File properties dialogue box

From the 'File Properties' dialogue on the right, click the 'Download' button to download your results file to your local computer. On Windows PCs the file is stored in your default downloads folder (usually Downloads) and, depending on your local security, some browsers will allow you to open the file from a dialogue box at the bottom of the web page.

Export data outputs

Depending on the terms agreed in the Data Sharing Agreement, some users will be permitted to export outputs without any further checks, whilst some users need to have their outputs reviewed to confirm small number suppression and aggregation have been applied.

Only one query can be executed at a time. You should only output one query at a time, in accordance with your Data Sharing Agreement.

 

Use of escape characters have been made in Excel outputs to ensure the field type integrity. These escape characters only appear when editing a cell and make no difference to the data, other than informing Excel how to display the content.

Export data – permission required

Users that require outputs to be reviewed before export will follow the steps outlined above.

Once the internal approval process is complete, the file will appear in your AWS folder and NHS Digital will contact you to inform you that the file is now there.

Examples

Patients with suspected eating disorders

How many children/adults were admitted to hospital with a suspected eating disorder in 2013-14 and 2014-15?

It's not possible to identify suspected eating disorders. The only option will be to identify the number of Finished Admission Episodes (FAE) with the ICD-10 code F50, which is a primary diagnosis of an eating disorder.

This query can be built looking separately at the hes_apc_1314 and hes_apc_1415 tables and joining the results. But (and providing you have appropriate user rights) it is easier to build using hes_apc_10y (10 year view). These are in the HES database.

The query also involves querying the icd10_codes lookup table to get the description. This reference table is in a different database to the data, the dss_corporate database. The resulting query when written will look similar to the one shown below.

Format and layout

Case and indentation are not important, but being consistent can make the query easier to read. This is except for cases where a user is filtering against data in the database. For example, when looking at diagnosis in the data ‘f50’ is not the same as ‘F50’ due to the difference in letter case and will probably not return any rows.

The colours are assigned by the HUE interface and help with readability.

DAE HUE interface colour coded examples

Most queries begin with a SELECT command. You may also come across CREATE and DROP at some point but these operations can only be run in your own database.

The FROM command is where we tell the query which database and table to use. In this case the first table is hes.hes_apc and immediately after this there is t1. This is known as a table alias, it is used to help make the query easier to read. Instead of writing hes.hes_apc.fyear each time you wish to refer to the fyear column, you can just write t1.fyear and the system knows that t1 is the same as hes.hes_apc_10y.

You can use any alias that you wish, but it is advisable to use short names so that they are easier to read for big queries. For example, t_data or t_d_ref (for diagnosis reference).

Under the FROM command there is a LEFT OUTER JOIN command that tells the query to join to the next table, which is the icd10_codes lookup. LEFT means that all the values on the left-hand side (the first table) will be returned even if there isn’t a match in the right. This is good practice particularly with reference tables where a null or bad value in your data might be otherwise excluded, giving you incorrect results. If you want to use the reference table as a filter then just specify the JOIN command.

The join in ON t1.diag_3_01=t2.code is where the value in the t1 hes.hes_apc_10y threecharacter diagnosis field is equal to the code in the t2 diagnosis lookup table. There is also AND t2.version=’5th ed’. Some of the reference tables contain more than one code version. Currently the icd10 codes table contains versions ‘2000’,’4 th ed’ and ‘5 th ed’. The version most appropriate to the year of the data should be used. Note that in order to specify it as text, the value is enclosed in a single quote mark using the apostrophe character on the keyboard.

Back to the SELECT command, this is a selection of columns or treatments of columns wanted in the results, each one except the last separated by a comma. The first two columns selected are fyear (financial year in two digits e.g. 1617 for April 2016-March 2017) and diag_3_01 which is the first 3 digits of the first coded diagnosis, the reason for admission. The full description is then selected from the reference table.

The CASE command allows the startage_calc (the age at start of the episode as derived by the HES processing) to be converted into those ‘Under 18’ and ’18 and over’.

Think of the case WHEN t1.startage_calc<18 THEN ‘Under 18' as you asking: “when it is the case that startage_calc is less than 18, please report the value Under 18 in the results.”

The calculation {C} has been used to save typing and make things readable. The same results could have been obtained by typing a long list like below, but this can be tedious. With some data the tedious typing is unavoidable.

Example of long HUE query

At the end of the CASE there is always an END command, this is required and will result in an error if not present. In this query there is also an optional AS age_group, which specifies  age_group as a column alias for the whole CASE statement in the same was that t1 is a table alias.

Finally, in the SELECT command there is the aggregate function SUM(t1.fae). FAE takes the value 1 if the finished consultant episode is the admitting episode and 0 for other so this will only add up admissions in the year.

The WHERE command is used to filter the data. If this is not present all the data will be returned. In this query fyear has been limited to 1617 and 1718 financial years using IN and the list off values in brackets. Note that these values are in single quotes as they are behaving as text here (2001/02 is 0102 not 102). The first diagnosis is equal to F50 and the only records looked at are finished admission episodes where fae=1.

A GROUP BY command is only required when using an aggregate function in your select statement such as SUM(), COUNT() or AVG(). This tells the query which columns the sum should apply to and is usually a list of all the non-aggregate columns in the data. Note that the CASE statement is the same as the one in the SELECT command (copy and paste) but does not have the AS column alias at the end.

You can order by age_group (a column name) but you can’t group by age_group, because the variables in the select statement haven’t been created yet to use the new variables. It needs the whole GROUP BY statement in order to perform the sum function and finish creating the columns with their new names. 

Differences between SAS SQL and HUE SQL

Table and column naming

Users can create their own tables, temporary tables and views. There are restrictions on the names used, which can be found here, so in some cases existing code may not work:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_identifiers.html

When you create a table or view you must specify your own “database” or “schema”

You may find that SQL created elsewhere has added squared brackets to table and schema names. In DAE, the squared brackets are not required and should be removed when converting code.

The schema names will also be different.

So, for example, [HDIS9000].[HES_APC_1415] becomes HES.HES_APC_1415

Searching for a diagnosis/procedure or list of diagnoses/procedures.

When searching for a list of diagnoses/procedures in SAS the code uses CONTAINS to search a field that contains all diagnoses in a single concentrate string. There are fields for concatenations at the 3- and 4-digit level that do the work of a wildcard search. Thus, W96 in the opern_3_concat does the work of a wildcard search of W96* on all procedures returning the record if any one contains the code.

An example code block might be:

COPY
WHERE ( opertn_4_concat CONTAINS 'W961'

 OR opertn_4_concat CONTAINS 'W962'

 OR opertn_4_concat CONTAINS 'W963'

 OR opertn_4_concat CONTAINS 'W964'

 OR opertn_4_concat CONTAINS 'W965'

 OR opertn_4_concat CONTAINS 'W966'

 OR opertn_4_concat CONTAINS 'W968'

 OR opertn_4_concat CONTAINS 'W969' )

SQL

The equivalent Hive SQL would use the INSTR (instring - is value in string) function. This returns a number representing the position of the start of the searched value in the string or 0 if no found. Thus, any value >0 means the value is present.

The equivalent Hive SQL block might be:

COPY
WHERE ( INSTR(opertn_4_concat,'W961')>0

 OR INSTR(opertn_4_concat,'W962')>0

 OR INSTR(opertn_4_concat,'W963')>0

 OR INSTR(opertn_4_concat,'W964')>0

 OR INSTR(opertn_4_concat,'W965')>0

 OR INSTR(opertn_4_concat,'W966')>0

 OR INSTR(opertn_4_concat,'W968')>0

 OR INSTR(opertn_4_concat,'W969')>0 )

SQL

It is also worth noting that Hive SQL supports regular expressions using the RLIKE comparison, ie <column_name> RLIKE <regular expression> will return the same results.

COPY
WHERE opertn_4_concat RLIKE 'W961|W962|W963|W964|W965|W966|W968|W969'

SQL

Union

The union function allows the results of two different queries to be concatenated together if
they have the same fields.

Hive does not support the CORRESPONDING (CORR) keyword used in UNION joins in SAS. CORR allows variable column order in selections and matches by column name. In Hive all columns must be specified in order and have the same name. This may require the use of a column alias.

It should also be noted that if you specify UNION, duplicate rows across tables will be suppressed, this is the same command as UNION DISTINCT which can also be used. You may need to specify UNION ALL which will retain all rows from both queries.

Date handling

There are a number of ways to get parts from a date field, using the APC episode end date where epiend = 23 April 2016.

For example, getting the year from a date:

year(epiend) will return the year as a number 2016
trunc(epiend,'YY') will return the first of January for the year of the date as a text field, the 2016-01-01 date_format(epiend,'yyyy') will return the year as text '2016'

Please note that yyyy returns the calendar year while YYYY returns the week year. You can find other date_format masks.

How do I know how many rows there are in my results?

There is no automatic or selectable function in Hue to do this. You can work around this by putting your query in brackets and counting the result, for example:

COPY
SELECT * FROM hes.hes_apc WHERE FYEAR='1819'

SQL

becomes

COPY
SELECT COUNT(1) FROM (SELECT * FROM hes.hes_apc WHERE FYEAR='1819') t1

SQL

Note that an alias (the t1) is required for this SQL to work and not having it usually results in an unhelpful error message.

Including prompts and variables

This is for advanced users only.

Hive SQL supports variables and prompts in the text.

For example, placing ${name} in a query will offer a prompt at the bottom of the screen.

Where prompting for a text field the prompt should be in quotes within the query but the input should not.

This could be used for example to request runtime dates or be used to make queries that result in the creation of local tables more portable, as in this example:

Example of the query

Preceding the query with a set statement can also be used but the variable has to be preceded with hivevar. 

You will need to select and highlight all the statements before clicking the blue triangle in order to get them to run in sequence.

COPY
set hivevar: PeriodStart = 2019-02-01;

set hivevar: PeriodEnd = 2019-02-28;

SELECT count(*)

FROM hes.hes_apc_1819

WHERE epend between ‘${hivevar:PeriodStart}’ and ‘${hivevar:PeriodEnd}’;

plaintext

LEFT is not currently supported.

Where LEFT() is currently used SUBSTR() will need to be used instead.

For example, instead of e.g. LEFT(admimeth,1) = ‘1’ to identify elective cases you will need to use SUBSTR(admimeth,1,1) = ‘1’.  

SUBSTR() works in a similar manner to LEFT() but requires two numeric values as well as the string, the first numeric specifies the start position for cutting a string and the second specifies the number of letters to take.  Left always starts at the first letter so just requires the number of letters.

Admissions with fractures

How many fracture admissions by age band and area fractured were there from April 2011 to March 2019

This example shows how to create new columns in your results based on data in the table.

COPY
SELECT  FYEAR,

        CASE WHEN STARTAGE_CALC BETWEEN 0 AND 9 THEN '0_9'

	       WHEN STARTAGE_CALC BETWEEN 10 AND 19 THEN '10_19'

             WHEN STARTAGE_CALC BETWEEN 20 AND 29 THEN '20_29'

	       WHEN STARTAGE_CALC BETWEEN 30 AND 39 THEN '30_39'

	       WHEN STARTAGE_CALC BETWEEN 40 AND 49 THEN '40_49'

	       WHEN STARTAGE_CALC BETWEEN 50 AND 59 THEN '50_59'

	       WHEN STARTAGE_CALC BETWEEN 60 AND 69 THEN '60_69'

	       WHEN STARTAGE_CALC BETWEEN 70 AND 79 THEN '70_79'

	       WHEN STARTAGE_CALC BETWEEN 80 AND 149 THEN '80+'

		 ELSE 'Unknown' END AS AGE_BAND,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S02')>0 THEN FAE ELSE 0 END) AS SKULL,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S12')>0 THEN FAE ELSE 0 END) AS NECK,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S22')>0 THEN FAE ELSE 0 END) AS RIB,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S32')>0 THEN FAE ELSE 0 END) AS SPINE_PELVIS,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S42')>0 THEN FAE ELSE 0 END) AS SHOULDER,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S52')>0 THEN FAE ELSE 0 END) AS FOREARM,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S62')>0 THEN FAE ELSE 0 END) AS WRIST,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S72')>0 THEN FAE ELSE 0 END) AS FEMUR,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S82')>0 THEN FAE ELSE 0 END) AS LOWER_LEG,

		SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S92')>0 THEN FAE ELSE 0 END) AS FOOT

FROM HES.HES_APC_10Y

WHERE FYEAR IN ('1112','1213','1314','1415','1516','1617','1718','1819')

GROUP BY CASE WHEN STARTAGE_CALC BETWEEN 0 AND 9 THEN '0_9'

	        WHEN STARTAGE_CALC BETWEEN 10 AND 19 THEN '10_19'

              WHEN STARTAGE_CALC BETWEEN 20 AND 29 THEN '20_29'

	        WHEN STARTAGE_CALC BETWEEN 30 AND 39 THEN '30_39'

	        WHEN STARTAGE_CALC BETWEEN 40 AND 49 THEN '40_49'

	        WHEN STARTAGE_CALC BETWEEN 50 AND 59 THEN '50_59'

	        WHEN STARTAGE_CALC BETWEEN 60 AND 69 THEN '60_69'

	        WHEN STARTAGE_CALC BETWEEN 70 AND 79 THEN '70_79'

	        WHEN STARTAGE_CALC BETWEEN 80 AND 149 THEN '80+'

		  ELSE 'Unknown' END,

		  FYEAR

ORDER BY FYEAR, AGE_BAND

SQL

This query counts admissions that have a fracture diagnosis on the admission episode by year and fractured body area.

The query starts with select then the column FYEAR which is financial year.

The query then creates a new column age band using a CASE statement.

WHEN STARTAGE_CALC BETWEEN 0 AND 9 THEN '0_9’ 

This says, when the HES calculated age is between 0 and 9 return the value for age band ‘0-9’ and then so on for each age band, and a value for everything without a valid age.

The END command tells the SQL that the case statement is over and the AS command is used to give the query a name for this new column END AS AGE_BAND, in this case AGE_BAND.

Next the system adds a column which is a count of first admission episodes for each body area

SUM(CASE WHEN INSTR(DIAG_3_CONCAT,'S02')>0 THEN FAE ELSE 0 END) AS SKULL,

The line above states, when the value S02 can be found in the field diag_3_concat then count the FAE field (which is 1 for FAE and 0 for other records or if S02 isn’t in the diagnosis for that record count 0. This will result in a sum of all the FAE (first admission episodes, i.e. the episode that started the patient’s stay) records that have the ICD10 code S02 coded in any position.

Note that in this query, if the patient has multiple fractures, e.g. a skull fracture and a  rib fracture they will be counted once for each body area.

This query is run against the HES_APC_10Y which is a ten year view of the data available to some DAE users.

This allows 

WHERE FYEAR IN ('1112','1213','1314','1415','1516','1617','1718','1819')

to be used to filter the queries to the years wanted and means that multiple years do not need to be added together.

The columns with the SUM( are aggregate columns, when an aggregate function is used then the SQL must include a GROUP BY command and a list of the columns that are being aggregated against.

Note that the alias AS AGE_BAND is not used in the group by statement.

There is also an order by that will sort the results.  As order by happens at the end it is possible to refer to the AGE_BAND column by the alias rather than repeating the case statements.

Repeat Attendances over a year

How many patients between 5 and 25 had 10 or more outpatient attendances in any single 30 day period in 18/19.

This advanced example shows how to join a table to itself and how to use “common table expressions (CTE)” to managed subqueries that might be referenced more than once.

Please note that this is a more advanced example and it is not expected that inexperienced SQL users will be able to work through it.

COPY
WITH SETX AS  

(SELECT T1.PSEUDO_HESID,

       T1.PROCODE5,

	   T1.ATTENDKEY,

	   T2.ATTENDKEY AS ATTENDKEY2,

	   T1.APPTDATE,

	   T2.APPTDATE AS APPTDATE2,

	   T1.APPTAGE_CALC,

	   DATEDIFF(T2.APPTDATE,T1.APPTDATE) AS ATTGAP

FROM HES.HES_OP_1819 T1

  INNER JOIN HES.HES_OP_1819 T2

  ON (T1.PSEUDO_HESID=T2.PSEUDO_HESID

  AND T1.PROCODE5=T2.PROCODE5)

WHERE DATEDIFF(T2.APPTDATE,T1.APPTDATE) BETWEEN 0 AND 30

  AND T1.APPTAGE_CALC BETWEEN 5 AND 25

  AND T2.APPTAGE_CALC BETWEEN 5 AND 26),

  

 SETY

AS (SELECT PSEUDO_HESID,

       PROCODE5,

	   ATTENDKEY,

	   APPTDATE,

	   COUNT(1)

FROM SETX

GROUP BY PSEUDO_HESID,

         PROCODE5,

	   ATTENDKEY,

	   APPTDATE

HAVING COUNT(1)>9)



SELECT SETX.PROCODE5, 

       CASE WHEN APPTAGE_CALC BETWEEN 5 AND 15 THEN '5-15'

	       WHEN APPTAGE_CALC BETWEEN 16 AND 18 THEN '16-18'

     	       WHEN APPTAGE_CALC BETWEEN 19 AND 25 THEN '19-25'

       END AS AGE_BAND,

       COUNT(DISTINCT SETX.ATTENDKEY2) AS ATTENDANCES, 

	  COUNT(DISTINCT SETX.PSEUDO_HESID) AS PATIENTS 

FROM SETX

WHERE SETX.ATTENDKEY IN (SELECT DISTINCT SETY.ATTENDKEY FROM SETY)

GROUP BY SETX.PROCODE5,

       CASE WHEN APPTAGE_CALC BETWEEN 5 AND 15 THEN '5-15'

	       WHEN APPTAGE_CALC BETWEEN 16 AND 18 THEN '16-18'

     	       WHEN APPTAGE_CALC BETWEEN 19 AND 25 THEN '19-25'

       END

SQL

The WITH prefix tells the SQL that you are going to use a common table expression (CTE), find out more information.

There is then an alias name, an AS and then a query in brackets.  If you use a second common table expression you need to place a comma after the first query.

In this case the first query creates a CTE (effectively an inline view or virtual table) called SETX of all the records in the table where the same patient has a subsequent appointment within 30 days at the same provider.  

The table HES_OP_1819 is given the alias T1 then joined to itself using a different alias T2.  The able is joined on PSEUDO_HESID and PROCODE5 so will join all the attendances for the same person is in both tables.  For this analysis, a record is joined to itself, this could be avoided if wanted by adding the filter where T1.ATTENDKEY<>T2.ATTENDKEY.
The query creates a new column using DATEDIFF which returns the number of days between the second attendance and the first.  It is also used as a filter and only returns records where the second attendance is 0 to 30 days after the first.

The query also checks age, using 25 as the maximum age in the first table but 26 in the second table in case the patient has had a birthday in the meantime.

Using this first query, the second CTE, SETY returns all the records from the first query where an attendance identified by PSEUDO_HESID, PROCODE5, ATTENDKEY and ATTENDDATE has joined to more than 9 different subsequent attendances. (Including itself as the question was, having 10 or more attendances in a period).

The results query itself returns a count of all the distinct return attendances (note that the ATTENDKEY is ATTENDKEY2, the second attendance), and distinct patients from SETX where the first attendance is one of those in SETY.

Contact us

This guidance will evolve over time based on feedback received from users.

Please contact the NHS Digital Contact Centre at enquiries@nhsdigital.nhs.uk or 0300 303 5678 for enquiries relating to the Data Access Request Service, Data Sharing Agreements or Data Sharing Framework Contracts.

Please contact the NHS Digital National Service Desk at ssd.nationalservicedesk@nhs.net or 0300 303 5035 for technical or other issues and requests related to using DAE and accessing the data within it.

Useful links

Last edited: 27 November 2019 3:40 pm