Skip to main content

Understanding data quality in SDCS Cloud

Understanding the data quality reports available in SDCS Cloud and how to identify and address data quality issues.

Data quality issues can vary in scale and impact, from invalid data to submission failures. This document is designed to help users understand the data quality reports available in the SDCS Cloud portal and identify and address data quality issues. It will help data senders

  • understand how to navigate and download the Data Quality reports
  • understand validation error types such as file-level, record-level and group-level rejections.
  • familiarise themselves with some common issues such as invalid formats and that cause validation and data quality issues

Data quality

After submitting your data file to SDCS cloud, you can check to see if there are any data quality issues with your submission via the Summary Reports available on the Submission History page.

Referring to the NHS Data Dictionary or the corresponding Technical Output Specification (TOS) for your data set can help users to address data quality issues. To access the correct TOS, visit the webpage for your appropriate data collection, listed under the Data Collections section of the SDCS Cloud webpage


Submission history

The Submission History page is accessed from the navigation bar at the top of the SDCS Cloud Home screen. It provides details of previous submissions for collections and organisations that you are associated with. Note that submissions that did not complete the upload stage will not appear in your submission history.

Submission history

Check your Submission Report

Clicking View under the Action column on the Submission History page will open the Check your Submission Report page for that submission. All Reports and Extracts are available under the Available Extracts section.

Submission report

Rejected-DQ status

A submission with a status of ‘Rejected-DQ’ will generate Summary Reports only. Pre-Deadline extracts will not be available for that submission.

Rejected DQ Status


Summary reports

Summary Reports are designed to help assess the quality of the submitted file so that data senders can determine whether they need to make a further submission. Files can be downloaded in JSON or CSV format.

Depending on your data set you will have two or more of the following Summary Reports

 

Data Quality Report (All Data Sets)

The Data Quality Report lists validation warnings with a reference to the affected table, a message, a code, and the associated data items. The table and data items can be used to identify the specific row against which the validation warning has been raised.

Records with warnings will still be loaded on submission provided there isn't an overriding validation failure for the same record. More information can be found by referencing the validation code located in the Technical Output Specification for the data set.

 

Validation Report (All Data Sets)

The Validation Report lists validation failures with a reference to the affected table, a message, a code, and the associated data items. The table and data items can be used to identify the specific row against which the validation failure has been raised.

Validation failures may reject the entire file, groups of records (that depend on a parent row that has already been rejected) or individual records. More information can be found by referencing the validation code located in the Technical Output Specification for the Data Set.

 

Diagnostic Report (Specific Data Sets)

The Diagnostics Report has been designed to provide an early indication of the validity of some routine data items, data quality measures and elements of national indicators. These diagnostics will assist the user in checking the accuracy of the overall submission and assessing whether data quality issues exist and a re-submission is required.

An explanation of the Diagnostic codes in the report can be found under the Diagnostics tab in the Technical Output Specification for the data set. 

 

Aggregation Counts Report (Specific Data Sets)

The Aggregation Counts Report has been designed to provide an early indication of the volume of data for each table that has passed validations and will be included in post-deadline extracts.

For more details about the report, please consult the Counts tab in the Technical Output Specification for the data set.


Validation error types

File level rejections 

File level rejections are validation errors that highlight specific data quality issues which have caused the whole submission file to be rejected. These checks are made on all the records submitted by the data sender.

 

Record level rejections

Record-level rejections are validation errors that highlight a data issue in a specific column which has caused the whole record to be rejected.

Example

Master Patient index table 

LocalPatientId Postcode       Gender PersonBirthDate EthnicCategory
LPI001 AA1 4AA 1 1990-03-13 A
LPI002 AA1 3AA 21 1999-01-01 AA9

The following record-level rejections will be triggered and Record 2 will be rejected.

IDS00130 - Record rejected-Person Stated Gender Code has an incorrect data format. Local Patient Identifier (Extended)= LPI002

IDS00132 - Record rejected - Ethnic Category has an incorrect data format.  Local Patient Identifier (Extended)= LPI002

 

Group-level rejections

Certain records in different tables may be linked by certain keys (such as LocalPatientId). These are considered ‘groups’ and, to protect data integrity, where one record fails validation, any linked records within that ‘group’ will also fail validation. There are two types of group-level rejections; No valid group submitted and More than one group submitted.

 

No valid group submitted

Reason 1

Records in the GP table (child table) are rejected if the corresponding records from Master Patient Index (parent table) are rejected due to a validation error.

Example

Master Patient Index (parent table) has a one-to-many relationship with GP table (child table).

Master Patient Index (parent table)

LocalPatientId Postcode Gender PersonBirthDate EthnicCategory
LPI001 AA1 4AA 1 1990-03-13 A
LPI002 AA1 3AA 21 1999-01-01 AA9

GP table (child table) 

LocalPatientId GMPCodeReg StartDateGMPRegistration EndDateGMPRegistration OrgIDGPPrac
LPI001 ABC110 2019-01-01    
LPI002 RRP110 2019-01-21    

Record 2 (LocalPatientID =LPI002) from GP table (child table) will trigger the following error because Record 2 (LocalPatientID=LPI002) in Master Patient Index (parent table) has been rejected due to a Record-level rejection (validation error explained in example above).

IDS00220 - Group rejected - No valid IDS001 group transmitted for this Local Patient Identifier (Extended).   Local Patient Identifier (Extended)= LPI002

Reason 2

Records in table B (child table) are rejected if no corresponding records are submitted in table A (parent table).

Example

Master Patient Index (parent table) has a one-to-many relationship with GP table (child table).

Master Patient Index (parent table)

LocalPatientId Postcode Gender PersonBirthDate EthnicCategory
LPI001 AA1 4AA 1 1990-03-13 A
LPI002 AA1 3AA 2 1999-01-01 A

GP table (child table)

LocalPatientId GMPCodeReg StartDateGMPRegistration EndDateGMPRegistration OrgIDGPPrac
LPI001 ABC110 2019-01-01    
LPI002 RRP110 2019-01-21    

Record 1 (LocalPatientID = LPI001) is not present in submitted corresponding table. In this case the following error will be triggered

IDS00220 - Group rejected - No valid IDS001 group transmitted for this Local Patient Identifier (Extended).   Local Patient Identifier (Extended)= LPI001

 

More than one group submitted

A ‘More than one group submitted’ error will be triggered if duplicate records (records with same values in key columns) are submitted. All duplicate records will be rejected with group-level rejection error.

Example

GP table

LocalPatientID GMPCodeReg StartDateGMPRegistration EndDateGMPRegistration OrgIDGPPrac
LPI001 ABC110 2019-01-01    
LPI001 ABC110 2019-01-01    

These two records will trigger the following validation error and both records will be rejected.

IDS00221 - Group rejected - More than one IDS002 provided for this Local Patient Identifier (Extended) + General Medical Practice Code (Patient Registration) + Start Date (GMP Patient Registration) combination.   Local Patient Identifier (Extended)= LPI001 General Medical Practice Code (Patient Registration) = ABC110 Start Date (GMP Patient Registration) = 2019-01-01

 

Warnings

Warnings are where the validation process identifies an issue with the submitted file but does not reject any records.  Warnings advise data senders where there is an issue with the data quality in their submissions.

Example

Master Patient Index table

LocalPatientID Postcode Gender PersonBirthDate EthnicCategory
LPI001 AA1 4AA 1 1990-03-13 A
LPI002 AA1 3AA   1999-01-01 A

As the Gender column is blank for Record 2, the following warning will be triggered:

IDS00129 - Warning - Person Stated Gender Code is blank. Local Patient Identifier (Extended)= LPI002


Common issues

Group-level rejections

Group-level rejections are one of the most common data quality issues, highlighted in DQ reports. Please refer to the section above for more information.

 

Invalid format - record level rejections

In many cases an invalid format record level rejection, the root cause may be due to the way the data was imported in the submitted access database causing the ‘leading zero’ to be dropped (for example ‘2’ submitted rather than ‘02’).

Invalid format example

Records are rejected from a Master Patient Index table with the following validation error:

IDS00115 - Record rejected - NHS Number Status Indicator Code has an incorrect data format.  Local Patient Identifier (Extended)=<LocalPatientId>

In this example, the following codes have been submitted in the rejected records:

1,2,3,4,5,6,7

As per the corresponding Technical Output Specification, the values should be in the following format:

01,02,03,04,05,06,07

Leading zeros can often be dropped when data has been imported from an Excel file into an Access file.

 

Invalid date format

An issue relating to an invalid date format could be due to the way data was imported into the submitted Access database whereby the date format is changed automatically during import.

Invalid date example

Date fields submitted in ‘yyyy/mm/dd’ or ‘dd/mm/yyyy’ format instead of ‘yyyy-mm-dd’.

Date format should be validated before submission to avoid failing validation.

 

Invalid organisation data

Many warnings and rejections can be caused due to incorrect values being submitted for organisation-related data items.

This could be due to an issue at the point of data collection. The legacy Bureau Service Portal (BSP) may have historically allowed expired codes, or in certain cases, it may have accepted certain data without issuing warnings or rejections, meaning that some issues may not have been identified.

Invalid organisation examples:

  • Organisation Site Identifier (Of Treatment) is not present in national organisation reference data tables as a live organisation.
  • Organisation Identifier (Local Patient Identifier) is not present in national organisation reference data tables as a live organisation.
  • Organisation Identifier (Educational Establishment) was not a live organisation in national code reference data tables.
  • Organisation Identifier (Code of Commissioner) provided does not correspond to a current live organisation in national organisation reference data tables.

 

SNOMED CT not in reference table

Errors can be caused due to the incorrect usage of SNOMED CT codes.

For more information about SNOMED codes, please refer to the SNOMED guidance pages.  

 

Corresponding data item validation

In certain cases, where a data item value is populated, a corresponding value is required in another data item field.

Example

ServiceRequestID TeamID_Local TeamTYpe Referral_ClosureDate Referral_RejectionDate Referral_RejectionReason
SRV001 TID001 01 2019-01-01 2019-01-01  

This record will cause the following warning because Referral Rejection Date is populated but Referral Rejection Reason is required but is not populated.

CYP10223 - Warning - Referral Rejection Date is populated but Referral Rejection Reason is blank. Service Request Identifier=<C102902> Local Patient Identifier (Extended)=<C101901>


Resolving data quality errors

Rejection Order

As described in the Validation Error Types section above, errors are grouped where a record rejection in a parent table (e.g. IDS001 MPI) results in errors in all associated child tables (for example IDS002 GP). 

For ‘No valid Group Submitted’ error types the primary errors will be the rejected record in the parent table, and these should be corrected first.

To reflect the above and help with the usability of the Validation Report; errors are displayed by Parent Table and then Child Table as per the order detailed in the Parent and Child table relationships section below for each data set.  

Errors are displayed in the following order for each table

  • file rejections
  • group rejections
  • record rejections
  • warnings

Using IAPT as an example, the report starts with Parent Table IDS001 MPI showing file rejections, group rejections etc. It then repeats in the same order for Parent Table IDS101 Referral and so on, through to the final Parent Table IDS202. The same process is then repeated for the Child Tables.

 

Resubmission

It is recommended that all group and record rejections in the parent tables are resolved first and then the data set be resubmitted to allow the corrections to be validated and any associated grouped errors within the child tables to be removed from the Validation Report. 

In this way the Validation Report will be more manageable. 


Parent and Child table relationships

These tables help to illustrate the Parent – Child relationships.  It allows the reader to quickly identify a corresponding child or parent when navigating the table.

Mental Health Services Data Set (MHSDS)

Maternity Services Data Set (MSDS)

Community Services Data Set (CSDS)

Improving Access to Psychological Therapies (IAPT)

Last edited: 13 July 2022 5:01 pm