
Part of NHS Digital annual report and accounts 2018-19

Our delivery directorates: 2. Data, Insights and Statistics



In 1948, the written information held in an average patient’s records amounted to a few kilobytes of data.

Today, the convergence of genomics, biosensors, smartphone apps, electronic patient records and a modern digital infrastructure is creating an explosion in the data available to inform individual treatment and care. By 2024, the NHS expects to have sequenced the genomes of half a million people. That alone will amount to about 1.5 petabytes of data.

Wearable and mobile technologies have the potential to allow millions of members of the public to continuously provide personal health data. At the same time, artificial intelligence, modern data science and clinical decision support systems will transform our ability to process and put this information to work.

Our Data, Insights and Statistics directorate’s purpose is to help the NHS, social care and the research and life sciences communities to learn from every patient who is treated. By linking our collections, we can chart a patient’s journey through the health system. When the journeys of many patients are aggregated, we can gain valuable insights into what works well and what doesn’t. We can make better decisions, predict future events, learn from high-performing teams and understand the spread of disease.

Accurate, accessible and timely information allows individual members of the public to manage their health and conditions. It underpins democratic accountability and is essential to some of our country’s most innovative companies. It helps commissioners target limited resources so they have maximum impact and helps clinicians, social workers and researchers improve people’s lives.

For our data to achieve this impact, it must be both trusted and useable. We are accountable to the Office for Statistics Regulation for the independence, quality and value of the statistics we publish. They often intervene publicly on topics of importance and recently commended our approach to drawing together A&E data from the four nations of the United Kingdom. We put users’ needs at the heart of all our work and communicate in a wide variety of formats including in-depth annual reports, interactive monthly dashboards, easy-read summaries, press notices and social media graphics.

Our analytical hubs for primary care, mental health and social care provide users with dedicated information portals that bring together all our information about these vital topics and allow them to produce additional reports and analyses relevant to their particular needs.

We have significantly improved the ‘searchability’ of our publication website, providing much better access to a 20-year history of open data and statistical publications. We are one of the world’s largest producers of open health data.

And we are continuously improving the interactivity and accessibility of publications. Our analyses of medication safety, emergency care throughput and flu incidence after hospital stays, for example, all used modern business intelligence tools to bring complex data to life and allow customers to flexibly interrogate and visualise the information.

Natural language processing gives us an exciting opportunity to further democratise access to data. It allows users to pose questions in standard English and have a computer application take them through sometimes complex choices.

We have developed a chatbot with natural language processing capabilities to engage customers in a conversation to pinpoint the data and information they need and serve it to them.

A second tool will help customers interrogate structured data sources without requiring specialist analytical skills. Both applications were successfully trialled over the past year. They will be implemented across relevant products in 2019.

We published 265 official statistics publications in 2018-19 and are continuously improving the information they provide.

For example, the new survey of children’s mental health highlighted the prevalence of mental health conditions among young people and, specifically, among older teenage girls, heavy social media users and those identifying as LGBT.

Our new monthly statistics on GP practice appointments are not only reporting the overall load on the system but also shedding light on waiting times to see a doctor and wasted appointments when patients don’t attend.

We published new data from the Breast and Cosmetic Implant Registry, which has been set up to allow swift tracing of individuals if products are recalled. And our National Data Opt-out data reports are giving the system a clear picture of the take up of the opt-out in different parts of the country. The world-renowned Health Survey for England included analysis of the relationship between parent and child obesity for the first time this year.

Linking datasets together to show interactions through the care pathway and patient outcomes is an increasingly important part of what we do. For example, we have joined up secondary care data with community prescribing data to look at admissions of patients after receiving specific medications, offering a vital new insight into patient safety.
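
As a rough illustration of this kind of linkage, the sketch below joins toy prescribing records to toy admission records on a shared pseudonymous patient key and keeps admissions that fall within a chosen window after dispensing. The record layout, field names, drug labels and the 30-day window are all assumptions made for the example; they do not describe NHS Digital's actual datasets or method.

```python
from datetime import date, timedelta

# Made-up records for illustration only; "patient" is a pseudonymous key.
prescriptions = [
    {"patient": "p1", "drug": "drug_x", "dispensed": date(2018, 5, 1)},
    {"patient": "p2", "drug": "drug_x", "dispensed": date(2018, 5, 3)},
    {"patient": "p3", "drug": "drug_y", "dispensed": date(2018, 5, 4)},
]
admissions = [
    {"patient": "p1", "admitted": date(2018, 5, 10)},
    {"patient": "p2", "admitted": date(2018, 9, 1)},
    {"patient": "p3", "admitted": date(2018, 5, 6)},
]


def admissions_after(prescriptions, admissions, drug, window_days):
    """Return (patient, days_after_dispensing) for admissions that occur
    within `window_days` after the given drug was dispensed."""
    window = timedelta(days=window_days)
    linked = []
    for rx in prescriptions:
        if rx["drug"] != drug:
            continue
        for adm in admissions:
            gap = adm["admitted"] - rx["dispensed"]
            same_patient = adm["patient"] == rx["patient"]
            if same_patient and timedelta(0) <= gap <= window:
                linked.append((rx["patient"], gap.days))
    return linked


linked = admissions_after(prescriptions, admissions, "drug_x", window_days=30)
print(linked)
```

A nested loop is enough for a toy example; a real linkage over large collections would use indexed joins rather than comparing every pair of records.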

Last year, customers told us that they used our data to improve health and care, support research, inform policy and planning, drive efficiencies and improve their own data capabilities.

Academic experts and users are helping us improve the Summary Hospital-level Mortality Index (SHMI), which reports on mortality at site level across the NHS in England. The index is the ratio between the actual number of patients who die following hospitalisation at a trust and the number that would be expected to die given the characteristics of the patients treated there. We have delivered presentational improvements to the SHMI homepage and are now producing quarterly statistics about a month faster than at the start of 2018. During 2019, we expect to move from quarterly to monthly publication, increase the detail and accessibility of our reporting and make a series of methodological changes, including adding seasonal factors and better models relating to specific conditions.
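
The ratio described above can be sketched in a few lines. This is a minimal illustration only: the function name and the figures are hypothetical, and the expected number of deaths is passed in as a plain input, whereas in practice it would come from a statistical model of patient characteristics that is not shown here.

```python
def mortality_ratio(observed_deaths, expected_deaths):
    """Ratio of observed deaths to expected deaths (illustrative only)."""
    if expected_deaths <= 0:
        raise ValueError("expected_deaths must be positive")
    return observed_deaths / expected_deaths


# Hypothetical figures, not real trust data.
ratio = mortality_ratio(observed_deaths=1050, expected_deaths=1000.0)
print(round(ratio, 2))
```

A value above 1 means more deaths were observed than the model expected, and a value below 1 means fewer.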

Our strategy for improving the quality of data is focused on improving clarity about the standards required, providing immediate feedback to providers when there are quality issues, and supporting system-wide action to improve practice.

Data quality is in the NHS Standard Contract for 2019-20. A proportion of healthcare providers’ income is now conditional on improving quality standards (through the commissioning for quality and innovation, or CQUIN, framework) and there are new indicators supporting improvement.

Our new Data Processing Services, to which we are moving all of our data collections, will automatically produce data quality reports for providers at the time of submission.

We are also working with the Professional Records Standards Body to encourage good and timely professional record keeping by clinicians and other front-line professionals.

The Virtual Data Science Centre, established by NHS Digital in 2017, brings together data science leaders from 15 public sector organisations. It supports the development and sharing of best practice across government and, during 2018-19, worked to standardise data science job profiles between partners and develop approaches and capability in natural language processing, text mining, and machine learning.

Closer to home, we continued our sponsorship of the Open Data Institute in Leeds, which is at the heart of a growing community of data producers and users with a shared commitment to using open data in the North of England.


Key area: Life sciences and research support

The NHS Long Term Plan puts support for medical research and the life sciences in the UK at the heart of the 10-year strategy for health and care in England.

Medical and life sciences innovation is not only one of the most dynamic sectors of the UK’s economy, vital to the economic growth that pays for our health and care system; it also directly supports better treatments and outcomes for individuals.

The Long Term Plan underlines the UK’s outstanding capabilities in these areas: “Our universities and science base, leading NHS providers, genomics programme and the UK Biobank… combined with better data infrastructure have the potential to lock in the UK as a global force in data-driven scientific advances in healthcare”.

NHS Digital is playing its full part, as the data custodian for the health and care system, in supporting this success story. The Life Sciences Sector Deal 2, published in December, underlined our role in improving data infrastructure and the Data, Insights and Statistics directorate is reshaping its teams, architecture, and partnerships to meet the challenge.

In February 2019, we joined with NHS partners, Genomics England, Health Data Research UK (HDR UK), Public Health England and the Clinical Practice Research Datalink to establish the UK Health Data Research Alliance. Its purpose is to enable faster and more efficient access to data for large-scale and innovative research projects, while ensuring individuals’ privacy and data sharing preferences are always respected.

We are also working directly with HDR UK, which represents 22 of the UK’s leading health research institutes, to create new opportunities and resources for their members.

Top of the agenda is rationalising access across UK data controllers. NHS Digital is now the controller for all mortality data. This simplifies access to this information through our Data Access Request Service and allows longer access agreements to be signed.

How are we supporting the UK's life sciences sector?


  • provision of data for algorithm testing and AI tools
  • setting core technology and data standards for NHS IT
  • investing £43m to improve core data services
  • a secure, online Data Access Environment that allows external customers to access data remotely
  • better support for research through planned trials and faster ethical approvals
  • the public will be able to register their interest in trials through the NHS App


We are working with the Office for Life Sciences to agree a Life Sciences Direction that will make the assets NHS Digital holds for direct care available for research on an anonymised basis, including pathology messaging, prescribing and other data.

In December, we co-hosted the ‘Health and Care data: Improving lives through research’ conference with HDR UK and the National Institute for Health Research in Leeds. We are collaborating with HDR UK on developing new resources, such as a standard GP dataset and community dispensing data.

Data access operates in a complex regulatory environment, including the Health and Social Care Act 2012, Care Act 2014, Statistics and Registration Act, Data Protection Act, the General Data Protection Regulation (GDPR), and the National Data Opt-out.

We are working through our external Research Advisory Group and the Office of Strategic Coordination for Health Research with partners including the Medical Research Council (MRC), the Health Research Authority, HDR UK, and the Life Sciences Industrial Strategy Implementation Board to streamline legal and ethical approvals across the system, cut bureaucracy and duplication, and support a consistent understanding by researchers of the rules governing the use of health data in research.

Our Data Access Request Service (DARS) processed more than 1,145 agreements for person-level data over the past year, an increase of 65% from 2017-18 and 120% compared with 2016-17.

We increased our informatics support for Genomics England’s 100,000 Genomes Project, which announced in December that it had reached its 100,000 goal, and we are helping the UK Biobank securely link GP data to its 500,000-strong cohort.

The ORION-4 phase 3 clinical trial at the University of Oxford is aiming to find out if a new cholesterol-lowering injection safely reduces the risk of heart attacks and strokes.

Our data has underpinned the selection of the cohort for this major clinical trial and we are working with Oxford and NorthWest EHealth, backed by Medical Research Council and Digital Innovation Hub Sprint exemplar funding, to develop proofs of concept of a national service to help researchers plan trials and identify participants more quickly and reliably.

Despite the increase in workload and the introduction of GDPR, DARS achieved an average wait time from data application to data delivery of 80 days for new applicants between April and December 2018, compared to more than 140 days in May 2016. Our fastest application was approved in 21 days.


Key area: Data transformation and architecture

We are investing £43m in new Data Processing Services (DPS) to improve the ways we collect, process and use data.

The platform was introduced in May 2019 and all our data collections will be moved onto it in the next two years. At its heart, the project is about merging datasets under a single data architecture. Rather than operating large, independent data sets with distinct submission procedures, dissemination methods and standards, DPS takes flows of data from submitting institutions and combines them in a flexible and modular way so that one collection can support a variety of different uses.

By using automated processing and modern cloud technology, we get information faster and in more flexible formats, improve privacy and data security and, crucially, reduce the burden of submissions on the NHS and social care. Institutions no longer have to submit the same information multiple times for different uses.

Cutting-edge privacy technology ensures that the platform is also highly effective at protecting individuals’ sensitive data.

Identifiable information is systematically and automatically removed from data sets without compromising the quality of the data for users.

Our encryption method, developed with London-based software company Privitar, allows strict control over how personal data can be linked. For example, if a user needs data to be linked using individuals’ NHS numbers, the DPS automatically replaces that number with a random value that cannot be tracked back.

To further protect privacy, a different random value is created each time data is made available, so there is no possibility of cross-referencing data provided for different purposes.

This ‘double lock’ encryption method means that nobody with access to the data set can use it to find personal details.
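
The idea of a fresh random value per release can be illustrated with a short sketch. It is a simplified stand-in, not the method developed with Privitar: the class name `ReleasePseudonymiser` and the identifier used are hypothetical, and a real system would need key management, access controls and audit that are out of scope here. Within one release the same identifier always maps to the same random token, so records can still be linked; a second release gets an independent mapping, so tokens cannot be matched across releases.

```python
import secrets


class ReleasePseudonymiser:
    """Maps identifiers to random tokens for ONE data release.

    Hypothetical helper for illustration; not NHS Digital's implementation.
    """

    def __init__(self):
        # identifier -> random token, private to this release
        self._tokens = {}

    def token_for(self, identifier):
        # Draw a new random token the first time an identifier is seen,
        # then reuse it so linkage works within this release.
        if identifier not in self._tokens:
            self._tokens[identifier] = secrets.token_hex(16)
        return self._tokens[identifier]


release_a = ReleasePseudonymiser()
release_b = ReleasePseudonymiser()

fake_number = "0000000001"  # made-up identifier, not a real NHS number

a1 = release_a.token_for(fake_number)
a2 = release_a.token_for(fake_number)
b1 = release_b.token_for(fake_number)

print(a1 == a2)  # same release: tokens match, so records can be linked
print(a1 == b1)  # different releases: tokens differ
```

Because each token is drawn at random rather than computed from the identifier, holding the released data alone gives no route back to the original number; the mapping itself would have to be protected or discarded by the data custodian.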

Another major improvement is our new Data Access Environment, which provides a secure, online portal for specifically authorised users to access data without the original data ever having to be disseminated outside of NHS Digital.

The environment allows us to ensure that only users who have proved a legal basis for using data can access it. It also allows those users to get better linked data more quickly, to use powerful tools for analysis and visualisation that are built in and to harness the power of cloud-based technology to easily increase the computing power they can apply to large data sets.

In the future, we expect to add functionality including the ability for researchers and local organisations to ‘bring their own data’ for linkage and analysis.

We have also launched, with NHS England, a new integrated data architecture approach that is supporting system-wide collaboration on data definitions and standards. This provides a common forum for standards to be proposed, consulted on and publicised.

With NHS England colleagues, we have developed integrated information flows and architecture patterns across the Data Processing Services, Local Health and Care Record Exemplars (LHCREs), GP IT Futures, and the Summary Care Record (SCR). These are in line with the first draft of the Longitudinal Care Record elements and will serve as the foundation for fully integrated information sharing across the NHS.

We worked with the Government Digital Service to launch a single location for data registers, the live lists setting out the approved versions of different types of data across the NHS (for example, organisation codes, diagnosis codes, and postcodes). The list is dynamically maintained in a machine-readable format and was launched in public beta in January 2019.

An application programming interface (API) allows users to tap the registers for use in their own databases. Data registers on organisational data services, health resource groups and English indices of deprivation were added during 2018-19. We expect to develop this resource further and take it out of beta in 2019-20.

We are also leading the collective effort to improve the collection of data across the health and care system. A major part of this work is rationalising health and care collections.

In 2018-19, we prepared for the migration of data assets currently held by Public Health England to NHS Digital in line with the recommendations of the McNeil Review.

The new Strategic Data Collection System has now completely replaced UNIFY, the legacy system previously used by NHS England and hosted by the Department of Health and Social Care. This has presented us with the opportunity to retire 13 data sets and further work to remove duplicate data sets is underway. This not only improves the quality and integration of the information available to the health and care system but reduces burdens on providers of data.

We also continue to improve the content and coverage of key data assets. We worked with NHS England and the Royal College of Emergency Medicine to produce the new Emergency Care Data Set. We also worked with NHS Improvement on Patient Level Information and Costing Systems, and with NHS England on the Mental Health Services and the Community data sets.

In collaboration with our partners in the Private Healthcare Information Network (PHIN), we will shortly start consultation on bringing data collection and measurement of private healthcare within the scope of NHS systems and standards. This will address concerns regarding the lack of knowledge about quality in private care and improve the completeness of patient records where some or all care has been received privately, an issue highlighted by the ongoing inquiry into breast surgeon Ian Paterson.

Building a flexible, cloud-based data architecture

Our datasets record billions of patient interactions with the health and care system a year. We are moving this data to a cloud-based infrastructure that makes it easier for users to get joined-up information.

We are producing more bespoke data products, linking datasets to meet customers’ specific needs.

Case study: ORION-4 trial, University of Oxford

An innovative use of NHS Digital data is helping Professor Louise Bowman to provide clinical trial evidence for a potentially lifesaving new treatment for cardiovascular disease.

Louise Bowman, University of Oxford

Louise is leading the first major international research study to use hospital admissions data directly from NHS Digital to identify potential participants.

That means the ORION-4 study at the University of Oxford is much more efficient – and it paves the way for other studies to use the same streamlined process.

“By being able to start the study quickly, we can finish it quickly and we can get the answers much faster, while preserving the high quality that all studies must have,” she says.

“We hope that the end result will be proof that a new treatment for cardiovascular disease has the potential to save the lives of many people in the future.”

ORION-4 is testing whether inclisiran – which, while not yet approved for use in any market, has already been shown in phase 2 trials to significantly reduce ‘bad’ cholesterol in the blood – cuts the chances of patients having heart attacks and strokes. The study will recruit 15,000 people who have previously had cardiovascular events, including 12,000 from the UK – which means inviting at least half a million people.

In the past, researchers would ask up to 100 hospital trusts to identify suitable patients from their records. But this was very time-consuming for trusts and for the study team receiving the data.

“Now, we are getting those data direct from NHS Digital,” says Louise.

“We can access one single dataset, which means we can then rapidly turn that around into invitations to potential participants to enrol in the study.

“That’s a huge improvement on the previous system.”

Not only does this accelerate research, it saves valuable NHS resources and significantly cuts the cost of clinical trials.

“If we can make trials more efficient and less costly, which this is a major contributing factor to, then we could potentially test many more agents and develop more new therapies, which could have huge benefits to the health of the nation.”

This study fits in with a national drive to find ways to better use data to improve the speed, efficiency and quality of clinical trials.

Last edited: 13 January 2020 2:35 pm