Precision-guided public health

The COVID-19 pandemic saw a quiet revolution in how we use data to target health and care services at vulnerable individuals. John Windell looks at the data platform and algorithms that have helped make the difference.

As the threat of COVID-19 intensified in March 2020, Chris Whitty, the Chief Medical Officer for England, announced a set of clinical criteria for identifying people likely to be at high risk. 

The criteria included patients who had received organ transplants, those with severe respiratory conditions such as cystic fibrosis, people being treated for cancer, and other vulnerable groups. The urgent cross-system plan was to identify all those at risk, tell them about their situation, and ask them to ‘shield’.

In England, a significant part of that technical challenge, which stretched across several government departments and agencies, fell to NHS Digital. Teams worked day and night to bring hospital and medicines data together and give a view of the vulnerable people who met the predefined conditions. Within about a week, the first version of the 'Shielded Patient List' was in use and saving lives.

In many ways, it was a historic achievement. National data was not just being used in the aggregate to identify trends and guide planning and drive research, it was picking out individuals who needed help and getting guidance and resources to them.

But the Shielded Patient List was also limited: essentially, it was a list of people with the predetermined conditions. It did not seek to identify combinations of factors that might increase or decrease risks. The next question was whether, by looking at a wider range of demographic and medical factors such as age, gender, ethnicity and other underlying health conditions, it would be possible to predict the actual risk coronavirus might pose to any given individual.


As knowledge of the virus began to accumulate, academics at the University of Oxford drew on the best available data to develop an algorithm during the summer of 2020 that clinicians could use to estimate the likelihood that an individual infected with the virus would need hospital treatment or be at risk of dying. Called ‘QCovid’, it quickly became the standard clinical tool for doctors to assess the threat to their patients from coronavirus.

But could QCovid be applied across the NHS at a population level?  A multi-disciplinary team of data engineers, service designers, managers, technical experts and clinicians worked to bring together the data and build a central platform that would produce personalised COVID-19 risk scores.

“Right from the start, this presented several tests, not least meeting the strict legal governance and clinical assurance measures associated with the various sources of medical records,” says programme head Andy Smith.

This had never been done before in so much detail. The sheer number of records was intimidating.

“We looked at the possibilities of using existing datasets and how they could be joined together to present the best possible data view of a patient. This consolidated dataset would give us the best opportunity to identify all the key criteria that would influence the risk scores and so represent the most accurate outcome at the time. This had never been done before in so much detail. The sheer number of records was intimidating.”

Anthony Garratt was one of the leading data engineers tasked with bringing together the data.

“There were hospital records, there were GP records, and then there were records for cancer and other conditions,” he says. “They were all totally separate and disconnected from each other. Each one looked completely different to the next and behaved in a different way. The key challenge was how to look at those in a holistic manner and then create this unified record of each patient’s medical history.”


It was also critical that all of the data be handled safely and entirely within the guidance and regulations governing the use of sensitive patient information. “Everything we did had to conform with sets of information governance and assurance rules that we had to master quickly and rigorously stick to,” says Garratt. “And given how fast the situation was moving at that time, the scope of the project did change. For example, the rules governing how we would identify a person with a particular condition might shift.”

The team met twice daily for progress updates, at the start of the day and then again at the end. “We would talk about something at 9 o’clock in the morning and then begin planning and working on it,” says Smith. “Sometimes, by the end of the day, we got the message to stop what we were doing because the goalposts had moved.”

Garratt says the pressure was sometimes intense: “All sorts of parameters were still in flux all the way up to what was an immovable deadline.”


New ways of working had to be adopted. “We had to be very reactive, which was unusual for us,” says Smith. “Normally, we would take our time and plan our approach carefully over a much longer period, but we didn’t have that luxury. We had to learn quickly to do things differently.”

One key innovation was embedding expertise that might previously have been consulted externally into the central team. “We had to have specialist advice and guidance on tap,” says Smith. “In those daily meetings we had a lawyer to check the legal position, an information governance expert to assure the data standards, a clinical expert to help us code the various medical conditions. This meant they were all fully up to speed with the various developments and could provide instant feedback. In turn, that meant we could build the central data platform at an unusually rapid pace.”

Bringing legal and IG advice into the centre of the process ensured strict adherence to the rules and helped partners including NHS England, the Department of Health and Social Care, the Cabinet Office and bodies such as the Royal College of GPs to sign off quickly. “These things normally took months, even years,” says Smith. “We did it in weeks.”

We had literally 50 million records in there, and suddenly we had risk scores for all of them.

The core technical breakthrough came with the realisation that the different data sets could be linked by a fundamental common characteristic.

“Starting with just the GP data, we began using their term ‘journals’ to describe patient events,” says Garratt. “A journal is an event that happens to a patient on a given date, for example, a diagnosis or prescription. With the GP processing built, we then looked at hospital data and noticed those records also had events, so, for a particular patient, a diagnosis or a procedure or something else happened on a particular date. To make that work within the same framework, the answer was to mix those events into our scheme for handling GP journal data, making the hospital data look just like it.”
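The journal idea described above can be sketched in code. This is a hypothetical illustration only: the field names, source formats and the `JournalEntry` type are assumptions for the sake of the example, not the actual NHS Digital schema. The point it demonstrates is the one Garratt makes: reshape every source record, whatever its original form, into a dated patient event so that GP and hospital data flow through the same framework.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class JournalEntry:
    """One dated event in a patient's history (illustrative schema)."""
    patient_id: str      # pseudonymised identifier linking the datasets
    event_date: date
    source: str          # "gp" or "hospital"
    code: str            # diagnosis, prescription or procedure code
    description: str

def from_gp_record(rec: dict) -> JournalEntry:
    # GP journals are already event-shaped: one dated entry per row.
    return JournalEntry(rec["nhs_number"], rec["date"], "gp",
                        rec["snomed_code"], rec["term"])

def from_hospital_record(rec: dict) -> JournalEntry:
    # Hospital episodes carry dated diagnoses and procedures; reshape
    # each into the same journal format so it "looks just like" GP data.
    return JournalEntry(rec["nhs_number"], rec["episode_start"], "hospital",
                        rec["icd10_code"], rec["diagnosis"])

def unified_history(gp_rows, hospital_rows):
    """Merge both sources into one chronological history per patient."""
    events = [from_gp_record(r) for r in gp_rows]
    events += [from_hospital_record(r) for r in hospital_rows]
    return sorted(events, key=lambda e: (e.patient_id, e.event_date))
```

Once everything is a `JournalEntry`, downstream processing never needs to know which system an event came from, which is what makes a single end-to-end view of a patient possible.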

This helped create the holistic end-to-end view of a person’s medical history the team needed.

“We could now run the QCovid algorithm for that individual,” says Andy Smith. “But, of course, the real achievement was that we had literally 50 million records in there, and suddenly we had risk scores for all of them, with the riskiest at the top and the least risky at the bottom. An individual could still go to their GP and get the same score, but the difference was that we could now run a bulk process across the entire population.”
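The bulk process Smith describes amounts to applying a per-patient risk function across every unified record and sorting the results. The sketch below uses a made-up placeholder score, not the published QCovid model (which weighs factors such as age, sex, ethnicity, BMI and clinical conditions), purely to show the shape of the population-level ranking step.

```python
def toy_risk_score(patient: dict) -> float:
    """Placeholder risk function -- NOT the QCovid coefficients."""
    score = patient["age"] / 100.0
    score += 0.2 * len(patient["conditions"])
    return score

def rank_population(patients):
    """Score every patient and return (score, id) pairs, riskiest first."""
    scored = [(toy_risk_score(p), p["patient_id"]) for p in patients]
    return sorted(scored, reverse=True)
```

The same function a GP might run for one patient is simply mapped over the whole population, which is why an individual consulting their GP would still get an identical score to the one in the national list.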

The first task for the risk stratification tool was to identify those most at risk from coronavirus. They became known as the ‘clinically extremely vulnerable’ group.

“This was important from a policy point of view because shielding was a cornerstone policy," says Smith. “These are the people who were advised to stay at home and shield according to government policy. It was important to identify those cohorts quickly and accurately.”

As the pandemic evolved, so did the tool. The data expanded, the selection criteria became more refined, and the lists more accurate. As viable treatments and vaccines became available, the tool helped ensure the most vulnerable people were among the first in the queue.


So, with coronavirus still a significant health issue, how is use of the risk stratification tool evolving?

“The ability to identify cohorts of patients is a powerful capability,” says Andy Smith. “At the moment we’re still using it for coronavirus purposes at national level in England, but that capability can also apply locally and regionally. And it could also be used for non-coronavirus purposes, or even non-health purposes. We could work with local authority data, or health and social care data, or occupational health data, and produce patient cohorts for other purposes.”

Once a particular group of people has been identified, whether in a given postcode, city or other area, they can be contacted via text or email. Appointments can be booked and progress tracked. The future ambition is to connect with existing technologies such as the NHS App so that messaging and other communications are as seamless as possible.

We are giving GPs those lists as a way of saving everybody a lot of time.

The most recent application of this process is identifying people for flu vaccinations this autumn. “We are finding the cohorts who might be vulnerable for all sorts of different reasons,” says Anthony Garratt. “We are giving GPs those lists so they can contact people and arrange their flu jabs, hopefully combining that with the latest coronavirus boosters as a way of saving everybody a lot of time.”

The same principle can also be applied to wider uses of the tool, such as finding suitable patients for clinical trials. Other implementations might include public health planning.

“We can take all sorts of real data, anonymise it, and then run different scenarios,” says Smith. “For example, what if this group of people got hit particularly badly by seasonal flu? How many people would it affect? What sort of response might be required? How many critical care beds would be needed? And so on. The ability to run scenarios based on real anonymised data will help emergency and population health managers to improve planning nationally and locally.”

Last edited: 9 November 2022 4:47 pm