NHS Digital data innovation lab

This page contains information about innovations in development. These may be experimental in nature, or utilise test data or 'management information'. As these products have not yet been brought into business-as-usual, their availability and reliability cannot be guaranteed.

Natural language query

To help break down barriers to the public data held by NHS Digital, the Innovative Uses of Data team are improving access to data by building a natural language query (NLQ) tool. This will sit on our website for anyone to use.

NLQ is a process that transforms natural language text, typically a question, into a query string that can be run against a corresponding datastore. We are developing an NLQ tool that can understand the intent and context of a question and transform that question into a database query. The query is run against a datastore to return relevant results, and the tool then applies statistical disclosure control so that no individual can be identified from the output. The end result is a table and visualisation (graphs, charts) that reflect the tool's interpretation of the user's question, produced in an instant.
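The steps described above (question, query, results, disclosure control) can be illustrated with a minimal Python sketch. This is not the team's implementation: the keyword matching stands in for real natural language understanding, and the `admissions` table, its columns, the region vocabulary and the suppression threshold of 5 are all assumptions made up for this example.

```python
import re
import sqlite3

# Hypothetical vocabulary and threshold, chosen only for this illustration.
KNOWN_REGIONS = {"london", "leeds"}
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")
SUPPRESSION_THRESHOLD = 5  # assumed: totals below this are hidden


def parse_question(question):
    """Extract a very small 'intent' (region and year filters) from text."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    region = next((r for r in sorted(KNOWN_REGIONS) if r in words), None)
    year_match = YEAR_PATTERN.search(question)
    year = int(year_match.group(1)) if year_match else None
    return {"region": region, "year": year}


def build_query(intent):
    """Turn the intent into a parameterised SQL query string."""
    sql = "SELECT region, year, SUM(n) FROM admissions"
    clauses, params = [], []
    for column in ("region", "year"):
        if intent[column] is not None:
            clauses.append(f"{column} = ?")
            params.append(intent[column])
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " GROUP BY region, year ORDER BY region, year"
    return sql, params


def apply_disclosure_control(rows):
    """Replace small totals with None so rare cases cannot be singled out."""
    return [
        (region, year, total if total >= SUPPRESSION_THRESHOLD else None)
        for region, year, total in rows
    ]


def answer(question, connection):
    """Run the full pipeline: parse, build query, execute, suppress."""
    sql, params = build_query(parse_question(question))
    rows = connection.execute(sql, params).fetchall()
    return apply_disclosure_control(rows)


def make_demo_db():
    """Create an in-memory database with made-up rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admissions (region TEXT, year INTEGER, n INTEGER)")
    conn.executemany(
        "INSERT INTO admissions VALUES (?, ?, ?)",
        [("london", 2017, 120), ("london", 2018, 3),
         ("leeds", 2018, 40), ("leeds", 2018, 2)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(answer("How many admissions were there in London in 2018?", db))
```

In this sketch the user's text never reaches the SQL string directly: only recognised values are passed, as query parameters. Small-number suppression is just one simple form of statistical disclosure control; a production tool would likely need further measures such as rounding or secondary suppression.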

Few people have found a reliable way to use natural language processing (NLP) to interrogate structured and coded data sets. We tried proprietary generic solutions, but inconsistencies, technical limitations and difficulty understanding their inner workings prevented us from taking those approaches further. Instead, we decided to build our own natural language processor.

We have developed a working natural language query engine that can take often quite complex plain English questions and correctly return data that answers them, whilst preserving anonymisation principles.

We expect to have a public beta in place for 2019. Any updates will be available on this page.

Benefits of the NLQ tool

These include:

  • improved access to data for the public
  • a faster process from data request to delivery
  • reduced internal overhead of responding to requests for data
  • improved internal efficiencies for answering requests for data
  • empowering the public to learn about the data by being able to query it and use it
  • stimulating conversation amongst digital healthcare professionals about natural language processing and its uses

Feedback is welcomed - [email protected]

Last edited: 13 December 2018 5:00 pm