Building healthcare software - clinical coding, classifications and terminology

Overview

This guide explains how to build software that deals with clinical coding, classifications and terminology within the NHS in England.

It is a non-technical guide, aimed at people building healthcare software, including:

product owners
architects
business analysts
delivery managers
software engineers

It covers the following topics:

what clinical coding, classifications and terminology are
what national services and APIs are available for your use case
the software delivery process
getting started

This guide is part of our series of domain-specific guides on building healthcare software.

For more context, also see our introduction to healthcare technology.

If your interest in clinical coding, classifications and terminology is not from a software development perspective, see the website for users of our products, known as Delen.

Clinical coding

Clinical coding means applying unique and precise ‘codes’ to various aspects of patient care. These can be 'classification' or 'terminology' codes, which we look at later.

For example, ‘disorder of lung’ is a condition which is sometimes referred to as ‘lung disorder’, ‘pulmonary disease’, 'lungsjukdom' (Swedish) or 'lungesygdom' (Danish). If we use a single code, such as ‘19829001’, to represent this condition, we remove the ambiguity of different terms used by different people (or languages).

In the example, ‘19829001’ is the clinical code for ‘disorder of lung’ in a coding system called SNOMED CT (UK edition), which we look at later.
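As a minimal sketch of this idea, the following shows several terms resolving to the same code. The terms and the code ‘19829001’ come from the example above. The dictionary and the function name are assumptions made for illustration, not part of any national service.

```python
from typing import Optional

# Illustrative only: several human terms that all resolve to one clinical code.
# The terms and the code come from the example in this guide.
TERM_TO_CODE = {
    "disorder of lung": "19829001",
    "lung disorder": "19829001",
    "pulmonary disease": "19829001",
    "lungsjukdom": "19829001",  # Swedish
    "lungesygdom": "19829001",  # Danish
}


def code_for(term: str) -> Optional[str]:
    """Return the clinical code for a term, ignoring case and outer whitespace."""
    return TERM_TO_CODE.get(term.strip().lower())


# Different wording, and even different languages, give the same code.
codes = {code_for(t) for t in ["Lung disorder", "pulmonary disease ", "lungesygdom"]}
print(codes)
```

A real system would look terms up in a full release of the coding system rather than in a hand-written dictionary.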

Clinical codes can describe a wide range of diseases, illnesses or conditions, symptoms, clinical procedures or interventions, human anatomy and medicines.

The benefits of clinical coding are that:

when using IT systems, healthcare workers can search for terms, or pick them from a list, or autocomplete them as they type
IT systems can incorporate clinical decision support rules to alert healthcare workers as quickly as possible, for example, to the early signs of sepsis in a patient
it avoids any confusion due to use of different or unfamiliar language, for example when transferring a patient from one care setting to another
it allows us to analyse aggregated data for trends, for example the effectiveness of a treatment for a specific condition, or the consistency of care across different organisations
it supports the way NHS organisations are funded – acute trusts are ‘paid’ for the number of procedures they perform of each type

Classifications and terminology

There are two different types of clinical coding - classifications and terminology.

Difference between classifications and terminology

Classifications

Classifications are about giving specific codes to 'groups' of illnesses, symptoms, or procedures.

For example, ‘I50.0’ is the ICD-10 code for ‘congestive heart failure’.

ICD-10 (International Classification of Diseases, version 10) is a ‘classification system’ - it has codes for groups, or classes, of diseases.

Classification codes are useful when we are not interested in the individual patient, their diseases, symptoms or procedures, just in how they group together with similar patients. For example:

when doing statistical analysis on aggregated data
when claiming reimbursement funding for procedures performed

Terminology

Terminology is about giving specific codes for each 'individual' illness, symptom, procedure or medicine. The focus is on what makes individual patients different from one another, not on what they have in common, as with classifications.

For example, ‘15629591000119103’ is the SNOMED CT code for ‘congestive heart failure stage B due to ischemic cardiomyopathy’ and ‘16838951000119100’ is the code for ‘Acute on chronic right-sided congestive heart failure’. In a classification like ICD-10, the two are indistinguishable: both are simply code ‘I50.0’.

SNOMED CT is a ‘terminology system’ - it has codes for each and every illness, event, symptom, procedure, test, organism, substance and medicine.

The net result is that a single episode of patient care might produce only 3 or 4 classification codes, but the same episode could produce dozens or hundreds of unique terminology codes.

Terminology codes are useful when we need to convey exactly which illness, symptom, procedure or medicine we are talking about, for example when prescribing personalised medicine.

Terminology codes are also hierarchical, with complex relationships between them, as shown in the diagram, and with separate codes for the different levels of terms.

Classifications and terminology working together

Classification codes pre-date terminology codes by several hundred years.

Traditionally and still today, specially trained people known as ‘clinical coders’ in care settings are responsible for assigning classification codes to episodes of patient care. To do this, they read through the (paper or electronic) patient care record, usually after the patient has finished being treated. They translate the clinical terminology into classification codes, which are stored as a summary of treatment in the patient's record.

This data was first used for population-based statistics, then later for the development of new health policies, for monitoring their implementation, and also for reimbursement funding purposes.

The more modern approach is that:

Clinicians assign specific terminology codes at the point of care, during the patient's illness, primarily to enable decision support systems to make evolving treatment recommendations.
Whenever needed for statistics or billing, IT systems assign classification codes based on pre-defined mappings from terminology codes to classification codes - typically after the care episode has ended.

Mapping several terminology codes to a single classification code
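The mapping step can be sketched as a simple many-to-one lookup. The two SNOMED CT codes and the ICD-10 code are the examples used earlier in this guide. The table, the function name and the handling of unmapped codes are assumptions for illustration only, and real national mappings are much larger and have more complex rules.

```python
# Hypothetical one-way mapping table, built from the examples in this guide.
SNOMED_TO_ICD10 = {
    "15629591000119103": "I50.0",
    "16838951000119100": "I50.0",
}


def classify(snomed_codes):
    """Return the distinct ICD-10 codes for an episode, plus any unmapped codes."""
    mapped = set()
    unmapped = []
    for code in snomed_codes:
        icd = SNOMED_TO_ICD10.get(code)
        if icd is None:
            unmapped.append(code)  # assumption: left for a clinical coder to review
        else:
            mapped.add(icd)
    return sorted(mapped), unmapped


# Three terminology codes recorded during one episode of care.
episode = ["15629591000119103", "16838951000119100", "19829001"]
icd_codes, needs_review = classify(episode)
print(icd_codes, needs_review)
```

Here two terminology codes collapse into one classification code, and the code with no entry in this small table is set aside rather than guessed.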

In some care settings, such as secondary care, clinical coders still have a prominent role in making informed decisions regarding the more complex types of classification mapping. In others, such as primary care, classification systems have never been used at all.

Coding systems

A clinical 'coding system' is a specific set of clinical codes on a specific topic, curated by a specific organisation.

A coding system might be global or it might be local, for example applying just to the UK or England.

There are a number of different coding systems in use in the NHS in England, as follows:

Coding system	Type	Scope	Geography
SNOMED CT	Terminology	Comprehensive	Global, but with a UK edition
dm+d	Terminology	Medicines	England, but UK usage
ICD-10	Classification	Statistics, reimbursement	Global
OPCS-4	Classification	Statistics, reimbursement	UK
UCUM	Terminology	Units of measure, pathology	Global
NICIP	Terminology	Radiology, imaging	UK
Read Codes	Terminology	Clinical thesaurus	Retired - replaced by SNOMED CT and dm+d

SNOMED CT

SNOMED CT (originally Systematized NOmenclature of MEDicine - Clinical Terms) is a structured clinical vocabulary for use in electronic health records. It is the most comprehensive and precise clinical health terminology product in the world, with extensive worldwide adoption.

SNOMED CT is mandated as an NHS fundamental information standard (SCCI0034) for the NHS in England. The 2023 UK edition of SNOMED CT contains the 357,000 globally common codes, plus the UK clinical extensions. These extensions provide 35,000 codes required only in the UK, such as for UK screening procedures and assessment scales, plus 378,000 codes for drugs and appliances that can be prescribed in the UK, and British English terms for all of these.

SNOMED CT uses ‘concepts’ to represent clinical thoughts, for example '56265001 Heart disease (disorder)' and '80146002 Excision of appendix (procedure)'.

Every concept has a unique numeric identifier, called a ConceptId. The concepts are arranged in hierarchies from the general to the more detailed, allowing the data to be recorded and accessed at different levels depending on how it's used.

Some of the top levels of the SNOMED CT hierarchy include:

Level	Sub-level	Example
clinical findings	disease and deformity	a scar
	symptoms	difficulty breathing
	social	use of walking aid
	examination findings	tachycardia
causes of disease	forces	pressure change
	events	traffic accident
	organisms	herpes simplex virus
procedures	laboratory	haemoglobin electrophoresis
	therapy	radiotherapy
	clinical investigation	cardiovascular investigation
	surgical procedure	nephrectomy
anatomy	normal	knee joint
	abnormal	ganglion cyst
	lesions	bony callus
observations	vital signs	blood pressure
	body product observable	colour of urine
	values	present or absent
products	drugs	paracetamol
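As a rough sketch of how software can use a hierarchy like this, the following walks from a detailed concept up to its top level. It uses labels from the table above rather than real ConceptIds, and it assumes one parent per concept to stay short, whereas SNOMED CT itself allows a concept to have several parents. The names `PARENT`, `ancestors` and `is_a` are made up for this example.

```python
# Parent links taken from the simplified table above (labels, not ConceptIds).
PARENT = {
    "difficulty breathing": "symptoms",
    "symptoms": "clinical findings",
    "tachycardia": "examination findings",
    "examination findings": "clinical findings",
    "nephrectomy": "surgical procedure",
    "surgical procedure": "procedures",
}


def ancestors(concept):
    """Walk up the hierarchy from a concept and return each level above it."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept, ancestor):
    """True if `concept` is the same as, or falls under, `ancestor`."""
    return concept == ancestor or ancestor in ancestors(concept)


print(ancestors("difficulty breathing"))
print(is_a("tachycardia", "clinical findings"))
```

This is what lets data recorded at a detailed level be retrieved at a more general level, for example finding every record that falls under 'clinical findings'.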

You can view the full hierarchy with the SNOMED CT Browser.

For more details on SNOMED CT, see the Terminology and Classifications SNOMED CT Delen page.

dm+d

The dictionary of medicines and devices (dm+d) is a dictionary of descriptions and codes which represent medicines and devices in use across the NHS. It is the NHS standard (SCCI0052) and therefore must be used when communicating information about medicines between electronic systems. dm+d gives clinical IT systems a single shared language of medicines which makes exchanging information between systems easier, safer and more accurate.

All unique identifiers used in dm+d are SNOMED CT codes. To support users implementing SNOMED CT structures for clinical terming, NHS England takes the core dm+d data (XML file format) and transforms it into a fully compliant SNOMED CT release (‘RF2 format’), with added information to support (primarily) secondary care users.

There are some differences in the content of dm+d and the SNOMED CT UK Drug Extension and the timing of updates to them (weekly vs 4-weekly). This means that data sent between primary and secondary care cannot always be processed by the IT systems that receive it. This can lead to transcription and/or clinical errors for patients. In recognition of this, several changes are being made to dm+d and SNOMED CT UK Drug Extension to make these two terminologies more closely aligned in terms of content and availability.

For more information, see UK Medicines Terminology Futures.

You can browse dm+d with the NHS dm+d browser.

ICD-10

The World Health Organization (WHO) International Classification of Diseases (ICD) is the global standard which categorises and reports diseases to compile health information related to deaths, illness or injury worldwide.

ICD version 10 (ICD-10) is also an NHS approved fundamental standard (SCCI0021).

All inpatient episodes and day cases that contain diagnoses must be recorded to this mandated version, ICD-10.

ICD-10 uses about 18,000 four or five-character alphanumeric codes, for example 'I50.0 Congestive heart failure', or ‘T85.8 Other complications of internal prosthetic devices, implants and grafts, not elsewhere classified’.

ICD-10 codes are also arranged in hierarchies, but much simpler ones than SNOMED CT, and are static to enable comparative analysis over time. There are standards, rules and conventions to ensure the consistent choice of ICD-10 codes by clinical coders, and the production of comparable data.

Increasingly in practice, especially in primary care, clinicians record clinical information relating to the direct care of the patient, using many SNOMED CT codes. A clinical coder then summarises the care episodes using a few ICD-10 classification codes. To assist, we provide approved one-way mappings from SNOMED CT to ICD-10.

ICD-10 coded data is essential to statistical analysis and financial reimbursement initiatives, including Hospital Episode Statistics (England), Hospital Admitted Care Activity and the National Tariff payment system.

OPCS-4

The Office of Population Censuses and Surveys - Classification of Surgical Operations and Procedures 4th revision (OPCS-4) is the principal classification of interventions and surgical procedures for use in the NHS, developed and maintained by NHS Digital.

OPCS-4 is also an NHS approved fundamental standard (DAPB0084).

The current mandated version is OPCS-4.9 as of 1st April 2020, which is due to change to OPCS-4.10 on 1st April 2023.

OPCS-4 uses about 11,500 four-character alphanumeric codes, arranged in a simple hierarchy, for example 'M72.1 Partial urethrectomy' or ‘J46.1 Percutaneous dilation of anastomosis of bile duct and insertion of tubal prosthesis, however further qualified’.

Along with ICD-10, OPCS-4 coded data is also essential to statistical analysis and financial reimbursement initiatives, including Hospital Episode Statistics (England) and the National Tariff payment system.

For more details on OPCS-4, see the Terminology and Classifications OPCS-4 Delen page.

UCUM

Unified Code for Units of Measure (UCUM) is a code system designed to represent the units of measure used in science, engineering, and business.

Its purpose is to enable unambiguous communication and processing of 'quantities', where a 'quantity' is represented by a number and a unit of measure (UoM), for example 10m means 10 metres, and 1000cm means 1000 centimetres, which represent the same distance.

Separation of the number and the UoM is needed because historically the two were often combined in free text, or the UoM was implicit and not stated.

Even so, UCUM codes can be more computer friendly than human readable. For example, UCUM code pmol/(8.10^8) differs from pmol/8.10^8 by a factor of 10¹⁶, without the difference made by the brackets being obvious to a human.
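The arithmetic behind the two examples above can be checked with a short sketch. It is not a UCUM parser. It assumes that, without brackets, the `.` (multiply) and `/` (divide) operators are applied from left to right, and it treats one pmol or one metre as the value 1 purely for comparison.

```python
from fractions import Fraction

# Treat one pmol as the value 1 so that only the numeric factors are compared.
pmol = Fraction(1)

# pmol/(8.10^8): the brackets group 8 x 10^8 into a single divisor.
with_brackets = pmol / (8 * 10**8)

# pmol/8.10^8: assumed left-to-right evaluation, so divide by 8, then multiply by 10^8.
without_brackets = pmol / 8 * 10**8

ratio = without_brackets / with_brackets
print(ratio == 10**16)

# The same distance in two units: 10 m and 1000 cm, with 1 cm = 1/100 m.
metre = Fraction(1)
centimetre = metre / 100
print(10 * metre == 1000 * centimetre)
```

Exact fractions are used instead of floating point so that the comparison is not affected by rounding.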

UCUM is a standard specification, published by and copyright of the Regenstrief Institute, with its own web site at https://ucum.org.

It consists of:

a formal syntax specification for UoM expressions
a comprehensive data set describing a system of units that defines over 300 foundational unit "atoms" plus the mathematical relationships between them

Defining UoM in this way enables:

data quality improvements - via automatic checking of UoM rules
safer processing - making systems aware of UoMs, calculations and conversions

UCUM is widely used in pathology and diagnostics. For more information about use of UCUM in the NHS, see the pathology and diagnostics UoM site.

NICIP

The National Interim Clinical Imaging Procedure (NICIP) code set enables accurate ordering of medical imaging procedures and direct commissioning of services. Around 4,900 five-character NICIP codes provide a common, consistent and unambiguous representation of imaging procedures, for consistent recording and sharing of information. It is generally used with DICOM images in imaging or radiology information systems.

All NICIP codes have a mapped SNOMED CT equivalent, although sometimes multiple NICIP codes map to the same SNOMED CT code.

Updates and releases to NICIP were put on hold in May 2021, pending an investigation into replacing it with SNOMED CT, and there was a final release to update NICIP to SNOMED CT mappings in October 2022.

For more details about NICIP, see the NICIP Terminology and Classifications Delen page.

Read Codes - retired

Read Codes Clinical Terms (V2 and CTV3) are a coded thesaurus of clinical terms, now retired, but previously used extensively by clinicians to record patient findings and procedures in health and social care IT systems across primary and secondary care.

There are two versions of Read Codes:

V2 had its last release on 1st April 2016
CTV3 had its last release in April 2018

Organisations that previously used Read Codes must now use dm+d for medicines and SNOMED CT for capturing new clinical content.

In most cases, any clinical data previously captured using Read Codes has been mapped to an equivalent SNOMED CT code, but the historical data should still be kept in its original Read form in case it needs to be referred to.

For more details on Read Codes, see the Terminology and Classifications Read Codes Delen page.

Clinical coding in healthcare software

Healthcare software uses clinical codes in a number of ways, such as:

to present a list of codes to a user so they can select the right one
to 'auto complete' a clinical term or classification while a user is typing
to present coded information back to a user
to map a clinical code from one code system to another – for example from SNOMED CT to ICD-10
to send coded information to, or receive information from, another system, for example using the FHIR interoperability standard

All these use cases require fast and reliable access to the relevant clinical codes. There are two main approaches to this:

Have a local copy of the relevant codes, and keep it up-to-date.
Look up clinical codes in real time from a terminology server, which will always be up-to-date.

The approach you use depends on how quickly and frequently you need to access the codes. We provide a variety of national services and APIs that support both approaches, as explained below.
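As a minimal sketch of the second approach, the following builds the URL for a real-time lookup of a single code using the standard FHIR `CodeSystem/$lookup` operation. The base URL is a placeholder and an assumption for this example, not a real endpoint, and the function name is made up. The sketch only builds the request URL and does not call any server.

```python
from urllib.parse import urlencode

# Placeholder base URL: an assumption for this sketch, not a real endpoint.
BASE_URL = "https://terminology.example/fhir"

# The URI that FHIR uses to identify SNOMED CT as a code system.
SNOMED_SYSTEM = "http://snomed.info/sct"


def lookup_url(code, system=SNOMED_SYSTEM, base_url=BASE_URL):
    """Build the URL for a FHIR CodeSystem/$lookup request for one code."""
    query = urlencode({"system": system, "code": code})
    return f"{base_url}/CodeSystem/$lookup?{query}"


# The 'disorder of lung' code used earlier in this guide.
url = lookup_url("19829001")
print(url)
```

With the first approach, the equivalent step would be a search of your local copy of the codes, which avoids a network call but depends on how recently the copy was updated.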

National services and APIs

We provide several user interfaces and three APIs that deal with classifications and terminology.

The following table explains which user interface or API to use for specific use cases.

Use case	Channel	Guidance
Browse the SNOMED CT hierarchy of terminology.	User interface	Use the SNOMED CT Browser.
Browse the dm+d dictionary of medicines and devices.	User interface	Use the NHS dm+d browser.
Perform create, read, update and delete (CRUD), or custom Remote Procedure Call (RPC) style operations: via transactional requests; on classification or terminology resources defined with the FHIR terminology module; available 24 hours a day, 365 days a year; with or without access controls	API	Use the Terminology Server FHIR API. Use either a registered account, or more limited unregistered access.
Batch execution of custom operations: available 24 hours a day, 365 days a year; with or without access controls	API	Use the Terminology Server FHIR API with POST /Bundle. Use either a registered account, or more limited unregistered access.
Syndicate classification and terminology data models representing a set of concepts within a domain and the relationships among those concepts (ontologies) to other ontology servers: available 24 hours a day, 365 days a year; with or without access controls	API	Use the Terminology Server FHIR API. The Terminology Server acts as the master, and content is synchronised to other mirror ontology servers. The syndication feed itself is openly available without authentication, but copyright-protected content still requires authentication.
Management of classification and terminology ontologies: by approved NHS staff and partners only	API	Use the Terminology Server FHIR API, using tooling such as Snapper, which in turn uses the Admin API under the covers.
Synchronise local mirror copies of national classification and terminology reference data sets, on a regular update schedule: available 24 hours a day, 365 days a year; with user registration	API	Use the Technology Reference Update Distribution (TRUD) API to automate the download of classification or terminology related TRUD release files, which you can then search locally.
Get a point-in-time view of reference data including SNOMED CT and dm+d terminology, further back in time than provided by TRUD: without needing guaranteed 24 hours a day, 365 days a year availability; with no access controls	API	Use the Data Registers Service - REST API - although it's also possible to do this by contacting TRUD.

If you cannot find the use case or API you're looking for, contact us.
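For the batch use case in the table above, the following is a rough sketch of the kind of request body a batch call might carry. It uses the standard FHIR Bundle resource with type 'batch'. The function name is made up, the two codes are examples from this guide, and the exact requests a given server accepts are an assumption that you should check against the API documentation.

```python
import json


def batch_bundle(relative_urls):
    """Wrap several GET requests in a single FHIR Bundle of type 'batch'."""
    return {
        "resourceType": "Bundle",
        "type": "batch",
        "entry": [
            {"request": {"method": "GET", "url": url}} for url in relative_urls
        ],
    }


# Two lookups sent together: 'disorder of lung' and 'heart disease'.
bundle = batch_bundle([
    "CodeSystem/$lookup?system=http://snomed.info/sct&code=19829001",
    "CodeSystem/$lookup?system=http://snomed.info/sct&code=56265001",
])

# Serialise to JSON, ready to be sent as the body of the batch request.
body = json.dumps(bundle, indent=2)
print(body)
```

Grouping lookups like this reduces the number of round trips when you need to resolve many codes at once.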

The software delivery process

How you deliver your software is up to you. But there are some important things you need to plan in when building software for the NHS in England, such as clinical safety, security and information governance.

For more details, read our introduction to healthcare technology.

Getting started

To get started with building your software and using our API services and standards, see getting started.