Health Research Data Catalogue API

Retrieve information suitable for publication in a health research data catalogue.

Overview

Use this API to retrieve metadata information suitable for publication in a health research data catalogue.

You can:

  • get a list of data sets
  • get data set details

You cannot currently use this API to:

  • retrieve data sets as a bulk transfer
  • retrieve data set feeds

You can get the following metadata information for each data set:

  • characteristics such as: publisher, keywords, coverage, provenance, access, format, standards and data utility
  • logical descriptions of field-level data

The API conforms to the HDR UK Dataset Metadata Schema Standard v2.0.2, which was created to enable sharing of information with the UK-wide 'federated' health research data catalogue.

API scope

The current scope is limited to metadata about national health-related data sets, such as the description, the size of the population contained within the data set and the legal basis for access. This metadata can help researchers and innovators decide whether a data set could be useful to their research and help them to make further health discoveries.


Who can use this API

This API can only be used where there is a legal basis to do so. Make sure you have a valid use case before you go too far with your development. You must do this before you can go live (see ‘Onboarding’ below).


Related APIs

There are no related APIs.


API status and roadmap

This API is initially for use by Health Data Research (HDR) UK developers, with other use cases to follow later.

This API is in alpha, meaning:

  • the API is available in our sandbox and integration test environments
  • the API is not yet available for production use
  • we might make breaking changes, but only if we cannot avoid it, and we'll give advance notice

To see our roadmap, or to suggest, comment or vote on features for this API, see our interactive product backlog. If you have any other queries, please contact us.


Technology

This API is RESTful.


Network access

This API is available on the internet and, indirectly, on the Health and Social Care Network (HSCN).

For more details see Network access for APIs.


Security and authorisation

This API is application-restricted, meaning we authenticate and authorise the calling application and we do not authenticate or authorise the end user.

Although we don't authenticate or authorise the end user, for some APIs we do require the calling application to do it 'locally'. For other APIs we don't require the end user to be authenticated or authorised, or even to be present.

We support the following security patterns for application-restricted APIs:

For more details, see application-restricted APIs.


Environments and testing

Purpose           Base URL
Sandbox           https://sandbox.api.service.nhs.uk/health-research-data-catalogue
Integration test  https://int.api.service.nhs.uk/health-research-data-catalogue
Production        Not yet available

Sandbox testing

Details of the sandbox environment to follow.

Integration testing

Details of the integration environment to follow.


Onboarding

You need to get your software approved by us before it can go live with this API. We call this onboarding. The onboarding process can sometimes be quite long, so it’s worth planning well ahead.

More details on the onboarding process to follow.


Endpoints

Get a list of data sets

get /datasets

Note: the sandbox API is under construction so Try this API responses don't work in this release.

  

Overview

Use this endpoint to get a summary list of published data sets.

Summary data set metadata returned

The summary data set metadata returned by this search includes:

  • schema conformance
  • data set persistent identifier
  • data set name
  • data set description
  • data set version
  • self continuation link
  • data set metadata creation date
  • data set metadata modification date
  • source of the metadata

Request

Headers
Name Description
apikey

String

API key to authenticate with.

Required
X-Correlation-ID

String

An optional ID which you can use to track transactions across multiple systems. It can take any value, but we recommend avoiding . characters.

Mirrored back in a response header.

Example: 11C46F5F-CDEF-4865-94B2-0EE0EDCC26DA

X-Request-ID

String

A globally unique identifier (GUID) for the request, which we use to de-duplicate repeated requests and to trace the request if you contact our helpdesk.

Must be a universally unique identifier (UUID) (ideally version 4).

Mirrored back in a response header.

If you re-send a failed request, use the same value in this header.

Pattern: /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/

Example: 60E0B220-8136-4CA5-AE46-1D97EF59D068
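The headers described above could be assembled roughly as follows. This is a minimal sketch: the helper name `build_headers` is hypothetical, and generating an X-Request-ID when none is supplied is an assumption rather than documented behaviour. Only the header names and the UUID pattern come from this page.

```python
import re
import uuid

# Pattern for X-Request-ID, copied from the header description above.
REQUEST_ID_PATTERN = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def build_headers(api_key, correlation_id=None, request_id=None):
    """Return the request headers as a dict (hypothetical helper)."""
    if not api_key:
        raise ValueError("apikey is required")
    if request_id is None:
        # Assumption: always send an ID; version 4 UUIDs are the preferred form.
        request_id = str(uuid.uuid4())
    if not REQUEST_ID_PATTERN.match(request_id):
        raise ValueError("X-Request-ID must be a UUID")
    headers = {"apikey": api_key, "X-Request-ID": request_id}
    if correlation_id is not None:
        headers["X-Correlation-ID"] = correlation_id
    return headers
```

Keeping the headers in a plain dict means they can be passed to whichever HTTP client you use.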

Response

HTTP status: 200

OK

Body

Content type: application/json

Schema

Name Description
object

Container object for all summary dataset resources that match the request.

total
integer
required

The total number of resources that match the request.

Minimum: 1 (inclusive)
items
array
required

Repeats all summary dataset resources (items) that match the request.

object
@schema
object
required

JSON Schema specification that the summary dataset resource (item) conforms to.

type
string
required

JSON document type. The JSON dataset is a JSON document, which includes descriptive metadata.

Example: Dataset
url
string
required

URL of the dataset schema. It resolves to the version of the JSON schema that the resource conforms to.

Example: https://hdruk.github.io/schemata/schema/dataset/2.0.2/dataset.schema.json
persistentId
string
required

Data set persistent identifier that is common across all data set revisions.

Pattern: ^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$
Max length: 36
Min length: 36
Example: 226fb3f1-4471-400a-8c39-2b66d46a39b6
name
string
required

Data set name.

Example: Civil Registration - Deaths
description
string
required

Data set description.

Example: Deaths registration data (all deaths in England and Wales) collected from The Registrar General for England and Wales. Record-level patient dataset, where a record represents one death registration.
version
string
required

Data set version.

Pattern: ^([0-9]+)\.([0-9]+)\.([0-9]+)$
Example: 1.1.0
self
string
required

Self continuation link of a search result.

Example: https://api.service.nhs.uk/health-research-data-catalogue/datasets/dd5f0174-575f-4f4c-a4fc-b406aab953d9
issued
string date-time
required

Data set metadata creation date.

Example: 2020-08-05T14:35:59Z
modified
string date-time
required

Data set metadata modification date.

Example: 2021-01-28T14:15:46Z
source
string
required

The source of the data set.

Example: Other
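To illustrate the response schema above, the sketch below parses a summary list and checks two of the documented patterns. The sample body is assembled from the per-field examples on this page and is not a captured API response. The nesting of `type` and `url` under `@schema`, and the function names, are assumptions.

```python
import json
import re
from datetime import datetime, timezone

UUID_RE = re.compile(
    r"^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$"
)
VERSION_RE = re.compile(r"^([0-9]+)\.([0-9]+)\.([0-9]+)$")

# Illustrative body built from the field examples above (assumed layout).
SAMPLE_BODY = json.dumps({
    "total": 1,
    "items": [{
        "@schema": {
            "type": "Dataset",
            "url": "https://hdruk.github.io/schemata/schema/dataset/2.0.2/dataset.schema.json",
        },
        "persistentId": "226fb3f1-4471-400a-8c39-2b66d46a39b6",
        "name": "Civil Registration - Deaths",
        "description": "Deaths registration data.",
        "version": "1.1.0",
        "self": "https://api.service.nhs.uk/health-research-data-catalogue/datasets/dd5f0174-575f-4f4c-a4fc-b406aab953d9",
        "issued": "2020-08-05T14:35:59Z",
        "modified": "2021-01-28T14:15:46Z",
        "source": "Other",
    }],
})


def parse_timestamp(value):
    """Parse timestamps of the form shown in the examples (UTC, 'Z' suffix)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarise(body):
    """Return (total, rows), validating identifiers and versions on the way."""
    payload = json.loads(body)
    rows = []
    for item in payload["items"]:
        if not UUID_RE.match(item["persistentId"]):
            raise ValueError("unexpected persistentId: " + item["persistentId"])
        if not VERSION_RE.match(item["version"]):
            raise ValueError("unexpected version: " + item["version"])
        rows.append((item["persistentId"], item["name"], item["version"],
                     parse_timestamp(item["modified"])))
    return payload["total"], rows
```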
HTTP status: 4XX

An error occurred as follows:

HTTP status  Error code             Description
401          ACCESS_DENIED          Access token missing, invalid or expired, or calling application not configured for this operation.
404          RESOURCE_NOT_FOUND     No dataset resources found.
404          INVALID_ENDPOINT_PATH  Invalid endpoint path.
Body

Content type: application/fhir+json

Schema

Name Description
object

Outcome of an operation that does not result in a resource or bundle being returned, for example an error or an async/batch submission. There are a number of possible error codes that can be returned along with a more detailed description in the display field.

resourceType
string

FHIR Resource Type.

Default: OperationOutcome
issue
array

List of issues that have occurred.

Min items: 1
object
severity
string
required

Severity of the error.

Allowed values: fatal, error, warning, information
code
string
required

FHIR error code.

Allowed values: invalid, structure, required, value, invariant, security, login, unknown, expired, forbidden, suppressed, processing, not-supported, duplicate, multiple-matches, not-found, deleted, too-long, code-invalid, extension, too-costly, business-rule, conflict, transient, lock-error, no-store, exception, timeout, incomplete, throttled, informational
details
object

Internal error code.

coding
array
object
system
string

URI of the coding system specification.

Example: https://fhir.nhs.uk/R4/CodeSystem/Spine-ErrorOrWarningCode
version
string

Version of the coding system in use.

Example: 1
code
string

Symbol in syntax defined by the system.

Example: INVALID_VALUE
display
string

Representation defined by the system.

Example: Provided value is invalid
diagnostics
string

Additional diagnostic information about the issue. This information is subject to change.
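An error body following the schema above could be unpacked along these lines. The sample body is an illustration only: pairing the FHIR code `not-found` with the internal code `RESOURCE_NOT_FOUND`, and the `display` text, are assumptions, as is the function name.

```python
import json

# Illustrative error body following the schema above (not a captured response).
SAMPLE_ERROR = json.dumps({
    "resourceType": "OperationOutcome",
    "issue": [{
        "severity": "error",
        "code": "not-found",
        "details": {"coding": [{
            "system": "https://fhir.nhs.uk/R4/CodeSystem/Spine-ErrorOrWarningCode",
            "version": "1",
            "code": "RESOURCE_NOT_FOUND",
            "display": "No dataset resources found.",
        }]},
    }],
})


def describe_issues(body):
    """Return one readable line per issue in an OperationOutcome body."""
    outcome = json.loads(body)
    if outcome.get("resourceType") != "OperationOutcome":
        raise ValueError("not an OperationOutcome")
    lines = []
    for issue in outcome.get("issue", []):
        # details and coding are optional in the schema, so default to empty.
        codings = issue.get("details", {}).get("coding", [])
        internal = codings[0] if codings else {}
        lines.append("{}/{}: {} ({})".format(
            issue["severity"], issue["code"],
            internal.get("code", "?"), internal.get("display", ""),
        ))
    return lines
```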

HTTP status: 5XX

A 5xx status code means the server has a problem. For more details on the most common 5xx status codes and their meanings see the HTTP status codes.

Get data set details

get /datasets/{id}

Note: the sandbox API is under construction so Try this API responses don't work in this release.

  

Overview

Use this endpoint to get data set details for a given data set persistent identifier.

Request

Path parameters
Name Description
id

UUID (uuid)

The persistent identifier of the data set.

Example: dd5f0174-575f-4f4c-a4fc-b406aab953d9

Required
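As a small sketch of how the path parameter fits into the request URL, the following helper validates the identifier before building the address. The base URLs come from the environments listed on this page. The function name and the environment keys are hypothetical.

```python
import uuid

# Base URLs taken from the environments listed on this page.
BASE_URLS = {
    "sandbox": "https://sandbox.api.service.nhs.uk/health-research-data-catalogue",
    "int": "https://int.api.service.nhs.uk/health-research-data-catalogue",
}


def dataset_url(environment, persistent_id):
    """Build the data set details URL for a persistent identifier."""
    base = BASE_URLS[environment]
    # uuid.UUID raises ValueError for malformed identifiers;
    # str() then gives the canonical lowercase hyphenated form.
    canonical = str(uuid.UUID(persistent_id))
    return "{}/datasets/{}".format(base, canonical)
```

Note that `uuid.UUID` lowercases the identifier. Whether the API treats identifiers case-insensitively is not stated here, so this normalisation is an assumption.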
Headers
Name Description
apikey

String

API key to authenticate with.

Required
X-Correlation-ID

String

An optional ID which you can use to track transactions across multiple systems. It can take any value, but we recommend avoiding . characters.

Mirrored back in a response header.

Example: 11C46F5F-CDEF-4865-94B2-0EE0EDCC26DA

X-Request-ID

String

A globally unique identifier (GUID) for the request, which we use to de-duplicate repeated requests and to trace the request if you contact our helpdesk.

Must be a universally unique identifier (UUID) (ideally version 4).

Mirrored back in a response header.

If you re-send a failed request, use the same value in this header.

Pattern: /^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$/

Example: 60E0B220-8136-4CA5-AE46-1D97EF59D068
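The advice to reuse the X-Request-ID when re-sending a failed request might look like this in practice. It is a sketch under stated assumptions: `send` is a hypothetical callable that performs the HTTP call with the given headers, and `ConnectionError` stands in for whatever your HTTP client raises on failure.

```python
import uuid


def send_with_retry(send, max_attempts=3):
    """Call send(headers) until it succeeds, reusing one X-Request-ID.

    `send` is a hypothetical callable supplied by the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the ID once so every retry carries the same value.
    request_id = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return send({"X-Request-ID": request_id})
        except ConnectionError as exc:
            last_error = exc
    raise last_error
```

Generating the identifier outside the loop is the important part: it lets the service recognise the retries as the same logical request.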

Response

HTTP status: 200

OK

Body

Content type: application/json

Schema

Name Description
object

HDR UK Dataset Metadata Schema.

identifier
required

System data set identifier (logical identifier of the dataset resource assigned by the Central Metastore). This is not the Dataset persistent identifier that is common across all dataset resource revisions.

anyOf
string
Pattern: ^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$
Max length: 36
Min length: 36
Example: 226fb3f1-4471-400a-8c39-2b66d46a39b6
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
version
string
required

Data set metadata version.

Pattern: ^([0-9]+)\.([0-9]+)\.([0-9]+)$
Example: 1.1.0
revisions
array
required

Data set Revisions. Includes the Semantic Version of the data set and URL endpoint to obtain the version.

allOf
object
version
string
required
Pattern: ^([0-9]+)\.([0-9]+)\.([0-9]+)$
Example: 1.1.0
url
string uri
required
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
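Because each revision carries a semantic version, a client may want to pick the newest one. The sketch below compares versions numerically rather than as text. The function names are hypothetical, and the revision objects are assumed to be dicts with the `version` and `url` keys described above.

```python
import re

VERSION_RE = re.compile(r"^([0-9]+)\.([0-9]+)\.([0-9]+)$")


def version_key(version):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError("not a semantic version: " + version)
    return tuple(int(part) for part in match.groups())


def latest_revision(revisions):
    """Return the revision with the highest semantic version."""
    return max(revisions, key=lambda revision: version_key(revision["version"]))
```

A plain string comparison would rank "1.9.3" above "1.10.0", which is why the parts are converted to integers first.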
issued
string date-time
required

Data set Metadata Creation Date.

Example: 2020-08-05T14:35:59Z
modified
string date-time
required

Data set Metadata Modification Date.

Example: 2021-01-28T14:15:46Z
summary
object
required

Summary metadata must be completed by Data Custodians onboarding metadata into the Innovation Gateway MVP.

title
required

Title of the data set limited to 80 characters. It should provide a short description of the data set and be unique across the gateway. The title can be prefixed with an organisation name or identifier to differentiate it from other data sets within the Gateway. Good titles should avoid acronyms, summarise the content of the data set and if relevant, the region the data set covers.

Example: North West London COVID-19 Patient Level Situation Report
allOf
string
Max length: 80
Min length: 2
abstract
required

Data set Abstract provides a clear and brief descriptive signpost for researchers who are searching for data that may be relevant to their research. The abstract should allow the reader to determine the scope of the data collection and accurately summarise its content. The optimal length is one paragraph (limited to 255 characters) and effective abstracts should avoid long sentences and abbreviations where possible.

Example: CPRD Aurum contains primary care data contributed by General Practitioner (GP) practices using EMIS Web® including patient registration information and all care events that GPs have chosen to record as part of their usual medical practice.
allOf
string
Max length: 255
Min length: 5
publisher
required

Data set publisher is the organisation responsible for running or supporting the data access request process, as well as publishing and maintaining the metadata. In most cases this will be the same as the HDR UK Organisation (Hub or Alliance Member). However, in some cases this will be different: for example, Tissue Directory is an HDR UK Gateway organisation but coordinates activities across a number of data publishers, such as Cambridge Blood and Stem Cell Biobank.

allOf
object

Organisation metadata describes an organisation for purposes of discovery and identification.

identifier

Provides a Grid.ac identifier (see https://www.grid.ac/institutes) for your organisation. If an organisation does not have a Grid.ac identifier, use the “suggest an institute” function here: https://www.grid.ac/institutes#

allOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
name
required

Name of the organisation.

allOf
string
Max length: 80
Min length: 2
logo

Provides a logo associated with the Gateway Organisation using a valid URL. The following formats will be accepted .jpg, .png or .svg.

allOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
description

Provides a URL that describes the organisation.

allOf
string
Max length: 3000
Min length: 2
contactPoint
required

Organisation contact point(s).

anyOf
string email
array
allOf
string email
memberOf

Indicates if the organisation is an Alliance Member or a Hub.

allOf
string
Allowed values: HUB, ALLIANCE, OTHER, NCS
accessRights

The URL of a webpage where the data access request process and/or guidance is provided. Can support multiple access processes, for example industry versus academic.

anyOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
array
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
deliveryLeadTime

Provides an indication of the typical processing times based on the types of requests typically received. Note: This value will be used as default access request duration for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.

allOf
string
Allowed values: LESS 1 WEEK, 1-2 WEEKS, 2-4 WEEKS, 1-2 MONTHS, 2-6 MONTHS, MORE 6 MONTHS, VARIABLE, NOT APPLICABLE, OTHER
accessService

Provides a brief description of the data access services that are available, including: the environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, indicate the current plans and timelines for when and how data will be made available to researchers. Note: This value will be used as the default access environment for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.

Example: https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide
allOf
string
Max length: 5000
Min length: 2
accessRequestCost

Provides link(s) to a webpage or a short description detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.

anyOf
string
Max length: 1000
Min length: 2
array
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
dataUseLimitation

Provides an indication of consent permissions for data sets and/or materials, and relates to the purposes for which data sets and/or material might be removed, stored or used. Notes: where there are existing data-sharing arrangements such as the HDR UK HUB data sharing agreement or the NIHR HIC data sharing agreement this should be indicated within access rights. This value will be used as terms for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Allowed values: GENERAL RESEARCH USE, COMMERCIAL RESEARCH USE, GENETIC STUDIES ONLY, NO GENERAL METHODS RESEARCH, NO RESTRICTION, GEOGRAPHICAL RESTRICTIONS, INSTITUTION SPECIFIC RESTRICTIONS, NOT FOR PROFIT USE, PROJECT SPECIFIC RESTRICTIONS, RESEARCH SPECIFIC RESTRICTIONS, USER SPECIFIC RESTRICTION, RESEARCH USE ONLY, NO LINKAGE
dataUseRequirements

Indicates if there are any additional conditions set for use. Multiple requirements may be provided. Ensure that these restrictions are documented in access rights information.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Allowed values: COLLABORATION REQUIRED, PROJECT SPECIFIC RESTRICTIONS, ETHICS APPROVAL REQUIRED, INSTITUTION SPECIFIC RESTRICTIONS, GEOGRAPHICAL RESTRICTIONS, PUBLICATION MORATORIUM, PUBLICATION REQUIRED, RETURN TO DATABASE OR RESOURCE, TIME LIMIT ON USE, DISCLOSURE CONTROL, NOT FOR PROFIT USE, USER SPECIFIC RESTRICTION
contactPoint
string
required

Provides a valid email address that can be used to coordinate data access requests with the publisher. Organisations are expected to provide a dedicated email address associated with the data access request process. Notes: An employee’s email address can only be provided on a temporary basis and if one is provided an explicit consent must be obtained for this purpose.

Default: Defaults to the contact point of the primary organisation of the user; however, it can be overridden for specific data sets
Example: SAILDatabank@swansea.ac.uk
allOf
string email
keywords
required

Provides relevant and specific keywords, as a comma-separated list, that can improve the SEO of your data set. Notes: The onboarding portal will suggest keywords based on title, abstract and description. We are compiling a standardised list of keywords and synonyms across data sets to make filtering easier for users.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Max length: 80
Min length: 2
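Several fields in this schema, keywords included, accept either a single comma-separated string or an array of strings. A client could normalise both shapes into one list as sketched below. The helper name `as_list` is hypothetical, and trimming whitespace around each entry is an assumption rather than something the schema specifies.

```python
def as_list(value):
    """Normalise a comma-separated string or an array of strings to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        parts = value.split(",")
    else:
        parts = list(value)
    # Assumption: surrounding whitespace is not significant; drop empty entries.
    return [part.strip() for part in parts if part.strip()]
```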
alternateIdentifiers

Alternate data set identifiers or local identifiers.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Max length: 1000
Min length: 2
doiName

All HDR UK registered data sets should either have a Digital Object Identifier (DOI) or be working towards obtaining one.

Example: ["10.3399/bjgp17X692645"]
allOf
string
Pattern: ^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$
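The DOI pattern above can be checked in code as follows. One detail worth noting: the dot after `10` is unescaped in the documented pattern, so it matches any character. The stricter variant below, with the dot escaped, is a suggestion and not part of the schema.

```python
import re

# Pattern exactly as documented above (the dot after "10" is unescaped).
DOCUMENTED_DOI_RE = re.compile(r"^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$")

# Assumed stricter variant with the dot escaped; not part of the schema.
STRICT_DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+$")


def is_doi(value, strict=True):
    """Return True if value looks like a DOI name."""
    pattern = STRICT_DOI_RE if strict else DOCUMENTED_DOI_RE
    return pattern.match(value) is not None
```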
documentation
object

Documentation can include a rich text description of the data set or links to media such as documents, images, presentations, videos or links to data dictionaries, profiles or dashboards. Organisations are required to confirm that they have permission to distribute any additional media.

description

A free-text description of the record.

allOf
string
Max length: 3000
Min length: 2
associatedMedia

Provides any media associated with the Gateway Organisation using a valid URI for the content. This is an opportunity to provide additional context that could be useful for researchers wanting to understand more about the data set and its relevance to their research question. The following formats will be accepted: .jpg, .png, .svg, .pdf, .xlsx or .docx. Note: media assets can be hosted by the organisation or uploaded using the onboarding portal.

Example: PDF Document that describes study protocol
anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
isPartOf
string

Indicates if the data set is part of a group or family.

Default: NOT APPLICABLE
Example: Hospital Episodes Statistics data sets (A&E, APC, OP, AC MSDS).
anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
anyOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
string
Max length: 80
Min length: 2
string
Allowed values: NOT APPLICABLE
coverage
object

Coverage information includes attributes for geographical and temporal coverage, cohort details etc. to enable a deeper understanding of the data set content so that researchers can make decisions about the relevance of the underlying data.

spatial

The geographical area covered by the data set. It is recommended that links are to entries in a well-maintained gazetteer such as https://www.geonames.org/ or https://what3words.com/daring.lion.race.

Example: https://www.geonames.org/2635167/united-kingdom-of-great-britain-and-northern-ireland.html
anyOf
string
Pattern: ([^,]+)
array
anyOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
typicalAgeRange

Indicates the age range in whole years of participants in the data set. Range has the following format ‘[min age]-[max age]’ where both the minimum and maximum are whole numbers (integers).

allOf
string
Pattern: (150|1[0-4][0-9]|[0-9]|[1-8][0-9]|9[0-9])-(150|1[0-4][0-9]|[0-9]|[1-8][0-9]|9[0-9])
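The age range pattern above accepts ages from 0 to 150 separated by a plain hyphen. The sketch below applies it and returns the two bounds as integers. Matching the whole string and rejecting ranges where the minimum exceeds the maximum are both assumptions added here; the documented pattern enforces neither.

```python
import re

# One age between 0 and 150, using the alternatives from the pattern above.
AGE = r"(150|1[0-4][0-9]|[0-9]|[1-8][0-9]|9[0-9])"
AGE_RANGE_RE = re.compile(AGE + "-" + AGE)


def parse_age_range(text):
    """Return (min_age, max_age) for a string such as '18-65'."""
    # Assumption: the whole string must match, not just part of it.
    match = AGE_RANGE_RE.fullmatch(text)
    if match is None:
        raise ValueError("not an age range: " + text)
    low, high = int(match.group(1)), int(match.group(2))
    # Assumption: a range with min above max is treated as invalid.
    if low > high:
        raise ValueError("minimum age exceeds maximum age")
    return low, high
```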
physicalSampleAvailability

Availability of physical samples associated with the data set. Indicates the types of samples that are available. More than one type may be provided. If samples are not yet available, use “AVAILABILITY TO BE CONFIRMED”. If samples are not available, use “NOT AVAILABLE”.

Example: BONE MARROW
anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
string
Allowed values: NOT AVAILABLE, BONE MARROW, CANCER CELL LINES, CORE BIOPSY, CDNA OR MRNA, DNA, FAECES, IMMORTALIZED CELL LINES, MICRORNA, PERIPHERAL BLOOD CELLS, PLASMA, PM TISSUE, PRIMARY CELLS, RNA, SALIVA, SERUM, SWABS, TISSUE, URINE, WHOLE BLOOD, AVAILABILITY TO BE CONFIRMED, OTHER
followup
string

If known, what is the typical time span that a patient appears in the data set (follow up period).

Default: UNKNOWN
allOf
string
Allowed values: 0 - 6 MONTHS, 6 - 12 MONTHS, 1 - 10 YEARS, > 10 YEARS, UNKNOWN, CONTINUOUS, OTHER
pathway

Indicates if the data set is representative of the patient pathway and any limitations the data set may have with respect to pathway coverage. This could include if the data set is from a single speciality or area, a single tier of care, linked across two tiers (e.g. primary and secondary care), or an integrated care record covering the whole patient pathway.

allOf
string
Max length: 3000
Min length: 2
provenance
object

Provenance information allows researchers to understand data within the context of its origins and can be an indicator of quality, authenticity and timeliness.

origin
allOf
object
purpose

Indicates the purpose(s) for which the data set was collected.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Allowed values: STUDY, DISEASE REGISTRY, TRIAL, CARE, AUDIT, ADMINISTRATIVE, FINANCIAL, STATUTORY, OTHER
source

Indicates the source of the data extraction.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Allowed values: EPR, ELECTRONIC SURVEY, LIMS, OTHER INFORMATION SYSTEM, PAPER BASED, FREETEXT NLP, MACHINE GENERATED, OTHER
collectionSituation

Indicates the setting(s) where data was collected. Multiple settings may be provided.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
string
Allowed values: CLINIC, PRIMARY CARE, ACCIDENT AND EMERGENCY, OUTPATIENTS, IN-PATIENTS, SERVICES, COMMUNITY, HOME, PRIVATE, PHARMACY, SOCIAL CARE, LOCAL AUTHORITY, NATIONAL GOVERNMENT, OTHER
temporal
required
allOf
object
accrualPeriodicity
string
required

Indicates the frequency of distribution release. If a data set is distributed regularly then a distribution release periodicity from the constrained list should be used which indicates the next release date. When the release date becomes historical, a new release date will be calculated based on the publishing periodicity. If a data set has been published and will remain static indicate that it is static and indicate when it was released. If a data set is released on an irregular basis or “on-demand” indicate that it is Irregular and leave release date as null. If a data set can be published in real-time or near-real-time indicate that it is continuous and leave release date as null. Notes: see https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/.

Default:
allOf
string
Allowed values: STATIC, IRREGULAR, CONTINUOUS, BIENNIAL, ANNUAL, BIANNUAL, QUARTERLY, BIMONTHLY, MONTHLY, BIWEEKLY, WEEKLY, SEMIWEEKLY, DAILY, OTHER
distributionReleaseDate

Date of the latest release of the data set. If this is a regular release (for example, quarterly) or a static data set, this should be completed alongside Periodicity. If the data set is released irregularly or continuously, leave this blank. Notes: Periodicity and release date will be used to determine when the next release is expected. For example, if the release date is documented as 01/01/2020, it is now 20/04/2020 and there is a quarterly release schedule, the latest release will be calculated as 01/04/2020.

anyOf
string date
string date-time
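The release date calculation described for distributionReleaseDate can be worked through in code. This is a sketch of one plausible interpretation, not the catalogue's actual algorithm. The month counts for each periodicity and the function names are assumptions, and only three periodicities are covered.

```python
import calendar
from datetime import date

# Assumed month counts for a few of the allowed periodicity values.
MONTHS_PER_PERIOD = {"MONTHLY": 1, "QUARTERLY": 3, "ANNUAL": 12}


def add_months(start, months):
    """Return start shifted by a number of months, clamping the day."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(start.day, last_day))


def latest_release(documented, today, periodicity):
    """Return the most recent scheduled release on or before today."""
    step = MONTHS_PER_PERIOD[periodicity]
    periods = 0
    # Advance one period at a time while the next release is not in the future.
    while add_months(documented, (periods + 1) * step) <= today:
        periods += 1
    return add_months(documented, periods * step)
```

With the dates from the description, `latest_release(date(2020, 1, 1), date(2020, 4, 20), "QUARTERLY")` gives 1 April 2020.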
startDate
required

The start of the time period that the data set provides coverage for. If there are multiple cohorts in the data set with varying start dates, provide the earliest date and use the description or the media attribute to provide more information.

anyOf
string date
string date-time
endDate

The end of the time period that the data set provides coverage for. If the data set is “Continuous” and has no known end date, state continuous. If there are multiple cohorts in the data set with varying end dates, provide the latest date and use the description or the media attribute to provide more information.

anyOf
string date
string date-time
string
Allowed values: CONTINUOUS
timeLag
required

Indicates the typical time-lag between an event and the data for that event appearing in the data set.

allOf
string
Allowed values: LESS 1 WEEK, 1-2 WEEKS, 2-4 WEEKS, 1-2 MONTHS, 2-6 MONTHS, MORE 6 MONTHS, VARIABLE, NO TIMELAG, NOT APPLICABLE, OTHER
accessibility
object
required

Accessibility information allows researchers to understand access, usage, limitations, formats, standards and linkage or interoperability with toolsets.

usage
object

This section includes information about how the data can be used and how it is currently being used.

dataUseLimitation

Provides an indication of consent permissions for data sets and/or materials, and relates to the purposes for which data sets and/or material might be removed, stored or used. NOTE: we have extended the DUO to include a value for NO LINKAGE.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Allowed values: GENERAL RESEARCH USE, COMMERCIAL RESEARCH USE, GENETIC STUDIES ONLY, NO GENERAL METHODS RESEARCH, NO RESTRICTION, GEOGRAPHICAL RESTRICTIONS, INSTITUTION SPECIFIC RESTRICTIONS, NOT FOR PROFIT USE, PROJECT SPECIFIC RESTRICTIONS, RESEARCH SPECIFIC RESTRICTIONS, USER SPECIFIC RESTRICTION, RESEARCH USE ONLY, NO LINKAGE
dataUseRequirements

Indicates if there are any additional conditions set for use. Multiple requirements may be provided. These restrictions are documented in access rights information.

anyOf
string
Pattern: ([^,]+)
array
unique items
Min items: 1
allOf
string
Allowed values: COLLABORATION REQUIRED, PROJECT SPECIFIC RESTRICTIONS, ETHICS APPROVAL REQUIRED, INSTITUTION SPECIFIC RESTRICTIONS, GEOGRAPHICAL RESTRICTIONS, PUBLICATION MORATORIUM, PUBLICATION REQUIRED, RETURN TO DATABASE OR RESOURCE, TIME LIMIT ON USE, DISCLOSURE CONTROL, NOT FOR PROFIT USE, USER SPECIFIC RESTRICTION
resourceCreator

Provides the text that you would like included as part of any citation that credits this data set. This is typically just the name of the publisher. No employee details should be provided.

anyOf
string
Max length: 1000
Min length: 2
array
string
Max length: 1000
Min length: 2
investigations

Investigations.

anyOf
string
Pattern: ([^,]+)
array
allOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
isReferencedBy

Provides the keystone paper associated with the data set. Also include a list of known citations, if available; these should be links to existing resources where the data set has been used or referenced. Provide multiple entries as separate items or, for a CSV upload, as a tab-separated list.

anyOf
string
Pattern: ^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$
array
allOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
access
object
required

This section includes information about data access.

accessRights
required

Access rights.

anyOf
string
Pattern: ^In Progress$
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
array
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
accessService

Provides a brief description of the data access services that are available, including: the environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, indicates the current plans and timelines for when and how data will be made available to researchers. Note: This value will be used as the default access environment for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.

Example: https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide
allOf
string
Max length: 5000
Min length: 2
accessRequestCost

Provides link(s) to a webpage detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.

anyOf
string
Max length: 5000
Min length: 2
array
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
deliveryLeadTime

Provides an indication of the typical processing times based on the types of requests typically received.

allOf
string
Allowed values: LESS 1 WEEK, 1-2 WEEKS, 2-4 WEEKS, 1-2 MONTHS, 2-6 MONTHS, MORE 6 MONTHS, VARIABLE, NOT APPLICABLE, OTHER
jurisdiction
string
required

Country code from ISO 3166-1 country codes and the associated ISO 3166-2 for regions, cities, states etc. for the country/state under whose laws the data subjects’ data is collected, processed and stored.

Default: GB-ENG
anyOf
string
Pattern: ([^,]+)
array
allOf
string
Pattern: ^[A-Z]{2}(-[A-Z]{2,3})?$
dataController
required

Data Controller means a person/entity who (either alone or jointly or in common with other persons/entities) determines the purposes for which, and the way in which, any Data Subject data, specifically personal data, are to be processed.

allOf
string
Max length: 5000
Min length: 2
dataProcessor

A Data Processor, in relation to any Data Subject data, specifically personal data, means any person/entity (other than an employee of the data controller) who processes the data on behalf of the data controller.

allOf
string
Max length: 5000
Min length: 2
formatAndStandards
object

This section includes technical attributes such as language, vocabularies and sizes, and gives researchers facts about the underlying data in the data set and how to process it.

vocabularyEncodingScheme
string
required

Lists any relevant terminologies / ontologies / controlled vocabularies, such as ICD 10 Codes, NHS Data Dictionary National Codes or SNOMED CT International, that are being used by the data set. If the controlled vocabularies are local standards, make that explicit. If you are using a standard that has not been included in the list, use “other” and contact the support desk to ask for an addition. Note: More than one vocabulary may be provided.

Default: LOCAL
anyOf
string
Pattern: ([^,]+)
array
Min items: 0
allOf
string
Allowed values: LOCAL, OPCS4, READ, SNOMED CT, SNOMED RT, DM PLUS D, DM+D, NHS NATIONAL CODES, NHS SCOTLAND NATIONAL CODES, NHS WALES NATIONAL CODES, ODS, LOINC, ICD10, ICD10CM, ICD10PCS, ICD9CM, ICD9, ICDO3, AMT, APC, ATC, CIEL, HPO, CPT4, DPD, DRG, HEMONC, JMDC, KCD7, MULTUM, NAACCR, NDC, NDFRT, OXMIS, RXNORM, RXNORM EXTENSION, SPL, OTHER
conformsTo
string
required

Lists standardised data models that the data set has been stored in or transformed to, such as OMOP or FHIR. If the data is only available in a local format, make that explicit. If you are using a standard that has not been included in the list, use “other” and contact the support desk to ask for an addition.

Default: LOCAL
anyOf
string
Pattern: ([^,]+)
array
allOf
string
Allowed values: HL7 FHIR, HL7 V2, HL7 CDA, HL7 CCOW, LOINC, DICOM, I2B2, IHE, OMOP, OPENEHR, SENTINEL, PCORNET, CDISC, NHS DATA DICTIONARY, NHS SCOTLAND DATA DICTIONARY, NHS WALES DATA DICTIONARY, LOCAL, OTHER
language
string
required

This should list all the languages in which the data set metadata and underlying data are made available.

Default: en
anyOf
string
Pattern: ([^,]+)
array
Min items: 1
allOf
string
Allowed values: aa, ab, ae, af, ak, am, an, ar, as, av, ay, az, ba, be, bg, bh, bi, bm, bn, bo, br, bs, ca, ce, ch, co, cr, cs, cu, cv, cy, da, de, dv, dz, ee, el, en, eo, es, et, eu, fa, ff, fi, fj, fo, fr, fy, ga, gd, gl, gn, gu, gv, ha, he, hi, ho, hr, ht, hu, hy, hz, ia, id, ie, ig, ii, ik, io, is, it, iu, ja, jv, ka, kg, ki, kj, kk, kl, km, kn, ko, kr, ks, ku, kv, kw, ky, la, lb, lg, li, ln, lo, lt, lu, lv, mg, mh, mi, mk, ml, mn, mr, ms, mt, my, na, nb, nd, ne, ng, nl, nn, no, nr, nv, ny, oc, oj, om, or, os, pa, pi, pl, ps, pt, qu, rm, rn, ro, ru, rw, sa, sc, sd, se, sg, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tr, ts, tt, tw, ty, ug, uk, ur, uz, ve, vi, vo, wa, wo, xh, yi, yo, za, zh, zu
format
required

If multiple formats are available, specify each one. See the application, audio, image, message, model, multipart, text and video media types at https://www.iana.org/assignments/media-types/media-types.xhtml. Note: If your file format is not included in the current list of formats, indicate other. If you are using the HOP you will be directed to a service desk page where you can request your additional format. If not, go to https://metadata.atlassian.net/servicedesk/customer/portal/4 to request your format.

anyOf
string
Pattern: ([^,]+)
array
Min items: 1
allOf
string
Min length: 1
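The vocabularyEncodingScheme, conformsTo, language and format fields in this section each accept either a single string or an array. A client might normalise both shapes before use, as in the minimal sketch below. The helper names are hypothetical, and splitting a plain string on commas is an assumption drawn from the `([^,]+)` pattern, not confirmed API behaviour. The allowed values shown are the conformsTo list from this section.

```python
# Allowed values copied from the conformsTo field in this section.
CONFORMS_TO_VALUES = {
    "HL7 FHIR", "HL7 V2", "HL7 CDA", "HL7 CCOW", "LOINC", "DICOM", "I2B2",
    "IHE", "OMOP", "OPENEHR", "SENTINEL", "PCORNET", "CDISC",
    "NHS DATA DICTIONARY", "NHS SCOTLAND DATA DICTIONARY",
    "NHS WALES DATA DICTIONARY", "LOCAL", "OTHER",
}


def as_list(value):
    """Return the field value as a list of trimmed strings.

    Assumption: a plain string may hold several comma-separated values.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [item.strip() for item in value.split(",") if item.strip()]
    return [str(item).strip() for item in value]


def unknown_values(value, allowed):
    """Return the entries of value that are not in the allowed set."""
    return [item for item in as_list(value) if item not in allowed]
```

Calling `unknown_values(record["conformsTo"], CONFORMS_TO_VALUES)` on a parsed record would then list any entries outside the documented set, whichever of the two shapes the field arrived in.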
enrichmentAndLinkage
object

Enrichment and Linkage includes information about related data sets that may have previously been linked, as well as indicating if there is the opportunity to link to other data sets in the future. If a data set has been enriched and/or derivations, scores and existing tools are available this section allows providers to indicate this to researchers.

qualifiedRelation

If applicable, provides the DOI of other data sets that have previously been linked to this data set and their availability. If no DOI is available, provides the title of the data sets that can be linked, where possible using the same title as a data set previously onboarded to the HOP. Note: If all the data sets from the Gateway organisation can be linked, indicate “ALL” and the onboarding portal will automate linkage across the data sets submitted.

anyOf
string
Pattern: ([^,]+)
array
anyOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
string
Max length: 80
Min length: 2
derivation

Indicate if derived data sets or predefined extracts are available and the type of derivation available. Note: Single or multiple dimensions can be provided as a derived extract alongside the data set.

anyOf
string
Pattern: ([^,]+)
array
allOf
string
Max length: 255
Min length: 5
tools

Provides the URL of any analysis tools or models that have been created for this data set and are available for further use. Multiple tools may be provided. Note: We encourage users to adopt a model along the lines of https://www.ga4gh.org/news/tool-registry-service-api-enabling-an-interoperable-library-of-genomics-analysis-tools/.

anyOf
string
Pattern: ([^,]+)
array
Min items: 0
allOf
string uri
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
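Each qualifiedRelation entry in this section is either a URI or a title of 2 to 80 characters. The sketch below shows one way a consumer might tell the two apart. The function name is hypothetical, and recognising only http and https URIs is a simplifying assumption, since the schema's uri format is broader than that.

```python
from urllib.parse import urlparse


def classify_relation(entry):
    """Classify a qualifiedRelation entry as 'uri', 'title' or 'invalid'.

    Assumption: only http(s) URIs with a host are treated as links.
    """
    parsed = urlparse(entry)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "uri"
    # Titles must respect the documented length limits (2 to 80 characters).
    if 2 <= len(entry) <= 80:
        return "title"
    return "invalid"
```

Under these assumptions the special value “ALL” is classified as a title, so a caller that needs to treat it differently would have to check for it first.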
observations
array

Multiple observations about the data set may be provided and users are expected to provide at least one observation (1..*). We will be supporting the schema.org observation model (https://schema.org/Observation) with default values. Users will be encouraged to provide their own statistical populations as the project progresses. Examples:

| Statistical Population | Population Size | Measured Property | Observation Date | Population Description |
| --- | --- | --- | --- | --- |
| Persons | 32937 | Count | 2017 | Events relating to period between April - Sept 2017 |
| Events | 14900000 | Count | 15/01/2021 | Number of unique death registrations since 1993 in England and Wales |
| Findings | 17,891 | Count | 2020-09-03 | Cancer Germline - Number of genomes |
allOf
object
observedNode
required

Supports statistical populations codes for an observation.

Example: PERSONS
allOf
string
Allowed values: PERSONS, EVENTS, FINDINGS
measuredValue
integer
required

Provides the population size associated with the population type in the data set, for example 1000 people in a study, or 87 MRI images of the knee. Usage note: Used with Statistical Population, which specifies the type of the population in the data set.

disambiguatingDescription

If SNOMED CT term does not provide sufficient detail, this field provides a description that disambiguates the population type.

allOf
string
Max length: 255
Min length: 5
observationDate
string
required

Provides the date that the observation was made. Some data sets may be continuously updated and the number of records will change regularly, so the observation date provides users with the date that the analysis or query was run to generate the particular observation. Multiple observations can be made. For example, an observation of cumulative COVID positive cases by specimen on 1/1/2021 could be 2M, and on 8/1/2021 a new observation could be 2.1M.

Default: release date
anyOf
string date
string date-time
measuredProperty
string
required

Initially this will be defaulted to "COUNT".

Default: COUNT
allOf
string
Allowed values: COUNT, Count, count
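To make the observation schema above more concrete, here is a minimal sketch of a client-side check for one observation object. The function names are hypothetical, and the ISO 8601 parsing is an assumption about how the date and date-time formats are serialised. It is an illustration of the documented constraints, not the API's own validation.

```python
from datetime import date, datetime

# Allowed values and required fields copied from the observation schema above.
OBSERVED_NODES = {"PERSONS", "EVENTS", "FINDINGS"}
MEASURED_PROPERTIES = {"COUNT", "Count", "count"}
REQUIRED = ("observedNode", "measuredValue", "observationDate", "measuredProperty")


def parse_observation_date(text):
    """Parse an observationDate given as a date or a date-time.

    Assumption: values are ISO 8601, for example 2020-09-03 or 2021-01-15T09:30:00Z.
    """
    if "T" in text:
        # Older Python versions do not accept a trailing "Z", so map it to an offset.
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    return date.fromisoformat(text)


def observation_problems(obs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing {name}" for name in REQUIRED if name not in obs]
    if problems:
        return problems
    if obs["observedNode"] not in OBSERVED_NODES:
        problems.append("observedNode is not an allowed value")
    value = obs["measuredValue"]
    if isinstance(value, bool) or not isinstance(value, int):
        problems.append("measuredValue must be an integer")
    if obs["measuredProperty"] not in MEASURED_PROPERTIES:
        problems.append("measuredProperty is not an allowed value")
    try:
        parse_observation_date(obs["observationDate"])
    except (TypeError, ValueError):
        problems.append("observationDate is not a date or date-time")
    description = obs.get("disambiguatingDescription")
    if description is not None and not 5 <= len(description) <= 255:
        problems.append("disambiguatingDescription must be 5 to 255 characters")
    return problems
```

Note that a population size written as 17,891 would need to be sent as the integer 17891, and that dates written as 15/01/2021 or 2017 would be rejected by this sketch because they are not in the assumed ISO 8601 form.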
HTTP status: 4XX

An error occurred as follows:

| HTTP status | Error code | Description |
| --- | --- | --- |
| 401 | ACCESS_DENIED | Access token missing, invalid or expired, or calling application not configured for this operation. |
| 404 | RESOURCE_NOT_FOUND | No dataset resources found. |
| 404 | INVALID_ENDPOINT_PATH | Invalid endpoint path. |
Body

Content type: application/fhir+json

Example

Schema

Name Description
object

Outcome of an operation that does not result in a resource or bundle being returned, for example an error or an async/batch submission. There are a number of possible error codes that can be returned along with a more detailed description in the display field.

resourceType
string

FHIR Resource Type.

Default: OperationOutcome
issue
array

List of issues that have occurred.

Min items: 1
object
severity
string
required

Severity of the error.

Allowed values: fatal, error, warning, information
code
string
required

FHIR error code.

Allowed values: invalid, structure, required, value, invariant, security, login, unknown, expired, forbidden, suppressed, processing, not-supported, duplicate, multiple-matches, not-found, deleted, too-long, code-invalid, extension, too-costly, business-rule, conflict, transient, lock-error, no-store, exception, timeout, incomplete, throttled, informational
details
object

Internal error code.

coding
array
object
system
string

URI of the coding system specification.

Example: https://fhir.nhs.uk/R4/CodeSystem/Spine-ErrorOrWarningCode
version
string

Version of the coding system in use.

Example: 1
code
string

Symbol in syntax defined by the system.

Example: INVALID_VALUE
display
string

Representation defined by the system.

Example: Provided value is invalid
diagnostics
string

Additional diagnostic information about the issue. This information is subject to change.
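The OperationOutcome schema above could be read on the client side roughly as follows. This is a sketch only: the function name is hypothetical, and the sample body is assembled by hand from the field descriptions and the 4XX error list, not captured from the API, so its severity, code and diagnostics values are assumptions.

```python
import json

# Hand-written sample built from the schema above; not a real API response.
SAMPLE_BODY = """
{
  "resourceType": "OperationOutcome",
  "issue": [
    {
      "severity": "error",
      "code": "not-found",
      "details": {
        "coding": [
          {
            "system": "https://fhir.nhs.uk/R4/CodeSystem/Spine-ErrorOrWarningCode",
            "version": "1",
            "code": "RESOURCE_NOT_FOUND",
            "display": "No dataset resources found."
          }
        ]
      },
      "diagnostics": "Illustrative diagnostic text"
    }
  ]
}
"""


def summarise_operation_outcome(body):
    """Return one flat summary dict per issue in an OperationOutcome body."""
    outcome = json.loads(body)
    if outcome.get("resourceType") != "OperationOutcome":
        raise ValueError("body is not an OperationOutcome")
    summaries = []
    for issue in outcome.get("issue", []):
        codings = issue.get("details", {}).get("coding", [])
        first = codings[0] if codings else {}
        summaries.append({
            "severity": issue.get("severity"),
            "code": issue.get("code"),
            "error_code": first.get("code"),
            "display": first.get("display"),
            "diagnostics": issue.get("diagnostics"),
        })
    return summaries
```

Because the diagnostics text is described as subject to change, a client would probably branch on the error code (for example RESOURCE_NOT_FOUND) and use the diagnostics value only for logging.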

HTTP status: 5XX

A 5xx status code means the server has a problem. For more details on the most common 5xx status codes and their meanings see the HTTP status codes.