object
|
HDR UK Dataset Metadata Schema.
|
identifier
required
|
System data set identifier (logical identifier of the dataset resource assigned by the Central Metastore). This is not the Dataset persistent identifier that is common across all dataset resource revisions.
|
anyOf
|
|
string
|
Pattern: ^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$
Max length: 36
Min length: 36
Example: 226fb3f1-4471-400a-8c39-2b66d46a39b6
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
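
The two `anyOf` branches above (a 36-character UUID, or a URI) can be checked with a few lines of code. This is a minimal sketch, not the Gateway's own validator: the function name `is_valid_identifier` is hypothetical, and the "scheme plus host" test for URIs is an assumption that may be stricter or looser than the schema's `uri` format.

```python
import re
from urllib.parse import urlparse

# Pattern copied from the schema's first anyOf branch.
UUID_RE = re.compile(
    r"^[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}$"
)


def is_valid_identifier(value: str) -> bool:
    """Return True if value looks like a UUID or an absolute URI."""
    if UUID_RE.fullmatch(value):
        return True
    # Assumption: treat a string as a URI only if it has a scheme and a host.
    parsed = urlparse(value)
    return bool(parsed.scheme and parsed.netloc)
```

Both example values listed for this field pass the check, while a free-text string such as `not-an-id` does not.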
|
version
string
required
|
Data set metadata version.
Pattern: ^([0-9]+)\.([0-9]+)\.([0-9]+)$
Example: 1.1.0
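
The version pattern above accepts three dot-separated integers. The sketch below shows one way to parse such a string into a tuple so that versions compare numerically rather than as text; the helper name `parse_version` is hypothetical and not part of the schema.

```python
import re

# Pattern copied from the schema entry above.
VERSION_RE = re.compile(r"^([0-9]+)\.([0-9]+)\.([0-9]+)$")


def parse_version(text: str) -> tuple:
    """Split 'major.minor.patch' into a tuple of three integers."""
    match = VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {text!r}")
    return tuple(int(part) for part in match.groups())
```

Comparing the resulting tuples orders `1.10.0` after `1.9.0`, which a plain string comparison would get wrong.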
|
revisions
array
required
|
Data set Revisions. Includes the Semantic Version of the data set and URL endpoint to obtain the version.
|
allOf
|
|
object
|
|
version
string
required
|
Pattern: ^([0-9]+)\.([0-9]+)\.([0-9]+)$
Example: 1.1.0
|
url
string
uri
required
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
issued
string
date-time
required
|
Data set Metadata Creation Date.
Example: 2020-08-05T14:35:59Z
|
modified
string
date-time
required
|
Data set Metadata Modification Date.
Example: 2021-01-28T14:15:46Z
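
The `issued` and `modified` values are `date-time` strings such as the two examples above. The sketch below parses them and checks that the modification date is not earlier than the creation date. That ordering rule is an assumption made for illustration; the schema itself only constrains the format. The function names are hypothetical.

```python
from datetime import datetime


def parse_utc_timestamp(text: str) -> datetime:
    """Parse an ISO 8601 date-time such as 2020-08-05T14:35:59Z."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def dates_are_consistent(issued: str, modified: str) -> bool:
    """Assumed rule: metadata cannot be modified before it was created."""
    return parse_utc_timestamp(issued) <= parse_utc_timestamp(modified)
```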
|
summary
object
required
|
Summary metadata must be completed by Data Custodians onboarding metadata into the Innovation Gateway MVP.
|
title
required
|
Title of the data set limited to 80 characters. It should provide a short description of the data set and be unique across the gateway. The title can be prefixed with an organisation name or identifier to differentiate it from other data sets within the Gateway. Good titles should avoid acronyms, summarise the content of the data set and if relevant, the region the data set covers.
Example: North West London COVID-19 Patient Level Situation Report
|
allOf
|
|
string
|
Max length: 80
Min length: 2
|
abstract
required
|
Data set Abstract provides a clear and brief descriptive signpost for researchers who are searching for data that may be relevant to their research. The abstract should allow the reader to determine the scope of the data collection and accurately summarise its content. The optimal length is one paragraph (limited to 255 characters) and effective abstracts should avoid long sentences and abbreviations where possible.
Example: CPRD Aurum contains primary care data contributed by General Practitioner (GP) practices using EMIS Web® including patient registration information and all care events that GPs have chosen to record as part of their usual medical practice.
|
allOf
|
|
string
|
Max length: 255
Min length: 5
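
The length limits for `title` (2 to 80) and `abstract` (5 to 255) can be checked before submission. This is a small sketch under stated assumptions: the `LENGTH_LIMITS` table and `check_length` helper are hypothetical names, and Python's `len` counts Unicode code points, which is assumed here to match how the validator in use counts characters.

```python
# Limits copied from the schema entries above; the table itself is illustrative.
LENGTH_LIMITS = {"title": (2, 80), "abstract": (5, 255)}


def check_length(field: str, value: str) -> None:
    """Raise ValueError if value falls outside the limits for field."""
    low, high = LENGTH_LIMITS[field]
    if not (low <= len(value) <= high):
        raise ValueError(
            f"{field} must be {low} to {high} characters, got {len(value)}"
        )
```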
|
publisher
required
|
Data set publisher is the organisation responsible for running or supporting the data access request process, as well as publishing and maintaining the metadata. In most cases this will be the same as the HDR UK Organisation (Hub or Alliance Member). However, in some cases it will be different, e.g. Tissue Directory is an HDR UK Gateway organisation but coordinates activities across a number of data publishers, such as Cambridge Blood and Stem Cell Biobank.
|
allOf
|
|
object
|
Organisation metadata describes an organisation for purposes of discovery and identification.
|
identifier
|
Provides a Grid.ac identifier (see https://www.grid.ac/institutes) for your organisation. If an organisation does not have a Grid.ac identifier, use the “suggest an institute” function here: https://www.grid.ac/institutes#
|
allOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
name
required
|
Name of the organisation.
|
allOf
|
|
string
|
Max length: 80
Min length: 2
|
logo
|
Provides a logo associated with the Gateway Organisation using a valid URL. The following formats will be accepted .jpg, .png or .svg.
|
allOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
description
|
Provides a URL that describes the organisation.
|
allOf
|
|
string
|
Max length: 3000
Min length: 2
|
contactPoint
required
|
Organisation contact point(s).
|
anyOf
|
|
string
email
|
|
array
|
|
allOf
|
|
string
email
|
|
memberOf
|
Indicates if the organisation is an Alliance Member or a Hub.
|
allOf
|
|
string
|
Allowed values: HUB, ALLIANCE, OTHER, NCS
|
accessRights
|
The URL of a webpage where the data access request process and/or guidance is provided. Can support multiple access processes i.e. industry vs academic.
|
anyOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
array
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
deliveryLeadTime
|
Provides an indication of the typical processing times based on the types of requests typically received. Note: This value will be used as default access request duration for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.
|
allOf
|
|
string
|
Allowed values: LESS 1 WEEK, 1-2 WEEKS, 2-4 WEEKS, 1-2 MONTHS, 2-6 MONTHS, MORE 6 MONTHS, VARIABLE, NOT APPLICABLE, OTHER
|
accessService
|
Provides a brief description of the data access services that are available, including: the environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, indicate the current plans and timelines for when and how data will be made available to researchers. Note: This value will be used as the default access environment for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.
Example: https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide
|
allOf
|
|
string
|
Max length: 5000
Min length: 2
|
accessRequestCost
|
Provides link(s) to a webpage or a short description detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.
|
anyOf
|
|
string
|
Max length: 1000
Min length: 2
|
array
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
dataUseLimitation
|
Provides an indication of consent permissions for data sets and/or materials, and relates to the purposes for which data sets and/or material might be removed, stored or used. Notes: where there are existing data-sharing arrangements such as the HDR UK HUB data sharing agreement or the NIHR HIC data sharing agreement this should be indicated within access rights. This value will be used as terms for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: GENERAL RESEARCH USE, COMMERCIAL RESEARCH USE, GENETIC STUDIES ONLY, NO GENERAL METHODS RESEARCH, NO RESTRICTION, GEOGRAPHICAL RESTRICTIONS, INSTITUTION SPECIFIC RESTRICTIONS, NOT FOR PROFIT USE, PROJECT SPECIFIC RESTRICTIONS, RESEARCH SPECIFIC RESTRICTIONS, USER SPECIFIC RESTRICTION, RESEARCH USE ONLY, NO LINKAGE
|
dataUseRequirements
|
Indicates any additional conditions set for use. Multiple requirements may be provided. Ensure that these restrictions are documented in access rights information.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: COLLABORATION REQUIRED, PROJECT SPECIFIC RESTRICTIONS, ETHICS APPROVAL REQUIRED, INSTITUTION SPECIFIC RESTRICTIONS, GEOGRAPHICAL RESTRICTIONS, PUBLICATION MORATORIUM, PUBLICATION REQUIRED, RETURN TO DATABASE OR RESOURCE, TIME LIMIT ON USE, DISCLOSURE CONTROL, NOT FOR PROFIT USE, USER SPECIFIC RESTRICTION
|
contactPoint
string
required
|
Provides a valid email address that can be used to coordinate data access requests with the publisher. Organisations are expected to provide a dedicated email address associated with the data access request process. Notes: An employee’s email address can only be provided on a temporary basis and if one is provided an explicit consent must be obtained for this purpose.
Default: Defaulted to the contact point of the primary organisation of the user however, can be overridden for specific data sets
Example: SAILDatabank@swansea.ac.uk
|
allOf
|
|
string
email
|
|
keywords
required
|
Provides relevant and specific keywords that can improve the SEO of your data set as a comma separated list. Notes: Onboarding portal will suggest keywords based on title, abstract and description. We are compiling a standardised list of keywords and synonyms across data sets to make filtering easier for users.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Max length: 80
Min length: 2
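
Keywords may arrive either as one comma-separated string or as an array of unique strings. The sketch below normalises both shapes into a list and applies the 2 to 80 character limit per keyword. The function name `normalise_keywords` is hypothetical, and silently dropping duplicates (rather than rejecting the input) is a design choice made here, not something the schema prescribes.

```python
def normalise_keywords(value):
    """Accept a comma-separated string or a list; return a de-duplicated list."""
    items = value.split(",") if isinstance(value, str) else list(value)
    seen = set()
    result = []
    for item in items:
        keyword = item.strip()
        # Per-item limits copied from the array branch of the schema entry.
        if not (2 <= len(keyword) <= 80):
            raise ValueError(f"keyword must be 2 to 80 characters: {keyword!r}")
        if keyword not in seen:
            seen.add(keyword)
            result.append(keyword)
    return result
```

Order is preserved, so the first occurrence of a repeated keyword decides its position in the output.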
|
alternateIdentifiers
|
Alternate data set identifiers or local identifiers.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Max length: 1000
Min length: 2
|
doiName
|
All HDR UK registered data sets should either have a Digital Object Identifier (DOI) or be working towards obtaining one.
Example: ["10.3399/bjgp17X692645"]
|
allOf
|
|
string
|
Pattern: ^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$
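
The DOI pattern above can be applied directly. One detail worth knowing is that the `.` after `10` is unescaped, so it matches any character, not only a literal dot. The sketch below demonstrates this; `STRICT_DOI_RE` is a hypothetical stricter variant and is not part of the schema.

```python
import re

# Pattern exactly as published in the schema (the "." after "10" is unescaped).
DOI_RE = re.compile(r"^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$")

# Hypothetical stricter variant with the dot escaped; not part of the schema.
STRICT_DOI_RE = re.compile(r"^10\.\d{4,9}/[-._;()/:a-zA-Z0-9]+$")


def is_doi(text: str, strict: bool = False) -> bool:
    """Check text against the schema pattern, or the stricter variant."""
    pattern = STRICT_DOI_RE if strict else DOI_RE
    return pattern.fullmatch(text) is not None
```

With `strict=False` the string `10X3399/bjgp17X692645` is accepted; with `strict=True` it is rejected, while the DOI from the example still passes.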
|
documentation
object
|
Documentation can include a rich text description of the data set or links to media such as documents, images, presentations, videos or links to data dictionaries, profiles or dashboards. Organisations are required to confirm that they have permission to distribute any additional media.
|
description
|
A free-text description of the record.
|
allOf
|
|
string
|
Max length: 3000
Min length: 2
|
associatedMedia
|
Provides any media associated with the Gateway Organisation using a valid URI for the content. This is an opportunity to provide additional context that could be useful for researchers wanting to understand more about the data set and its relevance to their research question. The following formats will be accepted: .jpg, .png, .svg, .pdf, .xlsx or .docx. Note: media assets can be hosted by the organisation or uploaded using the onboarding portal.
Example: PDF Document that describes study protocol
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
isPartOf
string
|
Indicates if the data set is part of a group or family.
Default: NOT APPLICABLE
Example: Hospital Episodes Statistics data sets (A&E, APC, OP, AC MSDS).
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
anyOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
string
|
Max length: 80
Min length: 2
|
string
|
Allowed values: NOT APPLICABLE
|
coverage
object
|
Coverage information includes attributes for geographical and temporal coverage, cohort details etc. to enable a deeper understanding of the data set content so that researchers can make decisions about the relevance of the underlying data.
|
spatial
|
The geographical area covered by the data set. It is recommended that links are to entries in a well-maintained gazetteer such as https://www.geonames.org/ or https://what3words.com/daring.lion.race.
Example: https://www.geonames.org/2635167/united-kingdom-of-great-britain-and-northern-ireland.html
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
|
anyOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
typicalAgeRange
|
Indicates the age range in whole years of participants in the data set. Range has the format ‘[min age]-[max age]’ (a plain hyphen with no spaces, as the pattern below requires), where both the minimum and maximum are whole numbers (integers).
|
allOf
|
|
string
|
Pattern: (150|1[0-4][0-9]|[0-9]|[1-8][0-9]|9[0-9])-(150|1[0-4][0-9]|[0-9]|[1-8][0-9]|9[0-9])
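
The age-range pattern above has no `^` or `$` anchors, and JSON Schema patterns are not implicitly anchored, so a validator that searches for the pattern will also accept strings that merely contain a valid range. The sketch below uses a full match instead and adds a `min <= max` check. Both of those stricter rules are assumptions for illustration, and `parse_age_range` is a hypothetical helper.

```python
import re

# One side of the range, copied from the schema pattern (values 0 to 150).
AGE_PART = r"(150|1[0-4][0-9]|[0-9]|[1-8][0-9]|9[0-9])"
AGE_RANGE_RE = re.compile(AGE_PART + "-" + AGE_PART)


def parse_age_range(text: str) -> tuple:
    """Return (min_age, max_age), requiring the whole string to match."""
    match = AGE_RANGE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an age range: {text!r}")
    low, high = int(match.group(1)), int(match.group(2))
    # Assumed extra rule: the pattern alone does not enforce min <= max.
    if low > high:
        raise ValueError(f"minimum age {low} exceeds maximum age {high}")
    return low, high
```

For instance, `AGE_RANGE_RE.search("151-160")` finds the substring `1-1`, whereas `parse_age_range("151-160")` raises `ValueError`.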
|
physicalSampleAvailability
|
Availability of physical samples associated with the data set. Indicates the types of samples that are available. More than one type may be provided. If samples are not yet available, use “AVAILABILITY TO BE CONFIRMED”. If samples are not available, use “NOT AVAILABLE”.
Example: BONE MARROW
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
string
|
Allowed values: NOT AVAILABLE, BONE MARROW, CANCER CELL LINES, CORE BIOPSY, CDNA OR MRNA, DNA, FAECES, IMMORTALIZED CELL LINES, MICRORNA, PERIPHERAL BLOOD CELLS, PLASMA, PM TISSUE, PRIMARY CELLS, RNA, SALIVA, SERUM, SWABS, TISSUE, URINE, WHOLE BLOOD, AVAILABILITY TO BE CONFIRMED, OTHER
|
followup
string
|
If known, what is the typical time span that a patient appears in the data set (follow up period).
Default: UNKNOWN
|
allOf
|
|
string
|
Allowed values: 0 - 6 MONTHS, 6 - 12 MONTHS, 1 - 10 YEARS, > 10 YEARS, UNKNOWN, CONTINUOUS, OTHER
|
pathway
|
Indicates if the data set is representative of the patient pathway and any limitations the data set may have with respect to pathway coverage. This could include if the data set is from a single speciality or area, a single tier of care, linked across two tiers (e.g. primary and secondary care), or an integrated care record covering the whole patient pathway.
|
allOf
|
|
string
|
Max length: 3000
Min length: 2
|
provenance
object
|
Provenance information allows researchers to understand data within the context of its origins and can be an indicator of quality, authenticity and timeliness.
|
origin
|
|
allOf
|
|
object
|
|
purpose
|
Indicates the purpose(s) for which the data set was collected.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: STUDY, DISEASE REGISTRY, TRIAL, CARE, AUDIT, ADMINISTRATIVE, FINANCIAL, STATUTORY, OTHER
|
source
|
Indicates the source of the data extraction.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: EPR, ELECTRONIC SURVEY, LIMS, OTHER INFORMATION SYSTEM, PAPER BASED, FREETEXT NLP, MACHINE GENERATED, OTHER
|
collectionSituation
|
Indicates the setting(s) where data was collected. Multiple settings may be provided.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
string
|
Allowed values: CLINIC, PRIMARY CARE, ACCIDENT AND EMERGENCY, OUTPATIENTS, IN-PATIENTS, SERVICES, COMMUNITY, HOME, PRIVATE, PHARMACY, SOCIAL CARE, LOCAL AUTHORITY, NATIONAL GOVERNMENT, OTHER
|
temporal
required
|
|
allOf
|
|
object
|
|
accrualPeriodicity
string
required
|
Indicates the frequency of distribution release. If a data set is distributed regularly then a distribution release periodicity from the constrained list should be used which indicates the next release date. When the release date becomes historical, a new release date will be calculated based on the publishing periodicity. If a data set has been published and will remain static indicate that it is static and indicate when it was released. If a data set is released on an irregular basis or “on-demand” indicate that it is Irregular and leave release date as null. If a data set can be published in real-time or near-real-time indicate that it is continuous and leave release date as null. Notes: see https://www.dublincore.org/specifications/dublin-core/collection-description/frequency/.
Default:
|
allOf
|
|
string
|
Allowed values: STATIC, IRREGULAR, CONTINUOUS, BIENNIAL, ANNUAL, BIANNUAL, QUARTERLY, BIMONTHLY, MONTHLY, BIWEEKLY, WEEKLY, SEMIWEEKLY, DAILY, OTHER
|
distributionReleaseDate
|
Date of the latest release of the data set. If this is a regular release (e.g. quarterly) or a static data set, this should be completed alongside Periodicity. If the data set is released irregularly or continuously, leave this blank. Notes: Periodicity and release date will be used to determine when the next release is expected. E.g. if the release date is documented as 01/01/2020 and it is now 20/04/2020 and there is a quarterly release schedule, the latest release will be calculated as 01/04/2020.
|
anyOf
|
|
string
date
|
|
string
date-time
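
The note on `distributionReleaseDate` describes deriving the latest release from the documented release date and the periodicity. A rough sketch of that calculation follows. The `MONTHS_PER_RELEASE` mapping covers only three of the allowed `accrualPeriodicity` values and is an assumption, as are the function names; the Gateway's actual calculation may differ.

```python
import calendar
from datetime import date

# Assumed month step for a few accrualPeriodicity values (illustrative only).
MONTHS_PER_RELEASE = {"MONTHLY": 1, "QUARTERLY": 3, "ANNUAL": 12}


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_release(documented: date, periodicity: str, today: date) -> date:
    """Step forward from the documented release date without passing today."""
    step = MONTHS_PER_RELEASE[periodicity]
    latest = documented
    count = 1
    while add_months(documented, count * step) <= today:
        latest = add_months(documented, count * step)
        count += 1
    return latest
```

Using the figures from the note, `latest_release(date(2020, 1, 1), "QUARTERLY", date(2020, 4, 20))` gives 1 April 2020.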
|
|
startDate
required
|
The start of the time period that the data set provides coverage for. If there are multiple cohorts in the data set with varying start dates, provide the earliest date and use the description or the media attribute to provide more information.
|
anyOf
|
|
string
date
|
|
string
date-time
|
|
endDate
|
The end of the time period that the data set provides coverage for. If the data set is “Continuous” and has no known end date, state continuous. If there are multiple cohorts in the data set with varying end dates, provide the latest date and use the description or the media attribute to provide more information.
|
anyOf
|
|
string
date
|
|
string
date-time
|
|
string
|
Allowed values: CONTINUOUS
|
timeLag
required
|
Indicates the typical time-lag between an event and the data for that event appearing in the data set.
|
allOf
|
|
string
|
Allowed values: LESS 1 WEEK, 1-2 WEEKS, 2-4 WEEKS, 1-2 MONTHS, 2-6 MONTHS, MORE 6 MONTHS, VARIABLE, NO TIMELAG, NOT APPLICABLE, OTHER
|
accessibility
object
required
|
Accessibility information allows researchers to understand access, usage, limitations, formats, standards and linkage or interoperability with toolsets.
|
usage
object
|
This section includes information about how the data can be used and how it is currently being used.
|
dataUseLimitation
|
Provides an indication of consent permissions for data sets and/or materials, and relates to the purposes for which data sets and/or material might be removed, stored or used. NOTE: we have extended the DUO to include a value for NO LINKAGE.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: GENERAL RESEARCH USE, COMMERCIAL RESEARCH USE, GENETIC STUDIES ONLY, NO GENERAL METHODS RESEARCH, NO RESTRICTION, GEOGRAPHICAL RESTRICTIONS, INSTITUTION SPECIFIC RESTRICTIONS, NOT FOR PROFIT USE, PROJECT SPECIFIC RESTRICTIONS, RESEARCH SPECIFIC RESTRICTIONS, USER SPECIFIC RESTRICTION, RESEARCH USE ONLY, NO LINKAGE
|
dataUseRequirements
|
Indicates any additional conditions set for use. Multiple requirements may be provided. These restrictions are documented in access rights information.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
unique items
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: COLLABORATION REQUIRED, PROJECT SPECIFIC RESTRICTIONS, ETHICS APPROVAL REQUIRED, INSTITUTION SPECIFIC RESTRICTIONS, GEOGRAPHICAL RESTRICTIONS, PUBLICATION MORATORIUM, PUBLICATION REQUIRED, RETURN TO DATABASE OR RESOURCE, TIME LIMIT ON USE, DISCLOSURE CONTROL, NOT FOR PROFIT USE, USER SPECIFIC RESTRICTION
|
resourceCreator
|
Provides the text that you would like included as part of any citation that credits this data set. This is typically just the name of the publisher. No employee details should be provided.
|
anyOf
|
|
string
|
Max length: 1000
Min length: 2
|
array
|
|
string
|
Max length: 1000
Min length: 2
|
investigations
|
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
|
allOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
isReferencedBy
|
Provides the keystone paper associated with the data set. Also include a list of known citations, if available; these should be links to existing resources where the data set has been used or referenced. Provide multiple entries or, if using a csv upload, provide them as a tab separated list.
|
anyOf
|
|
string
|
Pattern: ^10.\d{4,9}/[-._;()/:a-zA-Z0-9]+$
|
array
|
|
allOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
access
object
required
|
This section includes information about data access.
|
accessRights
required
|
|
anyOf
|
|
string
|
Pattern: ^In Progress$
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
array
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
accessService
|
Provides a brief description of the data access services that are available, including: the environment that is currently available to researchers; additional consultancy and services; any indication of costs associated. If no environment is currently available, indicate the current plans and timelines for when and how data will be made available to researchers. Note: This value will be used as the default access environment for all data sets submitted by the organisation. However, there will be the opportunity to overwrite this value for each data set.
Example: https://cnfl.extge.co.uk/display/GERE/Research+Environment+User+Guide
|
allOf
|
|
string
|
Max length: 5000
Min length: 2
|
accessRequestCost
|
Provides link(s) to a webpage detailing the commercial model for processing data access requests for the organisation (if available). Definition: Indication of commercial model or cost (in GBP) for processing each data access request by the data custodian.
|
anyOf
|
|
string
|
Max length: 5000
Min length: 2
|
array
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
deliveryLeadTime
|
Provides an indication of the typical processing times based on the types of requests typically received.
|
allOf
|
|
string
|
Allowed values: LESS 1 WEEK, 1-2 WEEKS, 2-4 WEEKS, 1-2 MONTHS, 2-6 MONTHS, MORE 6 MONTHS, VARIABLE, NOT APPLICABLE, OTHER
|
jurisdiction
string
required
|
Country code from ISO 3166-1 country codes and the associated ISO 3166-2 for regions, cities, states etc. for the country/state under whose laws the data subjects’ data is collected, processed and stored.
Default: GB-ENG
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
|
allOf
|
|
string
|
Pattern: ^[A-Z]{2}(-[A-Z]{2,3})?$
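
The array-item pattern above checks only the shape of a jurisdiction code: two capital letters, optionally followed by a hyphen and two or three capital letters. The sketch below splits a code into its country and subdivision parts. The helper name is hypothetical, and the check does not confirm that a code is actually assigned in ISO 3166.

```python
import re

# Pattern copied from the array branch of the schema entry above.
JURISDICTION_RE = re.compile(r"^[A-Z]{2}(-[A-Z]{2,3})?$")


def split_jurisdiction(code: str) -> tuple:
    """Return (country, subdivision or None) for a code such as GB-ENG."""
    if JURISDICTION_RE.fullmatch(code) is None:
        raise ValueError(f"not a jurisdiction code: {code!r}")
    country, _, subdivision = code.partition("-")
    return country, (subdivision or None)
```

Because the pattern allows letters only, a subdivision code that contains digits (for example `JP-13`) is rejected by it.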
|
dataController
required
|
Data Controller means a person/entity who (either alone or jointly or in common with other persons/entities) determines the purposes for which and the way in which any Data Subject data, specifically personal data, are to be processed.
|
allOf
|
|
string
|
Max length: 5000
Min length: 2
|
dataProcessor
|
A Data Processor, in relation to any Data Subject data, specifically personal data, means any person/entity (other than an employee of the data controller) who processes the data on behalf of the data controller.
|
allOf
|
|
string
|
Max length: 5000
Min length: 2
|
formatAndStandards
object
|
This section includes technical attributes for language, vocabularies, sizes etc. and gives researchers facts about the underlying data in the data set and how to process it.
|
vocabularyEncodingScheme
string
required
|
Lists any relevant terminologies / ontologies / controlled vocabularies, such as ICD 10 Codes, NHS Data Dictionary National Codes or SNOMED CT International, that are being used by the data set. If the controlled vocabularies are local standards, make that explicit. If you are using a standard that has not been included in the list, use “other” and contact support desk to ask for an addition. Notes: More than one vocabulary may be provided.
Default: LOCAL
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
Min items: 0
|
allOf
|
|
string
|
Allowed values: LOCAL, OPCS4, READ, SNOMED CT, SNOMED RT, DM PLUS D, DM+D, NHS NATIONAL CODES, NHS SCOTLAND NATIONAL CODES, NHS WALES NATIONAL CODES, ODS, LOINC, ICD10, ICD10CM, ICD10PCS, ICD9CM, ICD9, ICDO3, AMT, APC, ATC, CIEL, HPO, CPT4, DPD, DRG, HEMONC, JMDC, KCD7, MULTUM, NAACCR, NDC, NDFRT, OXMIS, RXNORM, RXNORM EXTENSION, SPL, OTHER
|
conformsTo
string
required
|
Lists standardised data models that the data set has been stored in or transformed to, such as OMOP or FHIR. If the data is only available in a local format, make that explicit. If you are using a standard that has not been included in the list, use “other” and contact support desk to ask for an addition.
Default: LOCAL
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
|
allOf
|
|
string
|
Allowed values: HL7 FHIR, HL7 V2, HL7 CDA, HL7 CCOW, LOINC, DICOM, I2B2, IHE, OMOP, OPENEHR, SENTINEL, PCORNET, CDISC, NHS DATA DICTIONARY, NHS SCOTLAND DATA DICTIONARY, NHS WALES DATA DICTIONARY, LOCAL, OTHER
|
language
string
required
|
This should list all the languages in which the data set metadata and underlying data is made available.
Default: en
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
Min items: 1
|
allOf
|
|
string
|
Allowed values: aa, ab, ae, af, ak, am, an, ar, as, av, ay, az, ba, be, bg, bh, bi, bm, bn, bo, br, bs, ca, ce, ch, co, cr, cs, cu, cv, cy, da, de, dv, dz, ee, el, en, eo, es, et, eu, fa, ff, fi, fj, fo, fr, fy, ga, gd, gl, gn, gu, gv, ha, he, hi, ho, hr, ht, hu, hy, hz, ia, id, ie, ig, ii, ik, io, is, it, iu, ja, jv, ka, kg, ki, kj, kk, kl, km, kn, ko, kr, ks, ku, kv, kw, ky, la, lb, lg, li, ln, lo, lt, lu, lv, mg, mh, mi, mk, ml, mn, mr, ms, mt, my, na, nb, nd, ne, ng, nl, nn, no, nr, nv, ny, oc, oj, om, or, os, pa, pi, pl, ps, pt, qu, rm, rn, ro, ru, rw, sa, sc, sd, se, sg, si, sk, sl, sm, sn, so, sq, sr, ss, st, su, sv, sw, ta, te, tg, th, ti, tk, tl, tn, to, tr, ts, tt, tw, ty, ug, uk, ur, uz, ve, vi, vo, wa, wo, xh, yi, yo, za, zh, zu
|
format
required
|
If multiple formats are available, specify each of them. For the recognised media types (application, audio, image, message, model, multipart, text, video), see https://www.iana.org/assignments/media-types/media-types.xhtml Note: If your file format is not included in the current list of formats, indicate other. If you are using the HOP you will be directed to a service desk page where you can request your additional format. If not, go to https://metadata.atlassian.net/servicedesk/customer/portal/4 to request your format.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
Min items: 1
|
allOf
|
|
string
|
Min length: 1
|
enrichmentAndLinkage
object
|
Enrichment and Linkage includes information about related data sets that may have previously been linked, as well as indicating if there is the opportunity to link to other data sets in the future. If a data set has been enriched and/or derivations, scores and existing tools are available this section allows providers to indicate this to researchers.
|
qualifiedRelation
|
If applicable, provides the DOI of other data sets that have previously been linked to this data set and their availability. If no DOI is available, provides the title of the data sets that can be linked, where possible using the same title of a data set previously onboarded to the HOP. Note: If all the data sets from Gateway organisation can be linked indicate “ALL” and the onboarding portal will automate linkage across the data sets submitted.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
|
anyOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
string
|
Max length: 80
Min length: 2
|
derivation
|
Indicate if derived data sets or predefined extracts are available and the type of derivation available. Notes: Single or multiple dimensions can be provided as a derived extract alongside the data set.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
|
allOf
|
|
string
|
Max length: 255
Min length: 5
|
tools
|
Provides the URL of any analysis tools or models that have been created for this data set and are available for further use. Multiple tools may be provided. Note: We encourage users to adopt a model along the lines of https://www.ga4gh.org/news/tool-registry-service-api-enabling-an-interoperable-library-of-genomics-analysis-tools/.
|
anyOf
|
|
string
|
Pattern: ([^,]+)
|
array
|
Min items: 0
|
allOf
|
|
string
uri
|
Example: https://web.www.healthdatagateway.org/dataset/226fb3f1-4471-400a-8c39-2b66d46a39b6
|
observations
array
|
Multiple observations about the data set may be provided and users are expected to provide at least one observation (1..*). We will be supporting the schema.org observation model (https://schema.org/Observation) with default values. Users will be encouraged to provide their own statistical populations as the project progresses. Examples:
Statistical Population | Population Size | Measured Property | Observation Date | Population Description
Persons | 32937 | Count | 2017 | Events relating to period between April - Sept 2017
Events | 14900000 | Count | 15/01/2021 | Number of unique death registrations since 1993 in England and Wales
Findings | 17,891 | Count | 2020-09-03 | Cancer Germline - Number of genomes
|
allOf
|
|
object
|
|
observedNode
required
|
Supports statistical populations codes for an observation.
Example: PERSONS
|
allOf
|
|
string
|
Allowed values: PERSONS, EVENTS, FINDINGS
|
measuredValue
integer
required
|
Provides the population size associated with the population type in the data set, e.g. 1000 people in a study, or 87 images (MRI) of the knee. Usage Note: Used with Statistical Population, which specifies the type of the population in the data set.
|
disambiguatingDescription
|
If the SNOMED CT term does not provide sufficient detail, this field provides a description that disambiguates the population type.
|
allOf
|
|
string
|
Max length: 255
Min length: 5
|
observationDate
string
required
|
Provides the date that the observation was made. Some data sets may be continuously updated and the number of records will change regularly, so the observation date provides users with the date that the analysis or query was run to generate the particular observation. Multiple observations can be made i.e. an observation of cumulative COVID positive cases by specimen on the 1/1/2021 could be 2M. On the 8/1/2021 a new observation could be 2.1M. Users can add multiple observations.
Default: release date
|
anyOf
|
|
string
date
|
|
string
date-time
|
|
measuredProperty
string
required
|
Initially this will be defaulted to "COUNT".
Default: COUNT
|
allOf
|
|
string
|
Allowed values: COUNT, Count, count
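
Taken together, the observation fields above can be modelled as a small record type. The sketch below is one possible in-memory representation, not the Gateway's own model. The class `Observation` and the helper `parse_population_size` are hypothetical; the helper exists because the example table for this section shows a population size written as `17,891`, while `measuredValue` is typed as an integer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Allowed values copied from the schema entries above.
OBSERVED_NODES = {"PERSONS", "EVENTS", "FINDINGS"}
MEASURED_PROPERTIES = {"COUNT", "Count", "count"}


def parse_population_size(text: str) -> int:
    """Convert a human-formatted count such as '17,891' to an integer."""
    return int(text.replace(",", "").strip())


@dataclass(frozen=True)
class Observation:
    observedNode: str
    measuredValue: int
    observationDate: date
    measuredProperty: str = "COUNT"
    disambiguatingDescription: Optional[str] = None

    def __post_init__(self):
        # Validate the enumerated fields and the integer type on creation.
        if self.observedNode not in OBSERVED_NODES:
            raise ValueError(f"unknown observedNode: {self.observedNode!r}")
        if self.measuredProperty not in MEASURED_PROPERTIES:
            raise ValueError(f"unknown measuredProperty: {self.measuredProperty!r}")
        if not isinstance(self.measuredValue, int):
            raise ValueError("measuredValue must be an integer")
```

A row such as the "Findings" example would then be built with `Observation("FINDINGS", parse_population_size("17,891"), date(2020, 9, 3))`.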
|