Summary

This "usage guidelines of DCAT-AP for High-Value Datasets", in short DCAT-AP HVD, provides guidelines on how to use DCAT-AP taking into account the requirements imposed by the High-Value Dataset implementing regulation (HVD IR).

Introduction

Context

In light of the growing importance of data, the European Commission adopted an implementing act focused on High-Value Datasets on 21 December 2022 [[HVD]]. The implementing regulation groups the datasets into six thematic categories of High-Value Datasets: geospatial, earth observation and environment, meteorological, statistics, companies and company ownership, and mobility. The Portal for European Data [[DEU]] has published an easy-to-read overview.

Scope

This document provides the guidelines on how to use DCAT-AP for a dataset that is subject to the requirements imposed by the High-Value Dataset implementing regulation (HVD IR). The document is called the "usage guidelines of DCAT-AP for High-Value Datasets", in short DCAT-AP HVD. It means that these guidelines are on top of the existing guidelines expressed by DCAT-AP.

To understand these guidelines, it is important to realise that the HVD IR applies to a subset of all the datasets that are collected by (Open) Data Portals in Europe. A single catalogue contains catalogued resources which are within and outside scope of the HVD IR.

This document supports the need for a common usage of DCAT-AP for catalogued resources within the scope of the High-Value Dataset implementing regulation. When conforming to these guidelines, a dataset within the scope of the regulation will have satisfied the minimum metadata requirements to be included in the mandatory reporting of the regulation. This does not, however, imply automatic compliance with the regulation, because certain aspects are beyond DCAT-AP. For instance, the regulation requires certain data aspects to be present, or requires the licensing conditions to be equal to or more permissive than CC-BY 4.0. Such requirements cannot be verified by just inspecting the DCAT-AP description, but the DCAT-AP description will assist in the assessment.

DCAT-AP HVD supports the implementers of the regulation in their assessment of their state of play. When applying the guidelines to the metadata of their datasets, which are in scope of the regulation, the necessary attention will be raised to drive towards conformance. At the same time, this effort will create an immediate benefit for the European citizens and businesses. Any improvement of the metadata will immediately flow throughout the European network of (Open) Data Portals and thus increase the level of metadata quality.

Status

This application profile has the status SEMIC Recommendation, published on 2024-10-25.

Information about the process and the decisions involved in the creation of this specification is available in the Changelog.

License

Copyright © 2024 European Union. All material in this repository is published under the license CC-BY 4.0, unless explicitly otherwise mentioned.

Conformance Statement

In order for an application to conform to DCAT-AP HVD, it MUST conform to DCAT-AP. In addition, the application must conform to the constraints and usage guidelines mentioned in this document, following conformance statements similar to those specified in DCAT-AP.

Provider requirements

In order to conform to this Application Profile, an application that provides metadata MUST apply the controlled vocabularies as described in section [[[#controlled-vocs]]].

Receiver requirements

In order to conform to this Application Profile, an application that receives metadata MUST be able to:

"Processing" means that receivers must accept incoming data and transparently provide these data to applications and services. It neither implies nor prescribes what applications and services finally do with the data (parse, convert, store, make searchable, display to users, etc.).

Terminology

An Application Profile is a specification that reuses terms from one or more base standards, adding more specificity by identifying mandatory, recommended and optional elements to be used for a particular application, as well as recommendations for controlled vocabularies to be used.

An Annex to an Application Profile is a specification that makes the use of some aspects of the Application Profile more precise for a specific context.

Although DCAT-AP HVD is called an Annex to DCAT-AP, it has all the properties of an application profile of DCAT-AP. The notion Annex is intended to reflect the close proximity of this document to DCAT-AP. In case there are changes to DCAT-AP, this document will be updated too. Nevertheless, to keep the impact on the document to a minimum and to maximise the readability in relation to the HVD IR, only the classes and properties that are featured explicitly in the HVD IR are listed in this document.

This specification uses the following prefixes to shorten the URIs for readability.
| Prefix | Namespace IRI |
| adms | http://www.w3.org/ns/adms# |
| dcat | http://www.w3.org/ns/dcat# |
| dcatap | http://data.europa.eu/r5r/ |
| dct | http://purl.org/dc/terms/ |
| eli | http://data.europa.eu/eli/ontology# |
| foaf | http://xmlns.com/foaf/0.1/ |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| skos | http://www.w3.org/2004/02/skos/core# |
| vcard | http://www.w3.org/2006/vcard/ns# |
| xsd | http://www.w3.org/2001/XMLSchema# |
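
As an illustration of how these prefixes shorten URIs, the following minimal Python sketch expands a compact name into a full IRI. The function name `expand` is hypothetical, and the sketch assumes the conventional prefix `eli` for the ELI ontology namespace.

```python
# Prefix map following the prefix table in this section.
# Assumption: "eli" is used as the prefix for the ELI ontology namespace.
PREFIXES = {
    "adms": "http://www.w3.org/ns/adms#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcatap": "http://data.europa.eu/r5r/",
    "dct": "http://purl.org/dc/terms/",
    "eli": "http://data.europa.eu/eli/ontology#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(compact_name: str) -> str:
    """Expand a compact name such as 'dcat:Dataset' into a full IRI."""
    prefix, separator, local_name = compact_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {compact_name!r}")
    return PREFIXES[prefix] + local_name
```

For example, `expand("dcatap:hvdCategory")` yields `http://data.europa.eu/r5r/hvdCategory`.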

Overview

DCAT-AP HVD is an annex to DCAT-AP. It describes additional usage of the DCAT-AP to satisfy the High-Value Dataset implementing regulation (HVD IR). In this document only the additional information, that is required for the catalogued resources which are within scope of the regulation, is included. In any other case, the guidelines of DCAT-AP itself are applicable. To make this more visible for the reader, cross-references have been included to DCAT-AP.

As the publication URL indicates, this version of DCAT-AP HVD is based on DCAT-AP 3.0.0.

Application profile diagram

An overview of DCAT-AP HVD is shown by the UML diagram below. The UML diagram illustrates the specification described in this document. For readability purposes the representation has been condensed as follows:

The cardinalities and qualifications are included in the figure.

For readability of this document as an annex to DCAT-AP, the core relationships between classes are included.

This document describes the usage of the following main entities for a correct usage of the Application Profile:
| Catalogue | Catalogue Record | Catalogued Resource | Data Service | Dataset | Dataset Series | Distribution | Kind | Licence Document |

The main entities are supported by:
| Concept | Document | Legal Resource | Literal | Resource | Rights statement | Standard |

Main Entities

The main entities are those that form the core of the Application Profile. The properties and their associated constraints that apply in the context of this profile are listed in tabular form. Each row corresponds to one property. In addition to the constraints, cross-references to DCAT and DCAT-AP are provided. For the latter, to save space, the following abbreviations are used:

This reuse qualification assessment is made with respect to a specific version of DCAT-AP. Therefore it may vary over time when new versions of DCAT-AP are created.

Catalogue

Definition
A catalogue or repository that hosts the Datasets or Data Services being described.
Reference in DCAT-AP
Link
Properties
For this entity the following properties are defined: dataset, record, service.
| Property | Range | Card | Definition | Usage | Reuse |
| dataset | Dataset | 0..* | A Dataset that is part of the Catalogue. | As empty Catalogues are usually indications of problems, this property should be combined with the next property service to implement an empty Catalogue check. | A |
| record | Catalogue Record | 0..* | A Catalogue Record that is part of the Catalogue. | | A |
| service | Data Service | 0..* | A site or end-point (Data Service) that is listed in the Catalogue. | As empty Catalogues are usually indications of problems, this property should be combined with the previous property dataset to implement an empty Catalogue check. | A |
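
The empty Catalogue check mentioned in the usage notes could be sketched as follows. This is only an illustration: it assumes that a Catalogue description has already been parsed into a plain Python dictionary whose keys `dataset` and `service` hold lists of resource identifiers, which is a hypothetical in-memory representation and not part of this specification.

```python
def is_empty_catalogue(catalogue: dict) -> bool:
    """Return True when a Catalogue lists neither Datasets nor Data Services.

    The 'dataset' and 'service' keys are assumed to hold lists of identifiers.
    """
    datasets = catalogue.get("dataset", [])
    services = catalogue.get("service", [])
    # The Catalogue is only considered empty when both properties are absent or empty.
    return not datasets and not services
```

A Catalogue that lists only Data Services is therefore not reported as empty, which is why the two properties are combined in a single check.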

Catalogue Record

Definition
A description of a Dataset's entry in the Catalogue.
Reference in DCAT-AP
Link
Properties
For this entity the following properties are defined: primary topic.
| Property | Range | Card | Definition | Usage | Reuse |
| primary topic | Catalogued Resource | 1 | A link to the Dataset, Data service or Catalog described in the record. | A catalogue record refers to one entity in a catalogue. This can be either a Dataset or a Data Service. To ensure an unambiguous reading of the cardinality, the range is set to Catalogued Resource. However, this range is not intended to require the explicit use of the class Catalogued Resource. As it is an abstract class, a subclass should be used. | A |

Catalogued Resource

Definition
Resource published or curated by a single agent.
Reference in DCAT-AP
Link
Usage Note
For DCAT-AP, the class is considered an abstract notion.
Properties
This specification does not impose any additional requirements to properties for this entity.

Data Service

Definition
A collection of operations that provides access to one or more datasets or data processing functions.
Reference in DCAT-AP
Link
Subclass of
Catalogued Resource
Properties
For this entity the following properties are defined: applicable legislation, contact point, documentation, endpoint description, endpoint URL, HVD category, licence, rights, serves dataset.
| Property | Range | Card | Definition | Usage | Reuse |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Data Service. | For HVD the value MUST include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | E |
| contact point | Kind | 1..* | Contact information that can be used for sending comments about the Data Service. | Article 3.4 requires the designation of a point of contact for an API. | E |
| documentation | Document | 1..* | A page that provides additional information about the Data Service. | Quality of service covers a broad spectrum of aspects. The HVD regulation does not list any mandatory topic. Therefore quality of service information is considered part of the generic documentation of a Data Service. | E |
| endpoint description | Resource | 0..* | A description of the services available via the end-points, including their operations, parameters etc. | The property gives specific details of the actual endpoint instances, while dct:conformsTo is used to indicate the general standard or specification that the endpoints implement. Article 3.3 requires API documentation to be provided in a Union or internationally recognised open, human-readable and machine-readable format. | E |
| endpoint URL | Resource | 1..* | The root location or primary endpoint of the service (an IRI). | The endpoint URL SHOULD be persistent. This means that publishers should do everything in their power to keep the value stable and available. | E |
| HVD category | Concept | 1..* | The HVD category to which this Data Service belongs. | | P |
| licence | Licence Document | 0..1 | A licence under which the Data Service is made available. | Article 3.3 specifies that the terms of use should be provided. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | E |
| rights | Rights statement | 0..* | A statement that specifies rights associated with the Data Service. | Article 3.3 specifies that the terms of use should be provided. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | P |
| serves dataset | Dataset | 0..* | This property refers to a collection of data that this data service can distribute. | An API in the context of HVD is not a standalone resource. It is used to open up HVD datasets. Therefore each Data Service is tightly connected with at least one Dataset. A Dataset MUST have an associated Data Service, apart from the exceptions listed in the HVD IR. This can be specified either directly, using this property, or indirectly. For more information on this obligation, see [[[#apis-are-mandatory]]]. | E |
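
The direct and indirect association between a Dataset and its Data Services can be illustrated with a small Python sketch. The dictionary keys (`id`, `servesDataset`, `dataset`, `accessService`) and the function name are hypothetical simplifications of the properties serves dataset and access service; they assume metadata that has already been parsed into plain dictionaries.

```python
def associated_services(dataset_id, distributions, services):
    """Collect the identifiers of Data Services linked to a Dataset.

    Direct link: a Data Service lists the Dataset under 'servesDataset'.
    Indirect link: a Distribution of the Dataset lists a Data Service
    under 'accessService'.
    """
    direct = {
        service["id"]
        for service in services
        if dataset_id in service.get("servesDataset", [])
    }
    indirect = {
        service_id
        for distribution in distributions
        if distribution.get("dataset") == dataset_id
        for service_id in distribution.get("accessService", [])
    }
    return direct | indirect
```

An empty result for an HVD Dataset would indicate that no API is associated with it, which deserves attention unless one of the exceptions listed in the HVD IR applies.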

Dataset

Definition
A conceptual entity that represents the information published.
Reference in DCAT-AP
Link
Subclass of
Catalogued Resource
Properties
For this entity the following properties are defined: applicable legislation, conforms to, contact point, dataset distribution, HVD Category, in series.
| Property | Range | Card | Definition | Usage | Reuse |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Dataset. | For HVD the value must include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | E |
| conforms to | Standard | 0..* | An implementing rule or other specification. | The provided information should enable verification of whether the detailed information requirements of the HVD IR are satisfied. For more usage suggestions see the section on specific data requirements. | A |
| contact point | Kind | 0..* | Contact information that can be used for sending comments about the Dataset. | | A |
| dataset distribution | Distribution | 0..* | An available Distribution for the Dataset. | The HVD IR is a quality improvement of existing datasets. The intention is that HVD datasets are publicly and openly accessible. Therefore a Distribution is expected to be present (Article 3.1). This section provides more information on how this strong recommendation is implemented. | E |
| HVD Category | Concept | 1..* | The HVD category to which this Dataset belongs. | | P |
| in series | Dataset Series | 0..* | A dataset series of which the dataset is part. | | E |
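
The cardinalities in the table above lend themselves to a simple automated check. The sketch below is an assumption-laden illustration and not a replacement for a full validation: it treats a Dataset description as a dictionary mapping property labels to lists of values, and the names `DATASET_CARDINALITIES` and `cardinality_violations` are hypothetical.

```python
import math

# Minimum and maximum cardinalities for a Dataset, following the table above.
# math.inf stands for the unbounded upper limit '*'.
DATASET_CARDINALITIES = {
    "applicable legislation": (1, math.inf),
    "conforms to": (0, math.inf),
    "contact point": (0, math.inf),
    "dataset distribution": (0, math.inf),
    "HVD category": (1, math.inf),
    "in series": (0, math.inf),
}


def cardinality_violations(description, cardinalities=DATASET_CARDINALITIES):
    """Return a message for every property whose value count is out of range."""
    problems = []
    for prop, (low, high) in cardinalities.items():
        count = len(description.get(prop, []))
        if count < low or count > high:
            upper = "*" if high == math.inf else high
            problems.append(f"{prop}: found {count}, expected {low}..{upper}")
    return problems
```

The same function could be reused for the other entities by passing a different cardinality table.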

Dataset Series

Definition
A collection of datasets that are published separately, but share some characteristics that group them.
Usage Note
The class has been extended minimally to enable the identification of Dataset Series in scope of the HVD IR. The motivation and usage guidelines are found in section [[[#guidelines-datasetseries]]].
Subclass of
Catalogued Resource
Properties
For this entity the following properties are defined: applicable legislation, HVD Category.
| Property | Range | Card | Definition | Usage | Reuse |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Dataset Series. | For HVD the value must include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | E |
| HVD Category | Concept | 1..* | The HVD category to which this Dataset Series belongs. | | P |

Distribution

Definition
A physical embodiment of the Dataset in a particular format.
Reference in DCAT-AP
Link
Usage Note
Bulk downloads should be encoded as a Distribution.
Properties
For this entity the following properties are defined: access service, access URL, applicable legislation, licence, linked schemas, rights.
| Property | Range | Card | Definition | Usage | Reuse |
| access service | Data Service | 0..* | A data service that gives access to the distribution of the dataset. | An API in the context of HVD is not a standalone resource. It is used to open up HVD datasets. Therefore each Data Service is tightly connected with at least one Dataset. A Dataset MUST have an associated Data Service, apart from the exceptions listed in the HVD IR. This can be specified either directly or indirectly using this property. For more information on this obligation, see [[[#apis-are-mandatory]]]. | E |
| access URL | Resource | 1..* | A URL that gives access to a Distribution of the Dataset. | The resource at the access URL contains information about how to get the Dataset. In accordance with the DCAT guidelines, it is preferred to also set the downloadURL property if the URL is a reference to a downloadable resource. | A |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Distribution. | For HVD the value must include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | E |
| licence | Licence Document | 0..1 | A licence under which the Distribution is made available. | Article 4.3 specifies that High-value datasets should be made available for reuse. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | E |
| linked schemas | Standard | 0..* | An established schema to which the described Distribution conforms. | The provided information should enable verification of whether the detailed information requirements of the HVD IR are satisfied. For more usage suggestions see the section on specific data requirements. | A |
| rights | Rights statement | 0..* | A statement that specifies rights associated with the Distribution. | Article 4.3 specifies that High-value datasets should be made available for reuse. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | E |

Kind

Definition
A description following the vCard specification, e.g. to provide telephone number and e-mail address for a contact point.
Reference in DCAT-AP
Link
Usage Note
Article 3.4 requires the designation of a point of contact for an API. It is recommended to provide at least an email address or a contact form, e.g. from a service desk.
Properties
For this entity the following properties are defined: contact page, email.
| Property | Range | Card | Definition | Usage | Reuse |
| contact page | Resource | 0..1 | A web page that either allows making contact (e.g. a web form) or contains information on how to get in contact. | | P |
| email | Resource | 0..1 | An email address via which contact can be made. | | P |

Licence Document

Definition
A legal document giving official permission to do something with a resource.
Reference in DCAT-AP
Link
Usage Note
The HVD regulation requires a machine-readable representation of a Licence. The minimal data model to describe a Licence Document is beyond the scope of this specification. Nevertheless, some suggestions are made in [[[#c3]]].
Properties
This specification does not impose any additional requirements to properties for this entity.

Supportive Entities

The supportive entities are supporting the main entities in the Application Profile. They are included in the Application Profile because they form the range of properties.

Concept

Definition
An idea or notion; a unit of thought.
Reference in DCAT-AP
Link
Usage Note
In DCAT-AP, a Concept is used to denote codes within a codelist. In section [[[#controlled-vocs]]] the expectations are elaborated in more detail.
Properties
This specification does not impose any additional requirements to properties for this entity.

Document

Definition
A textual resource intended for human consumption that contains information, e.g. a web page about a Dataset.
Reference in DCAT-AP
Link
Properties
This specification does not impose any additional requirements to properties for this entity.

Legal Resource

Definition
This class represents the legislation, policy or policies that lie behind the Rules that govern the service.
Usage Note
The definition and properties of the Legal Resource class are aligned with the ontology included in "Council conclusions inviting the introduction of the European Legislation Identifier (ELI)". For describing the attributes of a Legal Resource (labels, preferred labels, alternative labels, definition, etc.) we refer to the ELI ontology. In this data specification the use is restricted to instances of this class that follow the ELI URI guidelines.
Properties
This specification does not impose any additional requirements to properties for this entity.

Literal

Definition
A literal value such as a string or integer; Literals may be typed, e.g. as a date according to xsd:date. Literals that contain human-readable text have an optional language tag as defined by BCP 47 [[rfc5646]].
Reference in DCAT-AP
Link
Properties
This specification does not impose any additional requirements to properties for this entity.

Resource

Definition
Anything described by RDF.
Reference in DCAT-AP
Link
Properties
This specification does not impose any additional requirements to properties for this entity.

Rights statement

Definition
A statement about the intellectual property rights (IPR) held in or over a resource, a legal document giving official permission to do something with a resource, or a statement about access rights.
Reference in DCAT-AP
Link
Properties
This specification does not impose any additional requirements to properties for this entity.

Standard

Definition
A standard or other specification to which a Dataset or Distribution conforms.
Reference in DCAT-AP
Link
Properties
This specification does not impose any additional requirements to properties for this entity.

Controlled Vocabularies

The usage of controlled vocabularies in DCAT-AP HVD conforms to and extends the usage defined by DCAT-AP. In addition, the following controlled vocabularies MUST be used for the properties listed in the table below. The "MUST be used" interpretation means that the range value space of the property is closed under the controlled vocabulary. Validation systems SHOULD produce errors for values outside the controlled vocabulary.
| Property URI | Used for Class | Vocabulary name | Vocabulary URI | Usage note |
| dcatap:hvdCategory | Dataset | EU Vocabularies HVD Categories | http://data.europa.eu/bna/asd487ae75 | |
| dcatap:hvdCategory | Data Service | EU Vocabularies HVD Categories | http://data.europa.eu/bna/asd487ae75 | |
| dcatap:hvdCategory | Dataset Series | EU Vocabularies HVD Categories | http://data.europa.eu/bna/asd487ae75 | |
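
A closed value space can be checked by comparing each supplied value against the set of concepts of the controlled vocabulary. The following sketch assumes that the concept URIs of the HVD Categories vocabulary have been retrieved beforehand and are passed in as a set. The function name and the `https://example.org/...` URIs shown in the usage note are placeholders, not actual concept URIs.

```python
def out_of_vocabulary(values, allowed_concepts):
    """Return the values that are not members of the closed controlled vocabulary.

    'allowed_concepts' is assumed to contain the concept URIs of the
    controlled vocabulary, loaded by the caller.
    """
    return [value for value in values if value not in allowed_concepts]
```

For example, with the placeholder set `{"https://example.org/hvd/geospatial", "https://example.org/hvd/mobility"}`, the call `out_of_vocabulary(["https://example.org/hvd/geospatial", "https://example.org/other"], allowed)` returns only the second value, for which a validation system would be expected to produce an error.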

Licence controlled vocabularies

The HVD IR imposes quality requirements on the published legal conditions. In line with the generic DCAT-AP guidelines for publishing controlled vocabularies, a licence controlled vocabulary SHOULD:

Mapping the HVD IR to DCAT-AP

This section provides recommendations on how to encode descriptions required by the HVD implementing regulation (HVD IR) as a DCAT-AP metadata structure. Each topic is introduced first from the perspective of the HVD IR, followed by an assessment of the topic with respect to the use of DCAT-AP. The selected interpretation is further elaborated, where appropriate, with implementation guidelines.

Alignment of terminology

The HVD implementing regulation uses the terms Dataset, Bulk Download and API.

In the context of DCAT-AP, a HVD Dataset is mapped to a Dataset, a Bulk Download to a Distribution and an API to a Data Service. To be conformant with the use of DCAT-AP in the context of the HVD IR, this mapping MUST be followed.
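
For implementers, this terminology mapping can be captured in a small lookup table. The sketch below uses the DCAT class names `dcat:Dataset`, `dcat:Distribution` and `dcat:DataService`; the variable and function names are hypothetical.

```python
# Mapping of the terms used in the HVD IR to the corresponding DCAT classes.
HVD_TERM_TO_DCAT_CLASS = {
    "Dataset": "dcat:Dataset",
    "Bulk Download": "dcat:Distribution",
    "API": "dcat:DataService",
}


def dcat_class_for(hvd_term: str) -> str:
    """Return the DCAT class on which a term of the HVD IR is mapped."""
    try:
        return HVD_TERM_TO_DCAT_CLASS[hvd_term]
    except KeyError:
        raise ValueError(f"{hvd_term!r} is not a term covered by the mapping") from None
```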

To make the text easier to read, with a HVD Dataset we mean a Dataset in scope of the HVD implementing regulation. The same pattern is applied to other entities.

Dataset Series

The HVD IR does not mention any special generic treatment for Dataset Series. The sole mention occurs in the context of Statistical data: time series should provide data from at least the moment that the HVD IR entered into force.

As the HVD IR relies on strengthening the existing data sharing practices in the different domains, DCAT-AP HVD follows the same approach. This means that if, according to the domain, a Dataset Series would be considered a HVD Dataset, it should be identified in the same way as HVD Datasets. As a Dataset Series is often viewed as a Dataset, the HVD categorisation is imposed on it as well. More on this identification is found in the next section.

In this annex, no further advice or guidelines are provided to determine whether a dataset should be considered or represented as a Dataset Series. Such guidance is not provided by the HVD IR and is therefore to be discussed within the context of the generic DCAT-AP.

In scope of HVD IR

A Dataset is a HVD Dataset if and only if a Member State (MS) has included it in its reporting. The HVD IR defines High-Value Datasets. It may be possible that the same definition applies to multiple entities. In that case, a Member State should select the most appropriate one, according to the rules in the regulation. If the Member State decides to include multiple entities in the reporting, the requirements set out in the HVD IR will apply to all these entities. Also, if a Member State decides to include a dataset in the HVD reporting for which inclusion is not mandatory, then the requirements of the HVD IR will apply. The report is a commitment of the Member State to the European data community to sustain those datasets.

If a re-user discovers a dataset that seems to be in scope of the HVD IR, then the responsible MS should be able to explain why it is not included in the reporting. One possible response is to point to the relevant HVD Dataset that corresponds to that dataset.

It is important to note that the identification of a dataset as a HVD Dataset is an action that has to be approved by the Member State representative responsible for the implementation of the HVD IR. The official reporting and the metadata shared with the public SHOULD coincide. Member States therefore must coordinate their activities to avoid unintended deviations. Consequently, the application of DCAT-AP HVD is only necessary if the intention is to comply with the HVD IR and DCAT-AP metadata is being exchanged. As stated in the introduction, the use of DCAT-AP HVD is not mandated by the HVD IR, but when DCAT-AP metadata of HVD Datasets is exchanged, this specification should be followed.

Denoting a HVD Dataset

Each entity (Dataset, Data Service, Distribution, Catalogue) that is identified by a MS in scope of the HVD IR should provide the European Legislation Identifier (ELI) http://data.europa.eu/eli/reg_impl/2023/138/oj of the HVD IR for the property applicable legislation. For the reporting, a Member State can provide a catalogue containing all elements that are within scope for the reporting of the HVD IR. In that case the catalogue should also set the value for the property applicable legislation to the ELI of the HVD.
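
Denoting a resource as being in scope of the HVD IR thus amounts to adding the ELI to its applicable legislation values, while keeping any other legislation already listed. A minimal sketch, assuming resources are represented as dictionaries with a hypothetical key `applicableLegislation` holding a list of URIs:

```python
# ELI of the HVD implementing regulation, as given in this section.
HVD_ELI = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def denotes_hvd(resource: dict) -> bool:
    """Return True when the resource lists the HVD IR as applicable legislation."""
    return HVD_ELI in resource.get("applicableLegislation", [])


def mark_as_hvd(resource: dict) -> dict:
    """Return a copy of the resource with the HVD ELI added, keeping other values."""
    legislation = list(resource.get("applicableLegislation", []))
    if HVD_ELI not in legislation:
        legislation.append(HVD_ELI)
    return {**resource, "applicableLegislation": legislation}
```

Because multiple legislations may apply to a resource, the sketch appends the ELI instead of overwriting the existing values.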

Special cases

When a Dataset is within scope of HVD, it is not mandatory that all its Distributions are within scope of HVD. Our recommendations ensure that existing metadata (specified in DCAT-AP or other frameworks like INSPIRE) remains valid. Becoming a Dataset in scope of HVD is an additional operation.

When a Data Service offers access to multiple datasets and fulfils the HVD requirements for one HVD Dataset (e.g. it is the HVD API for that dataset), the HVD requirements apply only with respect to that HVD Dataset. It is common that the same API service endpoint (denoted by a dcat:DataService) provides access to multiple datasets. As such, it is to be expected that only some of the datasets are within scope of HVD. As for Distributions, the HVD IR does not require that all Datasets associated with a Data Service are in scope of HVD. Nevertheless, it must be noted that the HVD requirements on a Data Service might indirectly impact the other datasets that are available through the same Data Service, because a Data Service will share the operational and service level requirements across all its associated datasets.

HVD data category

The HVD IR defines six thematic data categories: geospatial, earth observation and environment, meteorological, statistics, companies and company ownership, and mobility. A new property HVD category is introduced to indicate the HVD category to which a resource, e.g. a dataset, belongs. The controlled vocabulary High-value dataset categories, with all possible values, is maintained by the Publications Office. A resource may belong to more than one data category. It is recommended to annotate High-Value Datasets with the most precise concept.

Identifiers

In general, the requirements of the HVD IR are satisfied when the best practices of DCAT-AP on identifiers are followed. According to HVD IR the identifiers provided in the report should be an online reference to the metadata.

In short these are:

In practice, multiple identifiers may have been assigned to a Dataset. It is recommended to select a master identifier and use this one to implement the HVD IR. In general, harvesters and portals are advised to use and promote this master identifier as the identifier for the HVD Dataset. In addition, it is recommended to augment the list of other identifiers with the encountered identifiers. These identifier processing recommendations are made to ensure that the information used in contexts like the HVD reporting (i.e. a reference to a dataset) is consistent with the published metadata on data portals.
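
The identifier processing recommended here can be sketched as a small merge step that a harvester might apply. The function name and the dictionary keys `identifier` and `otherIdentifier` are hypothetical simplifications; how a master identifier is selected is left to the implementer.

```python
def merge_identifiers(master, known_others, encountered):
    """Keep the master identifier and extend the list of other identifiers.

    Encountered identifiers are appended to the known ones, skipping
    duplicates and the master identifier itself. Order is preserved.
    """
    others = []
    for identifier in [*known_others, *encountered]:
        if identifier != master and identifier not in others:
            others.append(identifier)
    return {"identifier": master, "otherIdentifier": others}
```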

Persistent identifiers

The HVD IR requires as part of the reporting requirements (article 5.3), that Licensing Conditions and APIs have persistent links.

Persistence means that, for these entities, Member States take the responsibility to maintain the real world resource indefinitely and additionally reduce the accessibility challenge by maintaining the same name for that real world resource indefinitely. Thus for the entities that MSs include in the reporting and for which the reporting requires a persistent link, a MS makes a persistent commitment.

As DCAT-AP is a Semantic Web data specification, persistence is associated with the use of persistent URIs (PURIs) for the metadata descriptions. A general advice for DCAT-AP implementers is to use PURIs for all entities, and above all for Datasets and Data Services. Practice shows, though, that this is not universally applied. To reduce this gap between intention and practice, DCAT-AP has proposed a number of guidelines on identifiers [[IdentifierGuidelines]]. Implementers of the HVD IR are advised to read these guidelines to understand how identity might or might not be preserved from one data portal to another, and to take the appropriate actions.

In article 5.3 of the HVD IR, the broad term Licensing Conditions is used, while in the other parts of the regulation the term Licence is used. DCAT-AP provides several means to express legal information, notably the properties licence (dct:license) and rights (dct:rights). This may lead to questions whether rights are included by the reporting requirement. As the final objective is to provide a trusted legal statement, it is considered that the requirement for a persistent link applies to rights too.

The reporting requirement for a persistent link for APIs is ambiguous from the perspective of DCAT-AP. In DCAT-AP, there is the identifier for the Data Service, i.e. the description about the API, and the property endpoint URL, which is the technical endpoint via which the data exchange will happen. The impact that the persistency requirement has on each is different and requires special attention from the HVD Dataset publisher. As the HVD IR does not specify precisely which case it covers, both are considered in scope. That means that a Data Service has a PURI and that its endpoint URL is persistent.

DCAT-AP does not impose persistent identification of an endpoint URL. It, however, expects a life-cycle management of the API through metadata. For that, DCAT-AP recommends to follow the DCAT guidelines on Resource life-cycle.

For example, consider an API which is at the end of its lifecycle. According to DCAT(-AP), the PURI of the Data Service could get the status ‘deprecated’ and the endpoint URL could be made void when it is taken offline. Any data portal user would understand that this Data Service should not be used anymore. If the metadata is augmented with the information about the successor of the Data Service, the data portal owner can be guided to the new Data Service.

The impact for a user of the endpoint URL is higher: systems might get broken when the endpoint URL is taken offline. This situation is the result of a shared responsibility: either the publishers did not apply a decent life cycle management, or the users did not inform the publisher about their critical dependency. Because of this, even if the API gives open access to open data, users that are dependent on the API are advised to inform the publisher about their existence. But also, publishers must improve their life cycle management for these APIs so that re-users get the right information and can take the change of the endpoint of the API into account in their roadmaps.

The enforcement of a persistent link for the endpoint URL will reduce the occurrence of such cases, but it will not make them disappear. And this enforcement imposes additional care on the Data Services (APIs) by HVD publishers. When an API is moved to a new platform (e.g. from a local API gateway to an organisation-wide one), the original endpoint URL must be maintained, and also the metadata management must be maintained.

In summary, the recommendation is to have persistency for both aspects of the API: its metadata identifier as well as its endpoint URL.

Legal information

The HVD IR requires a high level of metadata quality for legal information. The information should be provided in machine-readable and human-readable format, using a persistent link. Furthermore, it should be possible to investigate whether the legal conditions are equal to or more permissive than the reference CC-BY 4.0.

Despite these strong requirements, the HVD IR does not alter the recommendations and practice of expressing legal information in DCAT-AP. The HVD requirements do extend or make more precise how the legal information should technically be provided. In DCAT-AP, legal information corresponds to licences as well as to rights expressions. In currently allowed practice, licence information may thus be supplied by a collection of rights statements, in cases where national legislation does not allow a licence document to be provided. This is compatible with the HVD IR, and in that case, the HVD requirements will also apply to the rights statements. The HVD IR also does not force a change to the current DCAT-AP principle of indicating the legal information at the most precise level in the metadata description, i.e. Data Service and Distribution; therefore this principle is maintained. As a consequence, a Distribution or Data Service must be supplied in order to share the legal information.

Catalogue owners are advised to assess the legal information provided by the publishers according to the flows in the figures below. For instance, if a publisher provides licence information referring to a licence document made accessible online by the publisher itself, then the publisher of that information must implement the HVD quality requirements for licence documents. The decision trees in the figures make it possible to assess whether or not additional effort is required.

The decision tree for licence information.
The continuation of the decision tree focussing on rights information.

In the reporting requirements of the HVD IR, the notion terms of use is used. It has been agreed, by the Working Group for DCAT-AP HVD, that providing terms of use information is the same as providing legal information for a Data Service.

Assessment support for licences

To support the assessment of whether the assigned legal conditions are equal to or more permissive than the reference CC-BY 4.0, the recommendation is to augment the machine-readable publication of MS-specific or publisher-specific licences with mapping information on the EU Vocabularies Licence NAL [[NAL-Licence]]. It is recommended to express this mapping in the first place using properties that express equivalence, preferably skos:exactMatch, and otherwise using the SKOS mapping properties that can indicate "more permissive than", as discussed below. The use of properties that express a weak(er) equality, e.g. skos:closeMatch or rdfs:seeAlso, is not recommended; they should only be used as a last resort. If none of the first two approaches can be used, then sharing matching information using these weaker properties may still assist the assessment. But the decision on compliance with the HVD IR is then left to the interpretation of the European Commission.

Using the SKOS matching properties requires agreeing on a direction to indicate "more permissive than". In accordance with the hierarchy relationship of SKOS, in which the most generic concept (the concept higher in the hierarchy, i.e. closer to the top concept) is more general than the leaf concept, the recommendation is to have the most permissive licences at the top and the most restrictive at the bottom.

From the definition of the SKOS relations and the definition of the corresponding SKOS mapping relations, a MS-specific licence that is more permissive than CC-BY-4.0 will be expressed as follows.

Because skos:closeMatch is an indication that one concept is marginally more general or more precise than the other, it cannot be used to indicate the level of permissiveness. As described above, skos:exactMatch should preferably be used.
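
The direction convention above can be illustrated with a minimal Python sketch. It assumes that "A skos:narrowMatch B" is read as "A is more permissive than B" (most permissive licences at the top), that permissiveness may be chained across mappings, and that the NAL URI for CC-BY 4.0 is the one shown. The assertion that CC0 is more permissive than CC-BY 4.0 and the mapping of the restrictive licence are hypothetical triples added for illustration only; they are not taken from the NAL.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EXACT = SKOS + "exactMatch"
NARROW = SKOS + "narrowMatch"
BROAD = SKOS + "broadMatch"

NAL = "http://publications.europa.eu/resource/authority/licence/"
CC_BY_4_0 = NAL + "CC_BY_4_0"  # assumed NAL URI for CC-BY 4.0
CC0 = NAL + "CC0"
MS = "https://data.exampleMS.gov/resource/"

# Illustrative mapping triples (subject, predicate, object).
TRIPLES = {
    (MS + "FreeAndOpen", EXACT, CC0),
    # Assumption: CC0 sits above CC-BY 4.0, i.e. it is more permissive.
    (CC0, NARROW, CC_BY_4_0),
    # Hypothetical: the restrictive MS licence sits below CC-BY 4.0.
    (MS + "NoCommercialUseWithFees", BROAD, CC_BY_4_0),
}


def at_least_as_permissive(licence, reference, triples):
    """Return True if `licence` equals `reference` or sits above it."""
    seen = set()
    todo = [licence]
    while todo:
        current = todo.pop()
        if current == reference:
            return True
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if p == EXACT and s == current:
                todo.append(o)  # equivalent licence
            elif p == EXACT and o == current:
                todo.append(s)  # exactMatch is symmetric
            elif p == NARROW and s == current:
                todo.append(o)  # o is more restrictive than current
            elif p == BROAD and o == current:
                todo.append(s)  # s is more restrictive than current
    return False


print(at_least_as_permissive(MS + "FreeAndOpen", CC_BY_4_0, TRIPLES))
print(at_least_as_permissive(MS + "NoCommercialUseWithFees", CC_BY_4_0, TRIPLES))
```

Weaker properties such as skos:closeMatch are deliberately ignored in this sketch, since they carry no direction of permissiveness.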

Contact Point

The HVD IR requests a contact point for APIs.

This requirement is implemented as the following recommendation. A contact point is mandatory for HVD Data Services and recommended for HVD Datasets either in the form of a (persistent) email address or a link to a contact form on a webpage, e.g. to contact a service desk.
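
As an illustration of this recommendation, the following Python sketch checks a resource description for a usable contact point. The dictionary layout, the function names and the example values are hypothetical; the sketch only assumes that an email address is given as a mailto: URI and a contact form as an http(s) URL.

```python
def contact_point_ok(kind):
    """A contact point needs an email address or a link to a contact page."""
    email = kind.get("vcard:hasEmail", "")
    url = kind.get("vcard:hasURL", "")
    return email.startswith("mailto:") or url.startswith(("http://", "https://"))


def check_contact_point(resource):
    """Return 'ok', 'violation' (mandatory missing) or 'warning' (recommended missing)."""
    kinds = resource.get("dcat:contactPoint", [])
    if any(contact_point_ok(k) for k in kinds):
        return "ok"
    # Mandatory for HVD Data Services, recommended for HVD Datasets.
    return "violation" if resource["type"] == "dcat:DataService" else "warning"


# Hypothetical descriptions, for illustration only.
service = {
    "type": "dcat:DataService",
    "dcat:contactPoint": [{"vcard:hasURL": "https://orgea.exampleMS.gov/servicedesk"}],
}
dataset = {"type": "dcat:Dataset", "dcat:contactPoint": []}

print(check_contact_point(service))
print(check_contact_point(dataset))
```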

Specific data requirements

The HVD implementation regulation describes, in its Annex, precisely the data elements that should be provided for a HVD Dataset. A HVD Dataset must conform to the rules defined in the HVD IR.

It is recommended to provide a reference to a public document (for instance: data standards) that describes the internals of the Dataset (or Distribution) using the property conforms to. This ensures that the information is made publicly accessible for re-users. It can be used by experts to verify if the Dataset matches the HVD requirements.

An alternative approach is the use of a self-declaration of conformance. In this case the publisher of the HVD Dataset itself declares that the Dataset conforms to all technical data details the HVD IR imposes. The INSPIRE community has used this approach before. As the assessment of the validity of such a self-declaration is a domain-specific activity requiring expert knowledge, DCAT-AP HVD does not propose a general self-declaration statement, but leaves the design of such a statement to the data experts of the HVD categories, so that they can provide a trustworthy approach.

Connecting Data Services with Datasets

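The HVD IR imposes in general the presence of an API (Data Service) for a HVD Dataset. In DCAT(-AP) there are two routes to relate Data Services with Datasets: the direct approach, in which the Data Service refers to the Dataset via the property serves dataset (dcat:servesDataset), and the indirect approach, in which a Distribution of the Dataset refers to the Data Service via the property access service (dcat:accessService). DCAT-AP HVD does not intend to change the existing metadata description practices; nevertheless, the following observations have to be made.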

In the case of HVD reporting, the indirect approach will be transformed into the direct approach. This is a natural consequence of the HVD IR wording: the dataset shall be available for reuse through APIs. Technically it corresponds to the following closure: the APIs for a HVD Dataset are the Data Services that are directly or indirectly connected with the HVD Dataset.
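
The closure can be sketched in a few lines of Python. This is an illustration only, assuming that the direct route is expressed with dcat:servesDataset and the indirect route with dcat:distribution followed by dcat:accessService; the resource URIs are hypothetical.

```python
DCAT = "http://www.w3.org/ns/dcat#"
SERVES = DCAT + "servesDataset"
DISTRIBUTION = DCAT + "distribution"
ACCESS_SERVICE = DCAT + "accessService"


def apis_for_dataset(dataset, triples):
    """Return the Data Services directly or indirectly connected to `dataset`."""
    # Direct route: Data Service --servesDataset--> Dataset
    direct = {s for s, p, o in triples if p == SERVES and o == dataset}
    # Indirect route: Dataset --distribution--> Distribution --accessService--> Data Service
    distributions = {o for s, p, o in triples if p == DISTRIBUTION and s == dataset}
    indirect = {o for s, p, o in triples if p == ACCESS_SERVICE and s in distributions}
    return direct | indirect


# Hypothetical resources, for illustration only.
EX = "https://data.exampleMS.gov/id/"
triples = {
    (EX + "api-platform", SERVES, EX + "bees"),
    (EX + "wasps", DISTRIBUTION, EX + "wasps-dist"),
    (EX + "wasps-dist", ACCESS_SERVICE, EX + "wasp-api"),
}

print(apis_for_dataset(EX + "bees", triples))
print(apis_for_dataset(EX + "wasps", triples))
```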

Data publishers that solely use the indirect approach might face an undesired side effect of their choice: namely that they must encode a HVD Distribution even though it is neither required nor necessary. That is a consequence of the local publishers' agreements, but it might have policy impacts: namely that this superfluous Distribution also becomes subject to the HVD IR.

Having two routes to declare the presence of an API makes it non-trivial to express the obligation as a single cardinality in the specification. To ensure that at least one API is present for a HVD Dataset, the SHACL validation rules have been extended.

A similar, although less visible, request is the possibility of downloading the data in bulk. The Annex of the HVD IR frequently expresses this need. Although it is not obligatory in all cases, it is a strong recommendation. In the formal model such strong recommendations are hard to implement. Adding a minimum cardinality of 1 would enforce the presence, which is too strict. But relaxing the minimum cardinality may be an opening to not provide this information. Reviewers have pointed out this challenge. Therefore the following approach has been applied in this specification to ensure that publishers consider this need. The formal constraints have been relaxed, the usage qualification has been set to recommended, and additional usage notes including this explanation have been added. In addition, the SHACL validation rules have been extended with a check to validate whether each HVD Dataset has a HVD Distribution. This check has severity Violation, to force publishers to reconsider the information they provided when distributions are missing.

Forcing the presence of at least one Distribution or Data Service for a Dataset is also implicitly the result of the need for providing legal information. As DCAT-AP mandates this to be expressed at the level of Distributions or Data Services, a Dataset will have no legal information associated if none of them is present. This supports the case for a strong validation check.
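
The intent of these checks can be sketched in Python as follows. This is not the SHACL rule set of this specification, only an illustration of the reasoning; the dictionary layout, the field names and the example values are hypothetical.

```python
def find_issues(datasets):
    """Report HVD Datasets lacking an API, a Distribution or legal information."""
    issues = []
    for ds in datasets:
        distributions = ds.get("distributions", [])
        services = ds.get("services", [])
        if not services:
            issues.append((ds["id"], "no Data Service (API)"))
        if not distributions:
            issues.append((ds["id"], "no Distribution (bulk download)"))
        # Legal information lives on Distributions and Data Services only.
        if not any(r.get("licence") or r.get("rights") for r in distributions + services):
            issues.append((ds["id"], "no legal information"))
    return issues


# Hypothetical HVD Datasets, for illustration only.
datasets = [
    {
        "id": "bees",
        "distributions": [{"licence": "https://data.exampleMS.gov/resource/FreeAndOpen"}],
        "services": [{"licence": "https://orgea.exampleMS.gov/terms"}],
    },
    {"id": "incomplete", "distributions": [], "services": []},
]

for dataset_id, message in find_issues(datasets):
    print(dataset_id, message)
```

A Dataset without any Distribution or Data Service necessarily fails the third check as well, which mirrors the argument made above.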

Reporting

The HVD IR requires EU Member States (reporter) to report the list of HVD Datasets. On 24 June 2024, the European Commission (DG CONNECT) presented a common reporting approach based on submitting metadata as DCAT-AP HVD to the Portal for European Data [[DEU]]. This approach leverages the guidelines expressed in this specification. More details on the actual approach, and example SPARQL queries that can be used to assess the current state of play of a MS, are found in the DEU documentation.

Example

In this section we illustrate the recommendations for DCAT-AP for the HVD implementing regulation. The examples in this section are fictitious; their sole purpose is to illustrate the metadata.

Datasets in scope of HVD

Consider that a dataset "The population of bees" is within scope of the HVD while another dataset "The population of wasps" is not. Both datasets, however, are in scope of the INSPIRE directive. The dataset "The population of bees" is also tagged with the appropriate HVD Category. Observe that solely the most granular concept "Nature preservation and biodiversity" is provided. The generic top-level category can be derived from it.

Example 1 - Bees and wasps population datasets

Both datasets are published by the Environment Agency of the EU Member State ExampleMS using a persistent identifier.

Example 2 - MS dataset

The datasets are published on the EU Member State's national data portal https://dataportal.exampleMS.gov. This portal assigns another identifier to the datasets. Because that new identifier is not the master identifier, the portal avoids replacing the master identifier: in its published DCAT-AP catalogue it lists its own identifier as an additional identifier.

Example 3 - MS dataset with 2 identifiers

Bulk downloads for HVD Datasets

The datasets are downloadable in various formats and levels of detail. In our example, the data is available in two formats: RDF and ESRI shapefile format. According to the HVD IR the datasets must minimally be available in bulk download with a granularity of 50 square kilometres and with a bi-yearly update frequency. They must also be available in an open format for geospatial data. Based on these requirements, the publisher of the dataset decides to indicate that the shape-based distribution is a HVD bulk download.

Example 4 - MS dataset with 2 distributions

The HVD IR also specifies that the dataset should at least provide information about the number of bees, the calculation method, the amount of honey being harvested and the number of beekeepers active in the area. The publisher describes the data semantically using an application profile, and provides detailed data schema documentation for each distribution.

Example 5 - MS dataset conform to a profile

The HVD Dataset is accessible via an API

According to the HVD IR, the "bee population" dataset must be made available via an API. The dataset publisher has an API platform deployed, via which data users have access to realtime data. This API platform supports all datasets of the publisher.

Example 6 - MS dataset with data service

Because the API platform is provided as the API for the "bee population" dataset, the HVD implementing regulation requirements apply. This means that the endpoint URL must be persistent. The publisher should perform maximal effort to keep the endpoint URL stable. For instance, deploying a new API platform or changing organisation names should not impact the endpoint URL.

To provide information about the use of the API platform, the publisher provides OpenAPI technical documentation and an SLA to document the quality of service.

Example 7 - MS data service with OpenAPI and SLA

To address any questions by the users the publisher operates a service desk.

Example 8 - MS data service with publisher service desk

Expressing legal conditions

The Member State (MS) requires, via its legislation, public bodies to use national data licences. Therefore, the dataset publisher is required to use one of them. As support to the community, the MS has published the data licences as a SKOS taxonomy, using persistent URIs. For the "wasp population" dataset, a restrictive licence is chosen because the data is based on information that has commercial rights including fees. The "bee population" dataset is shared with a very permissive licence.

Example 9 - MS dataset distributions with different licences

In order to support the assessment of the used licences, the MS maps the licences to the NAL Licences [[NAL-Licence]].

Example 10 - Mapping licences

The HVD IR requires that the licence for the Bee population dataset is at least as permissive as CC-BY 4.0. Since the Bee population licence is https://data.exampleMS.gov/resource/FreeAndOpen, which is an exact match with http://publications.europa.eu/resource/authority/licence/CC0, and this CC0 licence is more permissive than CC-BY-4.0, the HVD requirement is met.

Because in producing the RDF representation additional provenance information is included that is sensitive, the publisher changes the licence for that distribution to a more restrictive one.

Example 11 - Restricting licences

Although this restricted licence https://data.exampleMS.gov/resource/NoCommercialUseWithFees does not meet the HVD requirements for the "bee population" dataset, the "bee population" dataset is still conformant to the HVD implementing regulation as the RDF distribution was not within scope of the HVD. The same reasoning holds for the "wasp population" dataset. This illustrates the flexibility the DCAT-AP HVD specification offers to address complex and rare scenarios data publishers might face.

The Data Service exampleMS:EAMS-APIplatform provides access to both datasets. The legal conditions on the usage of the platform for the "bee population" dataset are a combination of the API platform conditions (e.g. no misuse by triggering DDOS activities, no sharing of access tokens with third parties, etc.) and the dataset conditions. The API request https://orgea.exampleMS.gov/api/v2/beepopulation/ thus has different conditions than https://orgea.exampleMS.gov/api/v2/wasppopulation/. Therefore, the nature of the licence document associated with a Data Service is usually more oriented to the use of the API platform than to the use of the data it provides access to.

In the example the 'Terms of Use' for the API platform are mentioned as the licence. In addition, the API platform can also indicate the SLA it offers.

Example 12 - Data service with terms and SLA

Reporting

The MS reports its HVD conformance status by providing a catalogue containing all metadata in scope of HVD. To facilitate the conformance assessment, it will only include the Datasets, Dataset Series, Data Services and Distributions that are in scope of HVD. The catalogue will also contain any additional supportive information such as ContactPoints, Agents and the mapping for the licences to the EU Licences.

Example 13 - MS Catalogue

To reduce the risk of misinterpretation, the properties connecting Catalogued Resources, such as dcat:servesDataset and dcat:distribution, should be inspected so that they do not refer to Catalogued Resources outside the scope of HVD. In the example below, the references to the RDF distribution for the "bee population" and the "wasp population" dataset are removed from the reporting catalogue. During the collection stage of the common reporting approach a SPARQL query will perform this scoping.
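
The scoping step can be sketched in Python as follows. This is not the SPARQL query used by the common reporting approach; it only illustrates the idea. It assumes that a resource is in scope of HVD when its applicable legislation property points to the HVD IR, for which the ELI URI below is assumed, and that dcat:accessService is also treated as a connecting property. The resource URIs are hypothetical.

```python
DCAT = "http://www.w3.org/ns/dcat#"
APPLICABLE_LEGISLATION = "http://data.europa.eu/r5r/applicableLegislation"
HVD_IR = "http://data.europa.eu/eli/reg_impl/2023/138/oj"  # assumed ELI of the HVD IR
CONNECTING = {DCAT + "distribution", DCAT + "servesDataset", DCAT + "accessService"}


def drop_out_of_scope_links(triples):
    """Remove connecting references that point to resources outside HVD scope."""
    in_scope = {s for s, p, o in triples if p == APPLICABLE_LEGISLATION and o == HVD_IR}
    return {(s, p, o) for s, p, o in triples if p not in CONNECTING or o in in_scope}


# Hypothetical resources, for illustration only.
EX = "https://data.exampleMS.gov/id/"
catalogue = {
    (EX + "bees", APPLICABLE_LEGISLATION, HVD_IR),
    (EX + "bees-shp", APPLICABLE_LEGISLATION, HVD_IR),
    (EX + "api-platform", APPLICABLE_LEGISLATION, HVD_IR),
    (EX + "bees", DCAT + "distribution", EX + "bees-shp"),
    (EX + "bees", DCAT + "distribution", EX + "bees-rdf"),  # RDF distribution: out of scope
    (EX + "api-platform", DCAT + "servesDataset", EX + "bees"),
    (EX + "api-platform", DCAT + "servesDataset", EX + "wasps"),  # wasps: out of scope
}

scoped = drop_out_of_scope_links(catalogue)
print(len(catalogue) - len(scoped), "references removed")
```

Supporting information such as contact points or licence mappings is left untouched by this sketch, because only the connecting properties are filtered.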

Example 14 - MS Catalogue in HVD scope

Based on this catalogue the MS can be audited for its conformance. During the assessment it might occur that the supplied information is not sufficient, and that the assessment must follow the references outside the supplied catalogue. E.g., when assessing the permissiveness of the licences the details of the referenced EU Licence must be consulted. Crossing these boundaries is a regular occurrence and it can be done during the assessment without impacting the results when the supplied data is based on persistent identifiers (PURIs).

The use of dereferenceable persistent identifiers could also lead to another agreement to supply a more condensed representation of the reporting catalogue. Under the condition that all catalogued resources in scope of HVD are in the Portal for European Data [[DEU]], the reporting by an MS could be reduced to sharing the minimal information how to collect these resources. In the common reporting approach the set of harvested MS catalogues containing HVD information is requested. With that all the necessary information for the reporting is extracted from the DEU.

Validation

To support the assessment of whether a Catalogue satisfies DCAT-AP HVD, the following SHACL templates are provided. The file full.ttl combines them all. At this moment the SHACL templates assume that all resources are subject to DCAT-AP HVD. It is future work to provide a conditional approach which would support catalogues mixing HVD datasets and non-HVD datasets.

Quick Reference of Classes and Properties

This section provides a condensed tabular overview of the mentioned classes and properties in this specification. The properties are grouped under headings mandatory, recommended, optional and deprecated. These terms have the following meaning.

Class | Class IRI | Property Type | Property | Property IRI
--- | --- | --- | --- | ---
Catalogue | http://www.w3.org/ns/dcat#Catalog | Recommended | dataset | http://www.w3.org/ns/dcat#dataset
Catalogue | http://www.w3.org/ns/dcat#Catalog | Recommended | service | http://www.w3.org/ns/dcat#service
Catalogue | http://www.w3.org/ns/dcat#Catalog | Optional | record | http://www.w3.org/ns/dcat#record
Catalogue Record | http://www.w3.org/ns/dcat#CatalogRecord | Mandatory | primary topic | http://xmlns.com/foaf/0.1/primaryTopic
Catalogued Resource | http://www.w3.org/ns/dcat#Resource | | |
Concept | http://www.w3.org/2004/02/skos/core#Concept | | |
Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation
Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | contact point | http://www.w3.org/ns/dcat#contactPoint
Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | documentation | http://xmlns.com/foaf/0.1/page
Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | endpoint URL | http://www.w3.org/ns/dcat#endpointURL
Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | HVD category | http://data.europa.eu/r5r/hvdCategory
Data Service | http://www.w3.org/ns/dcat#DataService | Recommended | endpoint description | http://www.w3.org/ns/dcat#endpointDescription
Data Service | http://www.w3.org/ns/dcat#DataService | Recommended | serves dataset | http://www.w3.org/ns/dcat#servesDataset
Data Service | http://www.w3.org/ns/dcat#DataService | Optional | licence | http://purl.org/dc/terms/license
Data Service | http://www.w3.org/ns/dcat#DataService | Optional | rights | http://purl.org/dc/terms/rights
Dataset | http://www.w3.org/ns/dcat#Dataset | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation
Dataset | http://www.w3.org/ns/dcat#Dataset | Mandatory | HVD Category | http://data.europa.eu/r5r/hvdCategory
Dataset | http://www.w3.org/ns/dcat#Dataset | Recommended | contact point | http://www.w3.org/ns/dcat#contactPoint
Dataset | http://www.w3.org/ns/dcat#Dataset | Recommended | dataset distribution | http://www.w3.org/ns/dcat#distribution
Dataset | http://www.w3.org/ns/dcat#Dataset | Optional | conforms to | http://purl.org/dc/terms/conformsTo
Dataset | http://www.w3.org/ns/dcat#Dataset | Optional | in series | http://www.w3.org/ns/dcat#inSeries
Dataset Series | http://www.w3.org/ns/dcat#DatasetSeries | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation
Dataset Series | http://www.w3.org/ns/dcat#DatasetSeries | Mandatory | HVD Category | http://data.europa.eu/r5r/hvdCategory
Distribution | http://www.w3.org/ns/dcat#Distribution | Mandatory | access URL | http://www.w3.org/ns/dcat#accessURL
Distribution | http://www.w3.org/ns/dcat#Distribution | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation
Distribution | http://www.w3.org/ns/dcat#Distribution | Recommended | access service | http://www.w3.org/ns/dcat#accessService
Distribution | http://www.w3.org/ns/dcat#Distribution | Recommended | licence | http://purl.org/dc/terms/license
Distribution | http://www.w3.org/ns/dcat#Distribution | Optional | linked schemas | http://purl.org/dc/terms/conformsTo
Distribution | http://www.w3.org/ns/dcat#Distribution | Optional | rights | http://purl.org/dc/terms/rights
Document | http://xmlns.com/foaf/0.1/Document | | |
Kind | http://www.w3.org/2006/vcard/ns#Kind | Recommended | contact page | http://www.w3.org/2006/vcard/ns#hasURL
Kind | http://www.w3.org/2006/vcard/ns#Kind | Recommended | email | http://www.w3.org/2006/vcard/ns#hasEmail
Legal Resource | http://data.europa.eu/eli/ontology#LegalResource | | |
Licence Document | http://purl.org/dc/terms/LicenseDocument | | |
Literal | http://www.w3.org/2000/01/rdf-schema#Literal | | |
Resource | http://www.w3.org/2000/01/rdf-schema#Resource | | |
Rights statement | http://purl.org/dc/terms/RightsStatement | | |
Standard | http://purl.org/dc/terms/Standard | | |

Acknowledgments

The editors gratefully acknowledge the contributions made to this document by all members of the working group. In particular, we would like to express our gratitude to the former editor Makx Dekkers.

This work was elaborated by a Working Group under SEMIC by Interoperable Europe in collaboration with DG Environment and JRC. Interoperable Europe of the European Commission was represented by Pavlina Fragkou. DG CONNECT was represented by Jiri Pilar and Michal Kuban. Bert Van Nuffelen, Pavlina Fragkou, Jitse De Cock, and Arthur Schiltz were the editors of this specification.

Past and current contributors are: Alberto Abella, Anssi Ahlberg, Adam Arndt, Judie Attard, Julius Belickas, Nick Berkvens, Konstantis Bogucarskis, Peter Bruhn Andersen, Ewa Bukala, Martin Böhm, Nikolai Bülow Tronche, Ana Cano, Eileen Carroll, Egle Cepaitiene, Luisa Cidoncha, Marco Combetto, John Cunningham, Ine de Visser, Kelly Deirdre, Makx Dekkers, Radko Domanska, Iwona Domaszewska, Ulrika Domellöf Mattsson, Alessio Dragoni, Nicolai Draslov, Frederik Emanualsson, Jordi Escriu, Jose-Luis Fernandez-Villacanas, Nuno Freire, Leyre Garralda, Alma Gonzalez, Capser Gras, Bart Hanssens, Kieran Harper, Jasper Heide, Mika Honkanen, Peter Isrealsson, Fabian Kirstein, Michal Kitta, Jakub Klimek, Rae Knowler, Fredrik Knutsson, Peter Kochman, Sirkku Kokkola, Michal Kuban, Kaia Kulla, Maria Lenartowicz, Anja Litka, Anja Loddenkemper, Hagar Lowenthal, Melanie Mageean, Agata Majchrowska, Hugh Mangan, Estelle Maudet, Balint Miklos, Esther Minguela, Joachim Nielandt, Geraldine Nolf, Erik Obsteiner, Javier Orozco, Csapo Orsolya, Matthias Palmer, Alberto Palomo, Francesco Paolicelli, Eirini Pappi, Mihai Paunescu, Sylwia Pichlak Pawlak, Jiri Pilar, Ludger Rinsche, Daniele Rizzi, Joeri Robbrecht, Reet Roosalu, Ana Rosa, Maik Roth, Antonio Rotundo, Michal Ruzicka, Jill Saligoe-Simmel, Fabian Santi, Giovanna Scaglione, Charles-Andrew Vande Catsyne Sciensano, Giampaolo Selitto, Martin Semberger, Paulo Seromenho, Jan Skornsek, Michele Spichtig, Emidio Stani, Kjersti Steien, Simon Steuer, Terje Sylvarnes, Martin Traunmuller, Kees Trautwein, Stavros Tsouderos, Thomas Tursics, Bert Van Nuffelen, Uwe Voges, Gabriella Wiersma, Jesper Zedlitz, Mantas Zimnickas.