Introduction

Context

In light of the growing importance of data, the European Commission adopted an implementing act focused on High-Value Datasets on 21 December 2022 [[HVD]]. The implementing regulation groups the datasets into six thematic categories of High-Value Datasets: geospatial, earth observation and environment, meteorological, statistics, companies and company ownership, and mobility. The Portal for European Data [[DEU]] has published an easy-to-read overview.

Scope

This document provides the guidelines on how to use DCAT-AP for a dataset that is subject to the requirements imposed by the High-Value Dataset implementing regulation (HVD IR). The document is called the "usage guidelines of DCAT-AP for High-Value Datasets", in short DCAT-AP HVD.

To understand these guidelines, it is important to realise that the HVD IR applies to a subset of all the datasets that are collected by (Open) Data Portals in Europe. A single catalogue contains catalogued resources which are within and outside scope of the HVD IR.

This document supports the need for a common usage of DCAT-AP for catalogued resources within the scope of the High-Value Dataset implementing regulation. When conforming to these guidelines, a dataset within the scope of the regulation will have satisfied the minimum metadata requirements to be included in the mandatory reporting of the regulation. Conformance does not, however, imply automatic compliance with the regulation, because certain aspects are beyond DCAT-AP. For instance, the regulation requires certain data elements to be present, and requires the licensing conditions to be equal to or more permissive than CC-BY 4.0. Such requirements cannot be verified by just inspecting the DCAT-AP description, but the DCAT-AP description will assist in the assessment.

DCAT-AP HVD supports the implementers of the regulation in their assessment of their state of play. When applying the guidelines to the metadata of their datasets, which are in scope of the regulation, the necessary attention will be raised to drive towards conformance. At the same time, this effort will create an immediate benefit for the European citizens and businesses. Any improvement of the metadata will immediately flow throughout the European network of (Open) Data Portals and thus increase the level of metadata quality.

Status

This application profile has the status SEMIC Recommendation, published on 2023-12-14.

Information about the process and the decisions involved in the creation of this specification is available in the Changelog.

License

Copyright © 2023 European Union. All material in this repository is published under the license CC-BY 4.0, unless explicitly otherwise mentioned.

Conformance Statement

In order to conform to DCAT-AP HVD, an application MUST conform to DCAT-AP. In addition, the application MUST conform to the constraints and usage guidelines in this document, following conformance statements similar to those specified in DCAT-AP.

Provider requirements

In order to conform to this Application Profile, an application that provides metadata MUST apply the controlled vocabularies as described in section [[[#controlled-vocs]]].

Receiver requirements

In order to conform to this Application Profile, an application that receives metadata MUST be able to:

"Processing" means that receivers must accept incoming data and transparently provide these data to applications and services. It neither implies nor prescribes what applications and services finally do with the data (parse, convert, store, make searchable, display to users, etc.).

Terminology

An Application Profile is a specification that reuses terms from one or more base standards, adding more specificity by identifying mandatory, recommended and optional elements to be used for a particular application, as well as recommendations for controlled vocabularies to be used.

An Annex to an Application Profile is a specification that makes the use of some aspects of the Application Profile more precise for a specific context.

This specification uses the following prefixes to shorten the URIs for readability.
| Prefix | Namespace IRI |
| --- | --- |
| adms | http://www.w3.org/ns/adms# |
| dcat | http://www.w3.org/ns/dcat# |
| dcatap | http://data.europa.eu/r5r/ |
| dct | http://purl.org/dc/terms/ |
| foaf | http://xmlns.com/foaf/0.1/ |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
| rdfs | http://www.w3.org/2000/01/rdf-schema# |
| skos | http://www.w3.org/2004/02/skos/core# |
| vcard | http://www.w3.org/2006/vcard/ns# |
| xsd | http://www.w3.org/2001/XMLSchema# |
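
As an illustration of how these prefixes shorten URIs, the following minimal Python sketch expands a prefixed name into its full IRI. The prefix-to-namespace pairs are copied from the list above; the function name `expand` is a hypothetical helper and not part of this specification.

```python
# Prefix-to-namespace pairs as listed in this specification.
PREFIXES = {
    "adms": "http://www.w3.org/ns/adms#",
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcatap": "http://data.europa.eu/r5r/",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'dcat:Dataset' into a full IRI (hypothetical helper)."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("dcatap:hvdCategory"))
```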

Overview

DCAT-AP HVD is an annex to DCAT-AP. It describes additional usage of the DCAT-AP to satisfy the High-Value Dataset implementing regulation (HVD IR). In this document only the additional information, that is required for the catalogued resources which are within scope of the regulation, is included. In any other case, the guidelines of DCAT-AP itself are applicable. To make this more visible for the reader, cross-references have been included to DCAT-AP.

As the publication URL indicates, this version of DCAT-AP HVD is based on DCAT-AP 2. Nevertheless, the requirements expressed in this Annex are also compatible with the upcoming release of DCAT-AP 3. When DCAT-AP 3 is released, an updated version of DCAT-AP HVD will also be published.

Application profile diagram

An overview of DCAT-AP HVD is shown by the UML diagram below. The UML diagram illustrates the specification described in this document. For readability purposes the representation has been condensed as follows:

The cardinalities and qualifications are included in the figure.

For readability of this document as an annex to DCAT-AP, the core relationships between classes are included.

This document describes the usage of the following main entities for a correct usage of the Application Profile:
| Catalogue | Catalogue Record | Catalogued Resource | Data Service | Dataset | Distribution | Kind | Licence Document |

The main entities are supported by:
| Concept | Document | Legal Resource | Literal | Resource | Rights statement | Standard |

Main Entities

The main entities are those that form the core of the Application Profile. The properties and their associated constraints that apply in the context of this profile are listed in tabular form. Each row corresponds to one property. In addition to the constraints, cross-references to DCAT and DCAT-AP are provided. For the latter, to save space, the following abbreviations are used:

This reuse qualification assessment is made with respect to a specific version of DCAT-AP. It may therefore vary over time as new versions of DCAT-AP are created.

Catalogue

Definition
A catalogue or repository that hosts the Datasets or Data Services being described.
Reference in DCAT
Link
Properties
For this entity the following properties are defined: dataset, record, service.
| Property | Range | Card | Definition | Usage | DCAT | Reuse |
| --- | --- | --- | --- | --- | --- | --- |
| dataset | Dataset | 0..* | A Dataset that is part of the Catalogue. | As empty Catalogues are usually indications of problems, this property should be combined with the next property service to implement an empty Catalogue check. | Link | A |
| record | Catalogue Record | 0..* | A Catalogue Record that is part of the Catalogue. | | Link | A |
| service | Data Service | 0..* | A site or end-point (Data Service) that is listed in the Catalogue. | As empty Catalogues are usually indications of problems, this property should be combined with the previous property dataset to implement an empty Catalogue check. | Link | A |
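
The empty Catalogue check mentioned in the usage notes can be sketched as follows. This is a minimal illustration that assumes the catalogue metadata has already been loaded into a plain Python dictionary; the keys `dataset` and `service` mirror the property labels above and are an assumption about that in-memory form, not a prescribed serialisation.

```python
def is_empty_catalogue(catalogue):
    """Return True when the catalogue lists neither Datasets nor Data Services.

    `catalogue` is a hypothetical in-memory form: a dict whose 'dataset' and
    'service' keys hold lists of identifiers of the listed resources.
    """
    datasets = catalogue.get("dataset", [])
    services = catalogue.get("service", [])
    # Both properties are checked together: a catalogue that only lists
    # Data Services (or only Datasets) is not considered empty.
    return len(datasets) == 0 and len(services) == 0


print(is_empty_catalogue({"dataset": [], "service": []}))
```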

Catalogue Record

Definition
A description of a Dataset's entry in the Catalogue.
Reference in DCAT
Link
Properties
For this entity the following properties are defined: primary topic.
| Property | Range | Card | Definition | Usage | DCAT | Reuse |
| --- | --- | --- | --- | --- | --- | --- |
| primary topic | Catalogued Resource | 1 | A link to the Dataset, Data service or Catalog described in the record. | A catalogue record will refer to one entity in a catalogue. This can be either a Dataset or a Data Service. To ensure an unambiguous reading of the cardinality, the range is set to Catalogued Resource. However, it is not the intent of this range to require the explicit use of the class Catalogued Resource. As it is an abstract class, a subclass should be used. | Link | A |

Catalogued Resource

Definition
Resource published or curated by a single agent.
Reference in DCAT
Link
Usage Note
For DCAT-AP, the class is considered an abstract notion.
Properties
This specification does not impose any additional requirements to properties for this entity.

Data Service

Definition
A collection of operations that provides access to one or more datasets or data processing functions.
Reference in DCAT
Link
Subclass of
Catalogued Resource
Properties
For this entity the following properties are defined: applicable legislation, contact point, documentation, endpoint description, endpoint URL, HVD category, licence, rights, serves dataset.
| Property | Range | Card | Definition | Usage | DCAT | Reuse |
| --- | --- | --- | --- | --- | --- | --- |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Data Service. | For HVD the value MUST include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | | P |
| contact point | Kind | 1..* | Contact information that can be used for sending comments about the Data Service. | Article 3.4 requires the designation of a point of contact for an API. | Link | P |
| documentation | Document | 1..* | A page that provides additional information about the Data Service. | Quality of service covers a broad spectrum of aspects. The HVD regulation does not list any mandatory topic. Therefore quality of service information is considered part of the generic documentation of a Data Service. | | P |
| endpoint description | Resource | 0..* | A description of the services available via the end-points, including their operations, parameters etc. | The property gives specific details of the actual endpoint instances, while dct:conformsTo is used to indicate the general standard or specification that the endpoints implement. Article 3.3 requires API documentation to be provided in a Union or internationally recognised open, human-readable and machine-readable format. | Link | E |
| endpoint URL | Resource | 1..* | The root location or primary endpoint of the service (an IRI). | The endpoint URL SHOULD be persistent. This means that publishers should do everything in their power to keep the value stable and resolvable. | Link | E |
| HVD category | Concept | 1..* | The HVD category to which this Data Service belongs. | | | P |
| licence | Licence Document | 0..1 | A licence under which the Data service is made available. | Article 3.3 specifies that the terms of use should be provided. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | Link | E |
| rights | Rights statement | 0..* | A statement that specifies rights associated with the Distribution. | Article 3.3 specifies that the terms of use should be provided. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | | P |
| serves dataset | Dataset | 1..* | This property refers to a collection of data that this data service can distribute. | An API in the context of HVD is not a standalone resource. It is used to open up HVD datasets. Therefore each Data Service is tightly connected with at least one Dataset. | Link | E |
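
The cardinalities listed for a Data Service lend themselves to a simple automated check. The sketch below is one possible way to do this, assuming the metadata is available as a dictionary that maps property labels to lists of values; the dictionary layout and the function name `cardinality_violations` are illustrative assumptions, while the minimum and maximum values are taken from the Card column above.

```python
# (minimum, maximum) per property, from the Card column; None means unbounded.
DATA_SERVICE_CARDINALITIES = {
    "applicable legislation": (1, None),
    "contact point": (1, None),
    "documentation": (1, None),
    "endpoint description": (0, None),
    "endpoint URL": (1, None),
    "HVD category": (1, None),
    "licence": (0, 1),
    "rights": (0, None),
    "serves dataset": (1, None),
}


def cardinality_violations(entity, cardinalities):
    """List the properties whose number of values falls outside the allowed range."""
    problems = []
    for prop, (minimum, maximum) in cardinalities.items():
        count = len(entity.get(prop, []))
        if count < minimum:
            problems.append(f"{prop}: expected at least {minimum}, found {count}")
        elif maximum is not None and count > maximum:
            problems.append(f"{prop}: expected at most {maximum}, found {count}")
    return problems


# Fictitious Data Service description without a contact point.
service = {
    "applicable legislation": ["http://data.europa.eu/eli/reg_impl/2023/138/oj"],
    "documentation": ["https://api.example.org/docs"],
    "endpoint URL": ["https://api.example.org/"],
    "HVD category": ["https://example.org/hvd-category/placeholder"],
    "serves dataset": ["https://data.example.org/id/dataset/bees"],
}
for problem in cardinality_violations(service, DATA_SERVICE_CARDINALITIES):
    print(problem)
```

The same function can be reused for the other entities by supplying a cardinality table built from their respective property listings.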

Dataset

Definition
A conceptual entity that represents the information published.
Reference in DCAT
Link
Subclass of
Catalogued Resource
Properties
For this entity the following properties are defined: applicable legislation, conforms to, contact point, dataset distribution, HVD Category.
| Property | Range | Card | Definition | Usage | DCAT | Reuse |
| --- | --- | --- | --- | --- | --- | --- |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Dataset. | For HVD the value must include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | | P |
| conforms to | Standard | 0..* | An implementing rule or other specification. | The provided information should make it possible to verify whether the detailed information requirements of the HVD IR are satisfied. For more usage suggestions see the section on specific data requirements. | Link | A |
| contact point | Kind | 0..* | Contact information that can be used for sending comments about the Dataset. | | Link | A |
| dataset distribution | Distribution | 1..* | An available Distribution for the Dataset. | The HVD IR is a quality improvement of existing datasets. The intention is that HVD datasets are publicly and openly accessible. Therefore a Distribution is expected to be present. (Article 3.1) | Link | A |
| HVD Category | Concept | 1..* | The HVD category to which this Dataset belongs. | | | P |

Distribution

Definition
A physical embodiment of the Dataset in a particular format.
Reference in DCAT
Link
Usage Note
Bulk downloads should be encoded as a Distribution.
Properties
For this entity the following properties are defined: access service, access URL, applicable legislation, licence, linked schemas, rights.
| Property | Range | Card | Definition | Usage | DCAT | Reuse |
| --- | --- | --- | --- | --- | --- | --- |
| access service | Data Service | 0..* | A data service that gives access to the distribution of the dataset. | | Link | A |
| access URL | Resource | 1..* | A URL that gives access to a Distribution of the Dataset. | The resource at the access URL contains information about how to get the Dataset. In accordance with the DCAT guidelines, it is preferred to also set the downloadURL property if the URL is a reference to a downloadable resource. | Link | A |
| applicable legislation | Legal Resource | 1..* | The legislation that mandates the creation or management of the Distribution. | For HVD the value must include the ELI http://data.europa.eu/eli/reg_impl/2023/138/oj. As multiple legislations may apply to the resource, the maximum cardinality is not limited. | | P |
| licence | Licence Document | 0..1 | A licence under which the Distribution is made available. | Article 4.3 specifies that High-value datasets should be made available for reuse. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | Link | E |
| linked schemas | Standard | 0..* | An established schema to which the described Distribution conforms. | The provided information should make it possible to verify whether the detailed information requirements of the HVD IR are satisfied. For more usage suggestions see the section on specific data requirements. | Link | A |
| rights | Rights statement | 0..* | A statement that specifies rights associated with the Distribution. | Article 4.3 specifies that High-value datasets should be made available for reuse. According to the guidelines for legal information in DCAT-AP HVD, this is fulfilled preferably by providing a licence. As an alternative, rights can be used. | Link | E |
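
The preference expressed in the usage notes for licence and rights (a licence first, rights as the alternative) can be illustrated with a small helper. This is a sketch only: the dictionary form of a Distribution and the function name `legal_information` are assumptions made for the example.

```python
def legal_information(distribution):
    """Return which kind of legal information a Distribution provides.

    Follows the stated preference: a licence is used when present,
    otherwise rights statements are used as the alternative.
    The result is a (kind, values) pair; kind is 'missing' when neither
    property has a value.
    """
    licences = distribution.get("licence", [])
    if licences:
        return ("licence", licences)
    rights = distribution.get("rights", [])
    if rights:
        return ("rights", rights)
    return ("missing", [])


# Fictitious Distribution that only carries a rights statement.
kind, values = legal_information({"rights": ["https://example.org/rights/open-reuse"]})
print(kind, values)
```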

Kind

Definition
A description following the vCard specification, e.g. to provide telephone number and e-mail address for a contact point.
Usage Note
Article 3.4 requires the designation of a point of contact for an API. It is recommended to provide at least either an email address or a contact form, e.g. from a service desk.
Properties
For this entity the following properties are defined: contact page, email.
| Property | Range | Card | Definition | Usage | DCAT | Reuse |
| --- | --- | --- | --- | --- | --- | --- |
| contact page | Resource | 0..1 | A webpage that either allows contact to be made (i.e. a webform) or contains information on how to get in contact. | | | P |
| email | Resource | 0..1 | An email address via which contact can be made. | | | P |

Licence Document

Definition
A legal document giving official permission to do something with a resource.
Usage Note
The HVD regulation requires a machine readable representation of a Licence. The minimal data model to describe a licence Document is beyond this specification. Nevertheless in [[[#c3]]] some suggestions are made.
Properties
This specification does not impose any additional requirements to properties for this entity.

Supportive Entities

The supportive entities are supporting the main entities in the Application Profile. They are included in the Application Profile because they form the range of properties.

Concept

Definition
Properties
This specification does not impose any additional requirements to properties for this entity.

Document

Definition
A textual resource intended for human consumption that contains information, e.g. a web page about a Dataset.
Properties
This specification does not impose any additional requirements to properties for this entity.

Legal Resource

Definition
This class represents the legislation, policy or policies that lie behind the Rules that govern the service.
Usage Note
The definition and properties of the Legal Resource class are aligned with the ontology included in "Council conclusions inviting the introduction of the European Legislation Identifier (ELI)". For describing the attributes of a Legal Resource (labels, preferred labels, alternative labels, definition, etc.) we refer to the ELI ontology. In this data specification the use is restricted to instances of this class that follow the ELI URI guidelines.
Properties
This specification does not impose any additional requirements to properties for this entity.

Literal

Definition
A literal value such as a string or integer; Literals may be typed, e.g. as a date according to xsd:date. Literals that contain human-readable text have an optional language tag as defined by BCP 47 [[rfc5646]].
Properties
This specification does not impose any additional requirements to properties for this entity.

Resource

Definition
Anything described by RDF.
Properties
This specification does not impose any additional requirements to properties for this entity.

Rights statement

Definition
A statement about the intellectual property rights (IPR) held in or over a resource, a legal document giving official permission to do something with a resource, or a statement about access rights.
Properties
This specification does not impose any additional requirements to properties for this entity.

Standard

Definition
A standard or other specification to which a Dataset or Distribution conforms.
Properties
This specification does not impose any additional requirements to properties for this entity.

Controlled Vocabularies

The usage of controlled vocabularies in DCAT-AP HVD conforms to and extends the usage defined by DCAT-AP. In addition, the following controlled vocabularies are defined to be used:

Controlled vocabularies to be used

In the table below, a number of properties are listed with controlled vocabularies that MUST be used for the listed properties. The declaration of the following controlled vocabularies as mandatory ensures a minimum level of interoperability.
| Property URI | Used for Class | Vocabulary name | Vocabulary URI | Usage note |
| --- | --- | --- | --- | --- |
| dcatap:hvdCategory | Dataset | EU Vocabularies HVD Categories | http://data.europa.eu/bna/asd487ae75 | |
| dcatap:hvdCategory | Data Service | EU Vocabularies HVD Categories | http://data.europa.eu/bna/asd487ae75 | |
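
A provider could verify the use of the mandatory vocabulary along the following lines. The sketch assumes that the set of allowed concept URIs has been obtained separately from the HVD Categories vocabulary; the concept URIs shown are placeholders under example.org, not actual entries of that vocabulary, and the function name is hypothetical.

```python
def category_findings(entity, allowed_concepts):
    """Check the 'HVD category' values of a Dataset or Data Service.

    `allowed_concepts` is the set of concept URIs from the controlled
    vocabulary, supplied by the caller (how it is loaded is out of scope here).
    """
    categories = entity.get("HVD category", [])
    if not categories:
        return ["HVD category: at least one value is required"]
    return [
        f"HVD category: {value} is not in the controlled vocabulary"
        for value in categories
        if value not in allowed_concepts
    ]


# Placeholder concept URIs, for illustration only.
allowed = {
    "https://example.org/hvd-category/geospatial",
    "https://example.org/hvd-category/mobility",
}
dataset = {"HVD category": ["https://example.org/hvd-category/geospatial"]}
print(category_findings(dataset, allowed))
```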

Licence controlled vocabularies

The HVD IR imposes quality requirements on the published legal conditions. In line with the generic DCAT-AP guidelines for publishing controlled vocabularies, a licence controlled vocabulary SHOULD:

Mapping the HVD IR to DCAT-AP

This section provides recommendations on how to encode descriptions required by the HVD implementing regulation (HVD IR) as a DCAT-AP metadata structure. Each topic is introduced first from the perspective of the HVD IR, followed by an assessment of the topic on the use of DCAT-AP. The selected interpretation is further elaborated, where appropriate, with implementation guidelines.

Alignment of terminology

The HVD implementing regulation uses the terms Dataset, Bulk Download and API.

In the context of DCAT-AP, a HVD Dataset is mapped on a Dataset, Bulk Download on a Distribution and API on a Data Service. To be conformant with the use of DCAT-AP in the context of the HVD IR, this mapping MUST be followed.
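
For implementers who process the regulation's terms programmatically, the mapping can be captured as a simple lookup. The sketch below is illustrative; the class names use the dcat prefix listed in the Terminology section, and the function name `dcat_class_for` is hypothetical.

```python
# Mapping of HVD IR terms to DCAT classes, as stated in this section.
HVD_TERM_TO_DCAT_CLASS = {
    "Dataset": "dcat:Dataset",
    "Bulk Download": "dcat:Distribution",
    "API": "dcat:DataService",
}


def dcat_class_for(hvd_term):
    """Return the DCAT class that a term of the HVD IR is mapped on."""
    try:
        return HVD_TERM_TO_DCAT_CLASS[hvd_term]
    except KeyError:
        raise ValueError(f"{hvd_term!r} is not a term mapped by DCAT-AP HVD") from None


print(dcat_class_for("Bulk Download"))
```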

To make the text easier to read, with a HVD Dataset we mean a Dataset in scope of the HVD implementing regulation. The same pattern is applied to other entities.

In scope of HVD IR

A Dataset is a HVD Dataset if and only if a Member State (MS) has included it in its reporting. The HVD IR defines High-Value Datasets. It may be possible that the same definition applies to multiple entities. In that case, a Member State should select the most appropriate one, according to the rules in the regulation. If the Member State decides to include multiple entities in the reporting, the requirements set out in the HVD IR will apply to all these entities. Also, if a Member State decides to include a dataset in the HVD reporting for which inclusion is not mandatory, then the requirements of the HVD IR will apply. The report is a commitment of the Member State to the European data community to sustain those datasets.

If a re-user discovers a dataset that seems to be in scope of the HVD IR, then the responsible MS should be able to provide an explanation why it is not included in the reporting. One response to this question could be by providing the relevant HVD Dataset corresponding to that dataset.

Denoting a HVD Dataset

Each entity (Dataset, Data Service, Distribution, Catalogue) that is identified by a MS in scope of the HVD IR should provide the European Legislation Identifier (ELI) http://data.europa.eu/eli/reg_impl/2023/138/oj of the HVD IR for the property applicable legislation. For the reporting, a Member State can provide a catalogue containing all elements that are within scope for the reporting of the HVD IR. In that case the catalogue should also set the value for the property applicable legislation to the ELI of the HVD.
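
Denoting an entity as being in scope of the HVD IR therefore amounts to adding the ELI to its applicable legislation values without discarding other legislation. A minimal sketch, assuming a dictionary-based in-memory form of the metadata (the key name and both function names are illustrative assumptions):

```python
HVD_ELI = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def mark_as_hvd(entity):
    """Return a copy of the entity whose applicable legislation includes the HVD ELI."""
    legislation = list(entity.get("applicable legislation", []))
    if HVD_ELI not in legislation:
        # Other legislation (e.g. INSPIRE) is preserved; the ELI is only added.
        legislation.append(HVD_ELI)
    return {**entity, "applicable legislation": legislation}


def is_hvd(entity):
    """Tell whether the entity is denoted as in scope of the HVD IR."""
    return HVD_ELI in entity.get("applicable legislation", [])


bees = mark_as_hvd({"title": "The population of bees"})
print(is_hvd(bees))
```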

Special cases

When a Dataset is within scope of HVD, it is not mandatory that all its Distributions are within scope of HVD. The recommendations in this document ensure that existing metadata (specified in DCAT-AP or other frameworks like INSPIRE) remains valid: bringing a Dataset in scope of HVD is an additional operation.

When a Data Service offers access to multiple datasets and fulfils the HVD requirements for one HVD Dataset (e.g. it is the HVD API for that dataset), then the HVD requirements apply only with respect to that HVD Dataset. It is common that the same API service endpoint (denoted by a dcat:DataService) provides access to multiple datasets. As such, it is to be expected that only some of the datasets are within scope of HVD. As for Distributions, the HVD IR does not require that all Datasets associated with a Data Service are in scope of HVD. Nevertheless, it must be noted that the HVD requirements on a Data Service might indirectly impact the other datasets that are available through the same Data Service, because a Data Service shares its operational and service level requirements across all its associated datasets.

HVD data category

The HVD IR defines six thematic data categories: geospatial, earth observation and environment, meteorological, statistics, companies and company ownership, and mobility. A new property HVD category is introduced to indicate the HVD category to which a resource, e.g. a dataset, belongs. The controlled vocabulary with all possible values is maintained by the Publications Office. A resource may belong to more than one data category.

Identifiers

In general, the requirements of the HVD IR are satisfied when the best practices of DCAT-AP on identifiers are followed. According to HVD IR the identifiers provided in the report should be an online reference to the metadata.

In short these are:

In practice, multiple identifiers may have been assigned to a Dataset. It is recommended to select a master identifier and use this one to implement the HVD IR. In general, harvesters and portals are advised to use and promote this master identifier as the identifier for the HVD Dataset. In addition, it is recommended to augment the list of other identifiers with the encountered identifiers. These identifier processing recommendations are made to ensure that the information in usages like the HVD reporting (i.e. a reference to a dataset) is consistent with the published metadata on data portals.
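
The identifier processing recommended above can be sketched as follows for a harvester. The example keeps the master identifier as the identifier of the HVD Dataset and collects every other encountered identifier as an additional one; the output keys and the function name are illustrative assumptions, and the URIs are fictitious.

```python
def consolidate_identifiers(master, encountered):
    """Keep the master identifier and list all other encountered identifiers.

    Duplicates and the master identifier itself are left out of the list
    of other identifiers; the order of first occurrence is preserved.
    """
    others = []
    for identifier in encountered:
        if identifier != master and identifier not in others:
            others.append(identifier)
    return {"identifier": master, "other identifiers": others}


master = "https://data.example.org/id/dataset/bees"
result = consolidate_identifiers(
    master,
    ["https://portal.example.org/ds/42", master, "https://portal.example.org/ds/42"],
)
print(result)
```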

Persistent identifiers

The HVD IR requires as part of the reporting requirements (article 5.3), that Licensing Conditions and APIs have persistent links.

Persistence means that, for these entities, Member States take the responsibility to maintain the real world resource indefinitely and additionally reduce the accessibility challenge by maintaining the same name for that real world resource indefinitely. Thus for the entities that MSs include in the reporting and for which the reporting requires a persistent link, a MS makes a persistent commitment.

As DCAT-AP is a Semantic Web data specification, persistence is associated with the use of persistent URIs (PURIs) for the metadata descriptions. A general advice for DCAT-AP implementers is to use PURIs for all entities, but above all for Datasets and Data Services. Practice shows, however, that this is not universally applied. To reduce this gap between intention and practice, DCAT-AP has proposed a number of guidelines on identifiers [[IdentifierGuidelines]]. Implementers of the HVD IR are advised to read these guidelines to understand how identity might or might not be preserved from one data portal to another, and to take the appropriate actions.

In article 5.3 of the HVD IR, the broad term Licensing Conditions is used, while in the other parts of the regulation the term Licence is used. DCAT-AP provides several means to express legal information, notably the properties licence (dct:license) and rights (dct:rights). This may lead to questions whether rights are included by the reporting requirement. As the final objective is to provide a trusted legal statement, it is considered that the requirement for a persistent link applies to rights too.

The reporting requirement for a persistent link for APIs is ambiguous from the perspective of DCAT-AP. In DCAT-AP, there is the identifier for the Data Service, i.e. the description about the API, and the property endpoint URL, which is the technical endpoint via which the data exchange will happen. The impact that the persistency requirement has on each is different and requires special attention from the HVD Dataset publisher. As the HVD IR does not specify precisely which case it covers, both are considered in scope. That means that a Data Service has a PURI and that its endpoint URL is persistent.

DCAT-AP does not impose persistent identification of an endpoint URL. It, however, expects a life-cycle management of the API through metadata. For that, DCAT-AP recommends to follow the DCAT guidelines on Resource life-cycle.

For example, consider an API which is at the end of its lifecycle. According to DCAT(-AP), the PURI of the Data Service could get the status ‘deprecated’ and the endpoint URL could be made void when it is taken offline. Any data portal user would understand that this Data Service should not be used anymore. If the metadata is augmented with the information about the successor of the Data Service, the data portal owner can be guided to the new Data Service.

The impact for a user of the endpoint URL is higher: systems might break when the endpoint URL is taken offline. This situation is the result of a shared responsibility: either the publishers did not apply decent life-cycle management, or the users did not inform the publisher about their critical dependency. Because of this, even if the API gives open access to open data, users that are dependent on the API are advised to inform the publisher about their existence. Likewise, publishers must improve their life-cycle management for these APIs so that re-users get the right information and can take the change of the endpoint of the API into account in their roadmaps.

The enforcement of a persistent link for the endpoint URL will reduce the occurrence of such cases, but it will not make them disappear. And this enforcement imposes additional care on the Data Services (APIs) by HVD publishers. When an API is moved to a new platform (e.g. from a local API gateway to an organisation-wide one), the original endpoint URL must be maintained, and also the metadata management must be maintained.

In summary, the recommendation is to have persistency for both aspects of the API: its metadata identifier as well as its endpoint URL.
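
One way to monitor this recommendation is to compare two successive harvests of the Data Service metadata and flag identifiers or endpoint URLs that have disappeared. The following is a minimal sketch under the assumption that each harvest is a dictionary keyed by the Data Service identifier; the structure, the function name and the URIs are all illustrative.

```python
def persistence_breaks(previous, current):
    """Report Data Service identifiers or endpoint URLs that did not persist.

    Both arguments map a Data Service identifier to a dict with an
    'endpoint URL' list, e.g. as produced by two successive harvests.
    """
    problems = []
    for identifier, old in previous.items():
        new = current.get(identifier)
        if new is None:
            problems.append(f"{identifier}: identifier no longer present")
            continue
        vanished = set(old.get("endpoint URL", [])) - set(new.get("endpoint URL", []))
        for url in sorted(vanished):
            problems.append(f"{identifier}: endpoint URL {url} disappeared")
    return problems


earlier = {"https://data.example.org/id/service/bees-api": {"endpoint URL": ["https://api.example.org/bees"]}}
later = {"https://data.example.org/id/service/bees-api": {"endpoint URL": ["https://gateway.example.org/bees"]}}
print(persistence_breaks(earlier, later))
```

In the fictitious data above the API was moved to a new gateway without keeping the original endpoint URL, which is the situation the recommendation aims to avoid.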

Legal information

The HVD IR requires a high level of metadata quality for legal information. The information should be provided in machine and human readable format, using a persistent link. Furthermore, it should be possible to investigate whether the legal conditions are equal or more permissive than the reference CC-BY 4.0.

Despite these strong requirements, the HVD IR does not alter the recommendations and practice of expressing legal information in DCAT-AP. The HVD requirements do, however, extend or make more precise how the legal information should technically be provided. In DCAT-AP, legal information corresponds to licences and rights expressions. In currently allowed practice, licence information may thus be supplied by a collection of rights statements in cases where national legislation does not allow a licence document to be provided. This is compatible with the HVD IR, and in that case the HVD requirements will also apply to the rights statements. The HVD IR also does not require any change to the current DCAT-AP principle of indicating the legal information at the most precise level in the metadata description, i.e. Data Service and Distribution; this principle is therefore maintained.

Catalogue owners are advised to assess the legal information provided by the publishers according to the flows in the figures below. For instance, if a publisher provides licence information referring to a licence document made accessible online by the publisher itself, then the publisher of that information must implement the HVD quality requirements for licence documents. The decision trees in the figures make it possible to assess whether or not additional effort is needed.

The decision tree for licence information.

To support the assessment whether the assigned legal conditions are equal or more permissive than the reference CC-BY 4.0, the recommendation is to augment the machine-readable publication of MS-specific or publisher-specific licences with mapping information on the Licence NAL. It is recommended to use, in order of preference, the SKOS mapping properties, owl:sameAs or rdfs:seeAlso, to express this mapping.
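
The order of preference for expressing the mapping can be illustrated with a small selection routine. In this sketch the licence description is a dictionary keyed by prefixed property names; the relative order among the individual SKOS mapping properties is an assumption of the example (the text above only ranks the SKOS mapping properties as a group before owl:sameAs and rdfs:seeAlso), and all URIs are fictitious.

```python
# SKOS mapping properties; their relative order here is an assumption.
SKOS_MAPPING_PROPERTIES = (
    "skos:exactMatch",
    "skos:closeMatch",
    "skos:broadMatch",
    "skos:narrowMatch",
    "skos:relatedMatch",
)
# Group order as recommended: SKOS mapping properties, owl:sameAs, rdfs:seeAlso.
MAPPING_PREFERENCE = SKOS_MAPPING_PROPERTIES + ("owl:sameAs", "rdfs:seeAlso")


def preferred_mapping(licence):
    """Return the most preferred mapping (property, targets) of a licence, or None."""
    for prop in MAPPING_PREFERENCE:
        targets = licence.get(prop, [])
        if targets:
            return prop, targets
    return None


ms_licence = {
    "rdfs:seeAlso": ["https://example.org/info/licences"],
    "skos:closeMatch": ["https://example.org/nal/licence/CC_BY_4_0"],
}
print(preferred_mapping(ms_licence))
```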

In the reporting requirements of the HVD IR, the notion terms of use is used. It has been agreed, by the Working Group for DCAT-AP HVD, that providing terms of use information is the same as providing legal information for a Data Service.

Contact Point

The HVD IR requests a contact point for APIs.

This requirement is implemented as the following recommendation. A contact point is mandatory for HVD Data Services and recommended for HVD Datasets either in the form of a (persistent) email address or a link to a contact form on a webpage, e.g. to contact a service desk.
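
The difference between mandatory (Data Service) and recommended (Dataset) contact points can be reflected in the severity of a validation finding. The sketch below assumes a contact point is represented as a dictionary with optional `email` and `contact page` keys; this representation, the severity labels and the function name are assumptions made for illustration.

```python
def contact_point_findings(entity_type, contact_points):
    """Check that at least one contact point offers an email or a contact page.

    A missing usable contact point is an error for a Data Service
    (mandatory) and a warning for a Dataset (recommended).
    """
    usable = [
        contact
        for contact in contact_points
        if contact.get("email") or contact.get("contact page")
    ]
    if usable:
        return []
    severity = "error" if entity_type == "Data Service" else "warning"
    return [f"{severity}: {entity_type} has no contact point with an email or contact page"]


print(contact_point_findings("Data Service", []))
print(contact_point_findings("Dataset", [{"email": "mailto:servicedesk@example.org"}]))
```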

Specific data requirements

The HVD implementing regulation describes, in its Annex, precisely the data elements that should be provided for a HVD Dataset. A HVD Dataset must conform to the rules defined in the HVD IR.

It is recommended to provide a reference to a public document (for instance: data standards) that describes the internals of the Dataset (or Distribution) using the property conforms to. This ensures that the information is made publicly accessible for reusers. It can be used by experts to verify if the Dataset matches the HVD requirements.

An alternative approach is the use of a self-declaration of conformance. In this case the publisher of the HVD Dataset itself declares that the Dataset conforms to all technical data details the HVD IR imposes. The INSPIRE community has used this approach before. As assessing the validity of such a self-declaration is a domain-specific activity requiring expert knowledge, DCAT-AP HVD does not propose a general self-declaration statement, but leaves the design of such a statement to the data experts of the HVD categories, so that they can provide a trustworthy approach.

Reporting

The HVD IR requires EU Member States (reporter) to report the list of HVD Datasets. This objective can be achieved by providing the European Commission (reporting authority) with a catalogue containing the metadata about all the Datasets, Distributions and Data Services that are in scope of the HVD IR. When a MS has collected all metadata in a national data catalogue, the report can be created by querying the national catalogue for all entities that have http://data.europa.eu/eli/reg_impl/2023/138/oj as applicable legislation. If the MS has assigned persistent identifiers to the metadata entities, as explained in [[[#c5]]], the reporting authority can even collect the metadata itself by querying the MS national catalogue or the Portal for European Data [[DEU]]. This potential shows that this documentation, and the consensus reached in the Working Group for this specification, help reduce the aggregation effort at MS level while reinforcing existing metadata practices. However, the approach (format and process) used for the reporting is beyond the scope of this document.
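The selection by applicable legislation can be sketched as follows. This is a minimal, plain-Python illustration over (subject, predicate, object) tuples, not a prescribed reporting process; the entity URIs under https://orgea.exampleMS.gov/id/ are hypothetical.

```python
APPLICABLE_LEGISLATION = "http://data.europa.eu/r5r/applicableLegislation"
HVD_IR = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def entities_in_hvd_scope(triples):
    """Return, sorted, every subject that names the HVD IR as applicable legislation."""
    return sorted({s for s, p, o in triples if p == APPLICABLE_LEGISLATION and o == HVD_IR})


# Hypothetical extract of a national catalogue.
BASE = "https://orgea.exampleMS.gov/id/"
national_catalogue = [
    (BASE + "dataset/beepopulation", APPLICABLE_LEGISLATION, HVD_IR),
    (BASE + "distribution/beepopulation-shp", APPLICABLE_LEGISLATION, HVD_IR),
    (BASE + "dataset/wasppopulation", "http://purl.org/dc/terms/title", "The population of wasps"),
]

report = entities_in_hvd_scope(national_catalogue)
print(report)
```

Against a SPARQL endpoint, the same selection would be a single triple pattern on the applicable legislation property.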

Example

In this section we illustrate the recommendations for DCAT-AP for the HVD implementing regulation. The examples in this section are fictitious; their sole purpose is to illustrate the metadata.

Datasets in scope of HVD

Consider that a dataset "The population of bees" is within scope of the HVD while another dataset "The population of wasps" is not. Both datasets, however, are in scope of the INSPIRE directive.

Example 1 - Bees and wasps population datasets

Both datasets are published by the Environment Agency of the EU Member State ExampleMS using a persistent identifier.

Example 2 - MS dataset

The datasets are published on the Member State's national data portal https://dataportal.exampleMS.gov. This portal assigns its own identifier to each dataset. Because that new identifier is not the master identifier, the portal avoids presenting it as such: in its published DCAT-AP catalogue, it lists the portal identifier as an additional identifier.

Example 3 - MS dataset with 2 identifiers
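The relation between the two identifiers can be sketched in plain Python. The sketch assumes that the additional identifier is expressed with adms:identifier, the property DCAT-AP uses for "other identifier", and it simplifies the value to a plain URI where DCAT-AP uses an identifier resource. Both URIs are hypothetical.

```python
ADMS_IDENTIFIER = "http://www.w3.org/ns/adms#identifier"


def portal_description(master_uri, portal_identifier):
    """Triples a portal could publish: master URI as subject, its own id as additional."""
    return [(master_uri, ADMS_IDENTIFIER, portal_identifier)]


def master_identifier(triples, identifier):
    """Resolve a master or additional identifier to the master identifier, else None."""
    for s, p, o in triples:
        if s == identifier or (p == ADMS_IDENTIFIER and o == identifier):
            return s
    return None


# Hypothetical identifiers for the "bee population" dataset.
BEES = "https://orgea.exampleMS.gov/id/dataset/beepopulation"
PORTAL_ID = "https://dataportal.exampleMS.gov/dataset/4711"

catalogue = portal_description(BEES, PORTAL_ID)
print(master_identifier(catalogue, PORTAL_ID))
```

Whichever identifier a user starts from, the lookup ends at the publisher's persistent identifier, so the portal never introduces a competing master identifier.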

Bulk downloads for HVD Datasets

The datasets are downloadable in various formats and levels of detail. In our example, the data is available in two formats: RDF and ESRI shapefile. According to the HVD IR, the datasets must at minimum be available as a bulk download with a granularity of 50 square kilometres and a bi-yearly update frequency. They must also be available in an open format for geospatial data. Based on these requirements, the publisher of the dataset decides to indicate that the shapefile-based distribution is a HVD bulk download.

Example 4 - MS dataset with 2 distributions

The HVD IR also specifies that the dataset should at least provide information about the number of bees, the calculation method, the amount of honey being harvested and the number of beekeepers active in the area. The publisher describes the data semantically using an application profile, and provides detailed data schema documentation for each distribution.

Example 5 - MS dataset conform to a profile

The HVD Dataset is accessible via an API

According to the HVD IR, the "bee population" dataset must be made available via an API. The dataset publisher has deployed an API platform through which data users have access to real-time data. This API platform supports all datasets of the publisher.

Example 6 - MS dataset with data service

Because the API platform is provided as the API for the "bee population" dataset, the HVD implementing regulation requirements apply. This means that the endpoint URL must be persistent. The publisher should perform maximal effort to keep the endpoint URL stable. For instance, deploying a new API platform or changing organisation names should not impact the endpoint URL.

To provide information about the use of the API platform, the publisher provides OpenAPI technical documentation and an SLA to document the quality of service.

Example 7 - MS data service with OpenAPI and SLA

To address any questions from users, the publisher operates a service desk.

Example 8 - MS data service with publisher service desk

Expressing legal conditions

The Member State (MS) requires, via its legislation, that public bodies use national data licences. Therefore, the dataset publisher is required to use one of them. As support to the community, the MS has published the data licences as a SKOS taxonomy, using persistent URIs. For the "wasp population" dataset, a restrictive licence is chosen because the data is based on information that is subject to commercial rights, including fees. The "bee population" dataset is shared under a very permissive licence.

Example 9 - MS dataset distributions with different licences

In order to support the assessment of the used licences, the MS maps the licences to the NAL Licences [[NAL-Licence]].

Example 10 - Mapping licences

The HVD IR requires that the licence for the "bee population" dataset be at least as permissive as CC-BY 4.0. The licence of the "bee population" dataset is https://data.exampleMS.gov/resource/FreeAndOpen, which is an exact match with http://publications.europa.eu/resource/authority/licence/CC0. Since this CC0 licence is more permissive than CC-BY 4.0, the HVD requirement is met.
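This reasoning can be sketched as a lookup. The sketch is plain Python and makes two assumptions: the set of NAL licences regarded as at least as permissive as CC-BY 4.0 is maintained by hand (only CC0 is confirmed by the text above), and the NAL URI ending in CC_BY_4_0 is assumed to identify CC-BY 4.0 itself.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"
NAL = "http://publications.europa.eu/resource/authority/licence/"

# Assumption: hand-maintained result of a legal assessment, not derived automatically.
AT_LEAST_AS_PERMISSIVE_AS_CC_BY_4_0 = {NAL + "CC0", NAL + "CC_BY_4_0"}


def meets_hvd_licence_requirement(licence, mappings):
    """True if `licence`, or a NAL licence it exactly matches, is in the accepted set."""
    candidates = {licence} | {
        o for s, p, o in mappings if s == licence and p == SKOS_EXACT_MATCH
    }
    return bool(candidates & AT_LEAST_AS_PERMISSIVE_AS_CC_BY_4_0)


FREE_AND_OPEN = "https://data.exampleMS.gov/resource/FreeAndOpen"
NO_COMMERCIAL = "https://data.exampleMS.gov/resource/NoCommercialUseWithFees"
mappings = [(FREE_AND_OPEN, SKOS_EXACT_MATCH, NAL + "CC0")]

print(meets_hvd_licence_requirement(FREE_AND_OPEN, mappings))
print(meets_hvd_licence_requirement(NO_COMMERCIAL, mappings))
```

Only exact matches are followed here; a looser mapping such as skos:closeMatch would still call for a human judgement.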

Because in producing the RDF representation additional provenance information is included that is sensitive, the publisher changes the licence for that distribution to a more restrictive one.

Example 11 - Restricting licences

Although this restricted licence https://data.exampleMS.gov/resource/NoCommercialUseWithFees does not meet the HVD requirements for the "bee population" dataset, the "bee population" dataset is still conformant to the HVD implementing regulation as the RDF distribution was not within scope of the HVD. The same reasoning holds for the "wasp population" dataset. This illustrates the flexibility the DCAT-AP HVD specification offers to address complex and rare scenarios data publishers might face.
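The scoping argument can be sketched in plain Python: only distributions that declare the HVD IR as applicable legislation take part in the licence assessment. The dictionary layout, the "name" key and the distribution names are assumptions of this sketch, and the set of acceptable licences is taken as given.

```python
APPLICABLE_LEGISLATION = "http://data.europa.eu/r5r/applicableLegislation"
LICENCE = "http://purl.org/dc/terms/license"
HVD_IR = "http://data.europa.eu/eli/reg_impl/2023/138/oj"


def licence_findings(distributions, acceptable_licences):
    """Return the names of HVD-scoped distributions whose licence is not acceptable."""
    return [
        d["name"]
        for d in distributions
        # Distributions that do not declare the HVD IR are outside the assessment.
        if HVD_IR in d.get(APPLICABLE_LEGISLATION, [])
        and d.get(LICENCE) not in acceptable_licences
    ]


FREE_AND_OPEN = "https://data.exampleMS.gov/resource/FreeAndOpen"
NO_COMMERCIAL = "https://data.exampleMS.gov/resource/NoCommercialUseWithFees"

bee_distributions = [
    {"name": "bees-shapefile", APPLICABLE_LEGISLATION: [HVD_IR], LICENCE: FREE_AND_OPEN},
    {"name": "bees-rdf", LICENCE: NO_COMMERCIAL},  # not declared as in scope of the HVD IR
]

print(licence_findings(bee_distributions, {FREE_AND_OPEN}))
```

The empty result reflects that the restrictive licence sits on a distribution outside the scope; declaring the RDF distribution as in scope would turn it into a finding.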

The Data Service exampleMS:EAMS-APIplatform provides access to both datasets. The legal conditions on the usage of the platform for the "bee population" dataset are a combination of the API platform conditions (e.g. no misuse by triggering DDoS activities, no sharing of access tokens with third parties, etc.) and the dataset conditions. The API request https://orgea.exampleMS.gov/api/v2/beepopulation/ is thus subject to different conditions than https://orgea.exampleMS.gov/api/v2/wasppopulation/. Therefore, the licence document associated with a Data Service is usually oriented more towards the use of the API platform than towards the use of the data it provides access to.

In the example, the 'Terms of Use' for the API platform are mentioned as the licence. In addition, the API platform can also indicate the SLA it offers.

Example 12 - Data service with terms and SLA

Reporting

The MS reports its HVD conformance status by providing a catalogue containing all metadata in scope of HVD. To facilitate the conformance assessment, it will only include the Datasets, Data Services and Distributions that are in scope of HVD. The catalogue will also contain any additional supportive information such as ContactPoints, Agents and the mapping for the licences to the EU Licences.

Example 13 - MS Catalogue

To reduce the risk of misinterpretation, the properties that connect Catalogued Resources, such as dcat:servesDataset and dcat:distribution, should be checked so that they do not refer to Catalogued Resources outside the scope of HVD. In the example below, the references to the RDF distributions of the "bee population" and the "wasp population" datasets are removed from the reporting catalogue.

Example 14 - MS Catalogue in HVD scope
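The pruning step itself can be sketched in plain Python over (subject, predicate, object) tuples. The choice of connecting properties, the rule that supportive resources such as contact points are kept, and the ex: identifiers are assumptions of this sketch.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
APPLICABLE_LEGISLATION = "http://data.europa.eu/r5r/applicableLegislation"
HVD_IR = "http://data.europa.eu/eli/reg_impl/2023/138/oj"

CATALOGUED_TYPES = {DCAT + "Dataset", DCAT + "Distribution", DCAT + "DataService"}
CONNECTING = {DCAT + "distribution", DCAT + "servesDataset", DCAT + "accessService"}


def prune_to_hvd_scope(triples):
    """Drop out-of-scope catalogued resources and the links that point outside the scope."""
    in_scope = {s for s, p, o in triples if p == APPLICABLE_LEGISLATION and o == HVD_IR}
    catalogued = {s for s, p, o in triples if p == RDF_TYPE and o in CATALOGUED_TYPES}
    out_of_scope = catalogued - in_scope
    return [
        (s, p, o)
        for s, p, o in triples
        if s not in out_of_scope  # supportive resources (contact points, ...) are kept
        and not (p in CONNECTING and o not in in_scope)
    ]


# Hypothetical catalogue extract.
BEES, SHP, RDF_DIST, DESK = "ex:bees", "ex:bees-shp", "ex:bees-rdf", "ex:servicedesk"
catalogue = [
    (BEES, RDF_TYPE, DCAT + "Dataset"),
    (BEES, APPLICABLE_LEGISLATION, HVD_IR),
    (BEES, DCAT + "distribution", SHP),
    (BEES, DCAT + "distribution", RDF_DIST),
    (BEES, DCAT + "contactPoint", DESK),
    (SHP, RDF_TYPE, DCAT + "Distribution"),
    (SHP, APPLICABLE_LEGISLATION, HVD_IR),
    (RDF_DIST, RDF_TYPE, DCAT + "Distribution"),
    (DESK, "http://www.w3.org/2006/vcard/ns#hasEmail", "mailto:servicedesk@orgea.exampleMS.gov"),
]

reporting_catalogue = prune_to_hvd_scope(catalogue)
for triple in reporting_catalogue:
    print(triple)
```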

Based on this catalogue the MS can be audited for its conformance. During the assessment it might occur that the supplied information is not sufficient, and that the assessment must follow the references outside the supplied catalogue. E.g., when assessing the permissiveness of the licences the details of the referenced EU Licence must be consulted. Crossing these boundaries is a regular occurrence and it can be done during the assessment without impacting the results when the supplied data is based on persistent identifiers (PURIs).

The use of dereferenceable persistent identifiers could also lead to another agreement: supplying a more condensed representation of the reporting catalogue. Provided that all catalogued resources in scope of HVD are in the Portal for European Data [[DEU]], the MS could simply supply the reduced catalogue as:

Example 15 - MS Catalogue reduced

This illustrates that when the dataset publishers provide the necessary information and this is well integrated in the network of sharing metadata through the MS to the Portal for European Data, the data exchange for the reporting can be reduced to a minimum.

Validation

To support the assessment of whether a Catalogue satisfies DCAT-AP HVD, the following SHACL templates are provided. At this moment, the SHACL templates assume that all resources are subject to DCAT-AP HVD. It is future work to provide a conditional approach which would support catalogues mixing HVD datasets and non-HVD datasets.
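To give an impression of what such a check covers, the following plain-Python sketch tests only the presence of mandatory properties, using the property lists from the quick reference in this document. It is not SHACL and does not reproduce the provided templates; like them, it assumes that every resource passed in is subject to DCAT-AP HVD. The function name and the input shape (a set of property IRIs used on a resource) are assumptions of the sketch.

```python
DCAT = "http://www.w3.org/ns/dcat#"
R5R = "http://data.europa.eu/r5r/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Mandatory properties per class, as listed in the quick reference.
MANDATORY = {
    DCAT + "CatalogRecord": [FOAF + "primaryTopic"],
    DCAT + "Dataset": [R5R + "applicableLegislation", R5R + "hvdCategory"],
    DCAT + "Distribution": [DCAT + "accessURL", R5R + "applicableLegislation"],
    DCAT + "DataService": [
        R5R + "applicableLegislation",
        DCAT + "contactPoint",
        FOAF + "page",  # documentation
        DCAT + "endpointURL",
        R5R + "hvdCategory",
    ],
}


def missing_mandatory(resource_type, used_properties):
    """Return the mandatory property IRIs of `resource_type` absent from `used_properties`."""
    return [p for p in MANDATORY.get(resource_type, []) if p not in used_properties]


# Hypothetical Data Service description that lacks documentation.
api_properties = {
    R5R + "applicableLegislation",
    R5R + "hvdCategory",
    DCAT + "contactPoint",
    DCAT + "endpointURL",
}

print(missing_mandatory(DCAT + "DataService", api_properties))
```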

Quick Reference of Classes and Properties

This section provides a condensed tabular overview of the mentioned classes and properties in this specification. The properties are grouped under headings mandatory, recommended, optional and deprecated. These terms have the following meaning.

| Class | Class IRI | Property Type | Property | Property IRI |
|---|---|---|---|---|
| Catalogue | http://www.w3.org/ns/dcat#Catalog | Recommended | dataset | http://www.w3.org/ns/dcat#dataset |
| Catalogue | http://www.w3.org/ns/dcat#Catalog | Recommended | service | http://www.w3.org/ns/dcat#service |
| Catalogue | http://www.w3.org/ns/dcat#Catalog | Optional | record | http://www.w3.org/ns/dcat#record |
| Catalogue Record | http://www.w3.org/ns/dcat#CatalogRecord | Mandatory | primary topic | http://xmlns.com/foaf/0.1/primaryTopic |
| Catalogued Resource | http://www.w3.org/ns/dcat#Resource | | | |
| Concept | http://www.w3.org/2004/02/skos/core#Concept | | | |
| Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation |
| Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | contact point | http://www.w3.org/ns/dcat#contactPoint |
| Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | documentation | http://xmlns.com/foaf/0.1/page |
| Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | endpoint URL | http://www.w3.org/ns/dcat#endpointURL |
| Data Service | http://www.w3.org/ns/dcat#DataService | Mandatory | HVD category | http://data.europa.eu/r5r/hvdCategory |
| Data Service | http://www.w3.org/ns/dcat#DataService | Recommended | endpoint description | http://www.w3.org/ns/dcat#endpointDescription |
| Data Service | http://www.w3.org/ns/dcat#DataService | Recommended | serves dataset | http://www.w3.org/ns/dcat#servesDataset |
| Data Service | http://www.w3.org/ns/dcat#DataService | Optional | licence | http://purl.org/dc/terms/license |
| Data Service | http://www.w3.org/ns/dcat#DataService | Optional | rights | http://purl.org/dc/terms/rights |
| Dataset | http://www.w3.org/ns/dcat#Dataset | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation |
| Dataset | http://www.w3.org/ns/dcat#Dataset | Mandatory | HVD category | http://data.europa.eu/r5r/hvdCategory |
| Dataset | http://www.w3.org/ns/dcat#Dataset | Recommended | contact point | http://www.w3.org/ns/dcat#contactPoint |
| Dataset | http://www.w3.org/ns/dcat#Dataset | Recommended | dataset distribution | http://www.w3.org/ns/dcat#distribution |
| Dataset | http://www.w3.org/ns/dcat#Dataset | Optional | conforms to | http://purl.org/dc/terms/conformsTo |
| Distribution | http://www.w3.org/ns/dcat#Distribution | Mandatory | access URL | http://www.w3.org/ns/dcat#accessURL |
| Distribution | http://www.w3.org/ns/dcat#Distribution | Mandatory | applicable legislation | http://data.europa.eu/r5r/applicableLegislation |
| Distribution | http://www.w3.org/ns/dcat#Distribution | Recommended | licence | http://purl.org/dc/terms/license |
| Distribution | http://www.w3.org/ns/dcat#Distribution | Optional | access service | http://www.w3.org/ns/dcat#accessService |
| Distribution | http://www.w3.org/ns/dcat#Distribution | Optional | linked schemas | http://purl.org/dc/terms/conformsTo |
| Distribution | http://www.w3.org/ns/dcat#Distribution | Optional | rights | http://purl.org/dc/terms/rights |
| Document | http://xmlns.com/foaf/0.1/Document | | | |
| Kind | http://www.w3.org/2006/vcard/ns#Kind | Recommended | contact page | http://www.w3.org/2006/vcard/ns#hasURL |
| Kind | http://www.w3.org/2006/vcard/ns#Kind | Recommended | email | http://www.w3.org/2006/vcard/ns#hasEmail |
| Legal Resource | http://data.europa.eu/eli/ontology#LegalResource | | | |
| Licence Document | http://purl.org/dc/terms/LicenseDocument | | | |
| Literal | http://www.w3.org/2000/01/rdf-schema#Literal | | | |
| Resource | http://www.w3.org/2000/01/rdf-schema#Resource | | | |
| Rights statement | http://purl.org/dc/terms/RightsStatement | | | |
| Standard | http://purl.org/dc/terms/Standard | | | |

Acknowledgments

The editors gratefully acknowledge the contributions made to this document by all members of the working group. This work was elaborated by a Working Group under SEMIC by Interoperable Europe. Interoperable Europe of the European Commission was represented by Pavlina Fragkou and Seth Van Hooland. Natasa Sofou, Makx Dekkers and Bert Van Nuffelen were the editors of the specification. Past and current contributors are: Alberto Abella, Anssi Ahlberg, Adam Arndt, Judie Attard, Julius Belickas, Nick Berkvens, Konstantis Bogucarskis, Peter Bruhn Andersen, Ewa Bukala, Martin Böhm, Nikolai Bülow Tronche, Ana Cano, Eileen Carroll, Egle Cepaitiene, Luisa Cidoncha, Marco Combetto, John Cunningham, Jitse De Cock, Ine de Visser, Kelly Deirdre, Makx Dekkers, Radko Domanska, Iwona Domaszewska, Ulrika Domellöf Mattsson, Alessio Dragoni, Nicolai Draslov, Frederik Emanualsson, Jordi Escriu, Jose-Luis Fernandez-Villacanas, Nuno Freire, Leyre Garralda, Alma Gonzalez, Capser Gras, Bart Hanssens, Kieran Harper, Jasper Heide, Mika Honkanen, Peter Isrealsson, Fabian Kirstein, Michal Kitta, Jakub Klimek, Rae Knowler, Fredrik Knutsson, Peter Kochman, Sirkku Kokkola, Michal Kuban, Kaia Kulla, Maria Lenartowicz, Anja Litka, Anja Loddenkemper, Hagar Lowenthal, Melanie Mageean, Agata Majchrowska, Hugh Mangan, Estelle Maudet, Balint Miklos, Esther Minguela, Joachim Nielandt, Geraldine Nolf, Erik Obsteiner, Javier Orozco, Csapo Orsolya, Matthias Palmer, Alberto Palomo, Francesco Paolicelli, Eirini Pappi, Mihai Paunescu, Sylwia Pichlak Pawlak, Jiri Pilar, Ludger Rinsche, Daniele Rizzi, Reet Roosalu, Ana Rosa, Maik Roth, Antonio Rotundo, Michal Ruzicka, Jill Saligoe-Simmel, Fabian Santi, Giovanna Scaglione, Giampaolo Selitto, Martin Semberger, Paulo Seromenho, Jan Skornsek, Michele Spichtig, Emidio Stani, Kjersti Steien, Simon Steuer, Terje Sylvarnes, Martin Traunmuller, Kees Trautwein, Stavros Tsouderos, Thomas Tursics, Bert Van Nuffelen, Uwe Voges, Gabriella Wiersma, Jesper Zedlitz, Mantas Zimnickas.