Implementation report: publishing a dcat-ap-feed for the EU Agency for Railways

Living Standard,

This version:
https://semiceu.github.io/LDES-implementation-reports/era-dcat-ap-feed/index.html
Issue Tracking:
GitHub
Editors:
- Julian Rojas Melendez
- Ghislain Atemezing

Abstract

This implementation report describes how the EU Agency for Railways (ERA) publishes data resource updates following the DCAT-AP Feeds specification, which relies on LDES and the Activity Streams 2.0 vocabulary. The feed is generated and published solely by a GitLab CI/CD pipeline, which stores the feed's static files in a public GitLab repository.

1. Publishing changes about the EU Agency for Railways data resources as a DCAT-AP-Feed

This section introduces how to represent the metadata feed for the EU Agency for Railways (ERA) as a DCAT-AP-Feed [DCAT-AP-FEEDS], which is a specific type of Linked Data Event Stream (LDES) [LDES] using Activity Streams entities [ACTIVITYSTREAMS-VOCABULARY].

The ERA DCAT-AP feed represents changes to the DCAT-AP metadata of the EU Agency for Railways, enabling consuming systems to keep their replica of the source datasets consistent and up to date efficiently. The feed is produced by a proof-of-concept RDF-Connect pipeline that generates a DCAT-AP-Feed [DCAT-AP-FEEDS] from ERA’s DCAT-AP metadata, which is originally available as a queryable named graph at http://data.europa.eu/949/graph/uat/dcat.

The pipeline is executed periodically (every 18 hours) to:

  1. Fetch the DCAT-AP data from ERA’s Virtuoso endpoint.

  2. Detect any changes in the described assets by comparing with the previous state.

  3. Write detected changes to the LDES feed as a collection of static documents.
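
The change-detection step above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes each entity is reduced to an order-independent hash of its serialized triples, and the names fingerprint and detect_changes, as well as the in-memory dictionaries standing in for the stored state, are hypothetical.

```python
import hashlib


def fingerprint(triples):
    """Return an order-independent hash of an entity's serialized triples."""
    digest = hashlib.sha256()
    for line in sorted(triples):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def detect_changes(previous, current):
    """Compare two {entity IRI: fingerprint} maps.

    Returns a list of (activity type, entity IRI) pairs, where the activity
    type is "Create", "Update" or "Delete".
    """
    changes = []
    for iri, value in current.items():
        if iri not in previous:
            changes.append(("Create", iri))
        elif previous[iri] != value:
            changes.append(("Update", iri))
    for iri in previous:
        if iri not in current:
            changes.append(("Delete", iri))
    return changes
```

Comparing hashes rather than full descriptions keeps the stored state small, at the cost of not knowing which triples changed; that is sufficient here because each activity carries the full description of the entity.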

1.1. Used prefixes

The following prefixes are used throughout this document:

as: https://www.w3.org/ns/activitystreams#
dct: http://purl.org/dc/terms/
dcat: http://www.w3.org/ns/dcat#
ldes: https://w3id.org/ldes#
tree: https://w3id.org/tree#
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
xsd: http://www.w3.org/2001/XMLSchema#

1.2. General description of the ERA DCAT-AP feed

The ERA DCAT-AP feed is an append-only event stream, represented as an RDF resource of type ldes:EventStream and identified by an IRI. The ldes:EventStream sets ldes:timestampPath to as:published, which designates as:published as the property of each as:Activity that carries the activity's publication timestamp. The entity that the activity is about is referenced separately through as:object.

The feed relies on RDF Named Graphs. The static documents are served using GitLab Pages and are published using application/trig as the MIME type.

The ERA ldes:EventStream definition
<http://data.europa.eu/dy6/def/as/ERAStream> a ldes:EventStream, dcat:Dataset;
    ldes:timestampPath as:published;
    dcat:publisher <http://publications.europa.eu/resource/authority/corporate-body/ERA>;
    dcat:title "DCAT-AP Feed of ERA"@en, "Flujo DCAT-AP de ERA"@es, "Flux DCAT-AP de l'ERA"@fr;
    tree:view <https://era-europa-eu.gitlab.io/public/interoperable-data-programme/era-ontology/era-dcat-ap-feed/http_3A_2F_2Fdata.europa.eu_2Fdy6_2Fdef_2Fas_2FERAStream/index.trig>.

1.3. Activities

The pipeline generates Activity Streams activities (as:Create, as:Update, as:Delete) for changed resources. Each activity references the IRI of the affected DCAT-AP entity through as:object and carries an as:published timestamp typed as xsd:dateTime. The activity is identified by an IRI, and the payload describing the DCAT-AP entity is provided in a named graph whose name matches the activity IRI.

An activity in the ERA DCAT-AP feed in TriG
<#eb0078d9f7818be07f657123a947589c> a <https://www.w3.org/ns/activitystreams#Create>;
    <https://www.w3.org/ns/activitystreams#object> <http://data.europa.eu/949/id/distribution/ERAdump_v303>;
    <https://www.w3.org/ns/activitystreams#published> "2025-11-19T05:07:28.244Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>.

<#eb0078d9f7818be07f657123a947589c> {
    <http://data.europa.eu/949/id/distribution/ERAdump_v303> a <http://www.w3.org/ns/dcat#Distribution>;
        <http://purl.org/dc/terms/issued> "2025-07-24"^^<http://www.w3.org/2001/XMLSchema#date>;
        <http://purl.org/dc/terms/title> "ERA KG dump v3.0.3"@en;
        <http://purl.org/dc/terms/language> <http://publications.europa.eu/resource/authority/language/ENG>;
        <http://www.w3.org/ns/dcat#accessURL> <https://doi.org/10.5281/zenodo.15691818>;
        <http://www.w3.org/ns/dcat#downloadURL> <https://zenodo.org/records/16413403/files/2025-07-24-rinf-xml-combined.nq>;
        <http://www.w3.org/ns/dcat#mediaType> <http://www.iana.org/assignments/media-types/application/n-quads>;
        <http://purl.org/dc/terms/license> <http://data.europa.eu/eli/dec_impl/2017/863/oj> .
}
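
As an illustration of the structure shown above, the following sketch assembles an activity and its named graph as a TriG string. It is not taken from the pipeline: the function name activity_trig and its parameters are hypothetical, the payload is assumed to arrive as already serialized lines, and the timestamp is assumed to be timezone-aware.

```python
from datetime import datetime, timezone

AS = "https://www.w3.org/ns/activitystreams#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def activity_trig(activity_id, activity_type, object_iri, published, payload_lines):
    """Serialize one activity plus the named graph holding its payload."""
    if activity_type not in ("Create", "Update", "Delete"):
        raise ValueError(f"unsupported activity type: {activity_type}")
    # Normalize to UTC with millisecond precision and a trailing "Z".
    stamp = (
        published.astimezone(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )
    header = (
        f"<#{activity_id}> a <{AS}{activity_type}>;\n"
        f"    <{AS}object> <{object_iri}>;\n"
        f'    <{AS}published> "{stamp}"^^<{XSD_DATETIME}>.\n'
    )
    # The graph name repeats the activity IRI, as in the example above.
    body = "\n".join("    " + line for line in payload_lines)
    return f"{header}\n<#{activity_id}> {{\n{body}\n}}\n"
```

Calling it with the identifier, the type "Create", the distribution IRI and the timestamp from the example above yields the same three-triple activity header, followed by a graph block named after the activity.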

1.4. Bucketization and state tracking

Changes are tracked by comparing the fetched SPARQL query results with the previous state stored in a LevelDB instance. The activities are then bucketized using time-based fragmentation (monthly buckets). This allows the static files to scale predictably over time.
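
The monthly fragmentation can be pictured with the sketch below. It only illustrates the idea of grouping activities by the month of their as:published value: the "YYYY-MM" bucket key and the function names are assumptions, not the naming used by the Bucketize processor, and timestamps are assumed to carry a UTC offset or a trailing "Z".

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_bucket(published):
    """Map an ISO 8601 timestamp to a 'YYYY-MM' bucket key in UTC."""
    stamp = datetime.fromisoformat(published.replace("Z", "+00:00"))
    stamp = stamp.astimezone(timezone.utc)
    return f"{stamp.year:04d}-{stamp.month:02d}"


def bucketize(activities):
    """Group (activity id, published timestamp) pairs by month."""
    buckets = defaultdict(list)
    for activity_id, published in activities:
        buckets[month_bucket(published)].append(activity_id)
    return dict(buckets)
```

With this scheme a month's bucket stops receiving members once the month has passed, so only the most recent static document keeps changing between pipeline runs.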

1.5. Versioning

The feed uses the LDES versioning mechanism by setting ldes:timestampPath to as:published. It also adheres to the Activity Streams requirements [ACTIVITYSTREAMS-VOCABULARY].

1.6. Traversing the search tree

The feed MUST follow the chronological search tree from the Server Primer [LDES-SERVER-PRIMER]. Consequently, LDES clients SHOULD traverse the feed using the chronological ascending order mode specified in the LDES specification [LDES].
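
From a consumer's point of view, chronological ascending processing can be sketched as below. This is a simplified illustration rather than the behaviour of any particular LDES client: members are assumed to be (activity IRI, as:published value) pairs that have already been extracted from the fetched pages, and the function names are hypothetical.

```python
from datetime import datetime


def parse_published(value):
    """Parse an xsd:dateTime string such as '2025-11-19T05:07:28.244Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def ascending(members, seen=frozenset()):
    """Return unseen members ordered from oldest to newest.

    Ties on the timestamp are broken by the activity IRI so that the
    order is deterministic across runs.
    """
    fresh = [member for member in members if member[0] not in seen]
    return sorted(fresh, key=lambda member: (parse_published(member[1]), member[0]))
```

Tracking the identifiers of processed members, rather than only the latest timestamp, avoids skipping activities that share a timestamp with the last one processed.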

2. Implementation report

2.1. Architecture

The pipeline is implemented using RDF-Connect processors. It runs as a GitLab CI/CD pipeline, taking the data from the source endpoint to the static pages output.

[ERA Virtuoso SPARQL Endpoint]
         ↓
   [HttpFetch Processor]
         ↓
  [DumpsToFeed Processor] → [LevelDB State]
         ↓
    [Sdsify Processor]
         ↓
   [Bucketize Processor] → [Feed State JSON]
         ↓
 [LdesDiskWriter Processor]
         ↓
     [docs/ directory] → [GitLab Pages]

2.2. Setup

2.2.1. Prerequisites

2.2.2. Local development

cd pipeline
npm install
npx @rdfc/js-runner rdfc-pipeline.ttl

2.3. Publisher implementations

2.4. Additional Resources

2.5. Support

For issues specific to:

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[ACTIVITYSTREAMS-VOCABULARY]
James Snell; Evan Prodromou. Activity Vocabulary. URL: https://w3c.github.io/activitystreams/vocabulary/
[DCAT-AP-FEEDS]
SEMIC. DCAT-AP Feeds Specification. URL: https://semiceu.github.io/LDES-DCAT-AP-feeds/
[LDES]
Pieter Colpaert. Linked Data Event Streams. 2025-10-07. URL: https://semiceu.github.io/LinkedDataEventStreams/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Non-Normative References

[LDES-SERVER-PRIMER]
Pieter Colpaert. LDES Server Primer. 2025-10-07. URL: https://semiceu.github.io/LinkedDataEventStreams/server-primer