Implementation report: building cultural heritage event streams with LDES

Living Standard,

This version:
https://semiceu.github.io/LDES-implementation-reports/cultural-heritage-LDES/index.html
Issue Tracking:
GitHub
Editors:
- Nuno Freire
- Arne Stabenau
- Antoine Isaac
- Enno Meijers
- Pano Maria
- Bob Coret
- Pieter Colpaert

Abstract

Repeatedly publishing a full data dump delegates change detection, a fault-prone process, to data consumers. With cultural heritage event streams, we propose that cultural heritage data providers publish an event source API. This API helps harvesters replicate the provider's source datasets about cultural heritage entities and keep their copies in sync in the way the provider intends. This specification describes how to publish entity changes using the Activity Streams vocabulary and LDES, and how harvesters may process them.

1. Publishing changes about cultural heritage entities

This section introduces how to represent a cultural heritage event stream (hereafter referred to as CH stream) as a Linked Data Event Stream (LDES) [LDES] whose members are Activity Streams activities [activitystreams-vocabulary].

CH streams represent changes to the descriptions of cultural heritage entities in a data publication system. This enables consuming systems to maintain their replica of the source dataset consistently and efficiently. Following the same pattern as the IIIF Change Discovery API, changes in a cultural heritage dataset are expected to be represented as Activity Streams activities (create, update and delete) performed on entities maintained in the catalogues of cultural heritage institutions, or in the vocabulary management systems they use.

Note: The primary use case addressed by CH streams is the sharing of datasets that were originally designed as sets of records about cultural heritage entities (for example, artworks, or terms from a controlled vocabulary used in structured descriptions). These datasets are typically not originally in RDF, but cultural heritage institutions publish them as RDF to promote interoperability and data reuse. When expressed in RDF, the resources in these datasets can usually still be grouped by the entity-specific records that underlie them. These entity-specific groupings of statements are what we expect to represent in LDES, as the payload of Activity Streams activities.

1.1. Used prefixes

The following prefixes are used throughout this document:

as https://www.w3.org/ns/activitystreams#
dct http://purl.org/dc/terms/
ldes https://w3id.org/ldes#
tree https://w3id.org/tree#
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
sh http://www.w3.org/ns/shacl#
xsd http://www.w3.org/2001/XMLSchema#

1.2. General description of cultural heritage event streams

CH streams are append-only event streams, represented as an RDF resource typed ldes:EventStream that MUST be given an IRI. The ldes:EventStream SHOULD have the ldes:timestampPath property, unless the timestamps of the resources in the stream (activities, cf. Section § 1.3 Activities) cannot be provided. Its value SHOULD be as:published, but it MAY be another property. The ldes:versionOfPath MUST be set to as:object. This is the property that each as:Activity uses to point to the entity being altered, i.e. the direct object of the activity.

CH streams rely on RDF Named Graphs. Therefore, a CH stream MUST be published as either application/ld+json or application/trig, and the Content-Type header MUST be set accordingly. Additional formats that support RDF Named Graphs MAY also be offered through content negotiation.
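As an illustration of the media type requirement, the following Python sketch shows how a harvester could check the Content-Type of a CH stream response. The function names and the Accept header value are hypothetical. Media type parameters, such as a charset or a JSON-LD profile, are simply ignored.

```python
# Media types that can carry RDF Named Graphs, as required for CH streams.
ALLOWED_MEDIA_TYPES = {"application/ld+json", "application/trig"}

# Hypothetical Accept header a harvester could send (preference order is an assumption).
ACCEPT_HEADER = "application/trig, application/ld+json;q=0.9"


def media_type(content_type: str) -> str:
    """Return the bare, lower-cased media type of a Content-Type header value."""
    # e.g. 'application/ld+json; charset=utf-8' -> 'application/ld+json'
    return content_type.split(";", 1)[0].strip().lower()


def is_ch_stream_response(content_type: str) -> bool:
    """True if the response uses one of the media types required for CH streams."""
    return media_type(content_type) in ALLOWED_MEDIA_TYPES
```

A harvester would typically refuse, or at least flag, a page served as text/turtle, because Turtle cannot carry the named graphs that hold the activity payloads.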

A cultural heritage ldes:EventStream
{
    "@context" : {
      "ldes": "https://w3id.org/ldes#",
      "tree": "https://w3id.org/tree#",
      "as": "https://www.w3.org/ns/activitystreams#",
      "dct": "http://purl.org/dc/terms/"
    },
    "@id": "#stream",
    "@type": "ldes:EventStream",
    "dct:title": "My cultural heritage event stream",
    "ldes:timestampPath": { "@id": "as:published" },
    "ldes:versionOfPath": { "@id": "as:object" }
}
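The configuration rules above can be checked mechanically. The Python sketch below assumes that the ldes:EventStream description has already been parsed into a Python dict and that it uses exactly the compact IRIs of the example. It performs no JSON-LD expansion, which a real harvester would do first. The function names are hypothetical.

```python
def term(value):
    """Return the compact IRI of a value written as "as:x" or as {"@id": "as:x"}."""
    return value.get("@id") if isinstance(value, dict) else value


def check_stream_config(stream: dict) -> list:
    """Return the problems found in a CH stream's ldes:EventStream description.

    Assumes the compact IRIs used in this document's examples (no JSON-LD expansion).
    """
    problems = []
    if not stream.get("@id"):
        problems.append("the ldes:EventStream MUST be given an IRI")
    if stream.get("@type") != "ldes:EventStream":
        problems.append("the resource MUST be typed ldes:EventStream")
    if term(stream.get("ldes:versionOfPath")) != "as:object":
        problems.append("ldes:versionOfPath MUST be as:object")
    timestamp_path = stream.get("ldes:timestampPath")
    if timestamp_path is None:
        problems.append("ldes:timestampPath SHOULD be provided")
    elif term(timestamp_path) != "as:published":
        # Another property is allowed, but as:published is the recommended value.
        problems.append("ldes:timestampPath SHOULD be as:published")
    return problems
```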

1.3. Activities

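CH streams use the [activitystreams-vocabulary] to represent the changes. Three types of activities can be described:

- as:Create: a cultural heritage entity was created;
- as:Update: an existing cultural heritage entity was modified;
- as:Delete: a cultural heritage entity was deleted.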

These activities MUST use the property as:object with the IRI of a cultural heritage entity, which is the subject of a number of statements; its value thus cannot be a blank node. They SHOULD use an as:published property with an xsd:dateTime datatype, and SHOULD provide an rdf:type. The activity MUST be identified by an IRI. The payload that corresponds to the cultural heritage entity (i.e. the statements about it, cf. this section’s introduction) MUST be provided in the named graph whose name is the activity IRI.
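The following Python sketch checks a single activity against these requirements. It assumes a simplified, already-parsed form of each member: a dict with the keys "@id", "@type", "as:object" and "as:published", where IRIs are plain strings, plus "@graph" holding the payload as (subject, predicate, object) tuples. This representation and the function names are hypothetical.

```python
from datetime import datetime


def names_an_iri(term) -> bool:
    """True when `term` is a non-empty identifier that is not a blank node label ('_:...')."""
    return bool(term) and not term.startswith("_:")


def check_activity(activity: dict) -> list:
    """Return the problems found in one CH stream activity (simplified dict form)."""
    problems = []
    if not names_an_iri(activity.get("@id")):
        problems.append("the activity MUST be identified by an IRI")
    entity = activity.get("as:object")
    if not names_an_iri(entity):
        problems.append("as:object MUST be the IRI of a cultural heritage entity")
    elif not any(s == entity for s, _, _ in activity.get("@graph", [])):
        problems.append("the named graph holds no statements about the as:object entity")
    if "@type" not in activity:
        problems.append("rdf:type SHOULD be provided")
    published = activity.get("as:published")
    if published is None:
        problems.append("as:published SHOULD be provided")
    else:
        try:
            # Approximates the xsd:dateTime check; fromisoformat is more lenient than XSD.
            datetime.fromisoformat(published.replace("Z", "+00:00"))
        except ValueError:
            problems.append("as:published is not a valid xsd:dateTime")
    return problems
```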

Note: When one of the following optional properties is not available, the consumer may process the activities as follows:

After publication in the CH stream, an activity must remain immutable. However, it may be deleted in accordance with the retention policy of the CH stream (see Section § 1.5 Retention policies). The publisher must ensure that no activity is added to the CH stream with a timestamp earlier than the most recent one already published.
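On the publisher side, these two rules (immutability and no timestamps earlier than the latest one) can be enforced when appending. The hypothetical ChStreamLog class below is a minimal in-memory sketch. It assumes that every as:published value carries a time zone, because comparing naive and time-zone-aware datetimes raises an error. Equal timestamps are accepted, since only earlier ones are forbidden.

```python
from datetime import datetime


class ChStreamLog:
    """Hypothetical publisher-side log enforcing the append-only rules of a CH stream."""

    def __init__(self):
        self._activities = []  # (timestamp, activity IRI), in publication order
        self._ids = set()

    def append(self, activity_id: str, published: str) -> None:
        # 'Z' is rewritten so that Python versions before 3.11 can parse it.
        ts = datetime.fromisoformat(published.replace("Z", "+00:00"))
        if activity_id in self._ids:
            # Published activities are immutable: an IRI cannot be reused for new content.
            raise ValueError(f"activity {activity_id} was already published")
        if self._activities and ts < self._activities[-1][0]:
            raise ValueError("timestamp is earlier than the most recent published activity")
        self._activities.append((ts, activity_id))
        self._ids.add(activity_id)

    def __len__(self) -> int:
        return len(self._activities)
```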

A CH stream containing one activity
{
    "@context" : {
      "ldes": "https://w3id.org/ldes#",
      "tree": "https://w3id.org/tree#",
      "as": "https://www.w3.org/ns/activitystreams#",
      "dct": "http://purl.org/dc/terms/",
      "foaf": "http://xmlns.com/foaf/0.1/",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "ex": "https://example.org/ns#"
    },
    "@id": "#myStream",
    "@type": "ldes:EventStream",
    "dct:title": "My cultural heritage event stream",
    "ldes:timestampPath": { "@id": "as:published" },
    "ldes:versionOfPath": { "@id": "as:object" },
    "tree:member": [
        {
            "@id": "https://example.org/object1#event1",
            "@type": "as:Create",
            "as:object": { "@id": "https://example.org/object1" },
            "as:published": { "@value": "2023-10-01T12:00:00Z", "@type": "xsd:dateTime" },
            "@graph": {
                "@id": "https://example.org/object1",
                "@type": "ex:Book",
                "dct:title": "An example book",
                "dct:creator": {
                  "@id": "https://example.org/agent1",
                  "@type": "foaf:Agent",
                  "foaf:name": "John Smith"
                }
            }
        }
    ]
}

1.4. The members’ SHACL shape

Publishers of CH streams may provide a SHACL shape file that communicates their intention that every member of the CH stream conforms to the shape.

The ldes:EventStream MAY have the tree:shape property with a value pointing to a sh:NodeShape or to an RDF file containing a set of shapes, which can be used to validate the members in the event stream. If such a shape is provided, a CH stream provider SHOULD test the members before adding them to the stream.

Note: The members of CH streams are Activity Streams activities that include a named graph containing data about a cultural heritage entity. Since LDES specifies that shapes validate the members of an LDES stream, the shapes defined for CH streams should validate both the activities and the data contained in their named graphs.

Note: Including the members’ SHACL shapes is optional since, currently, SHACL shapes are not widely available for most of the RDF data models used in cultural heritage.

1.5. Retention policies

A retention policy indicates in what way a specific view cannot provide the complete history of the event stream to the consumer, since not all activities can be published forever. In CH streams, consumers are generally not interested in the full history. It is therefore recommended that publishers keep only the latest version of each entity (i.e. its most recent activity) in the stream. This is functionally equivalent to the legacy ldes:LatestVersionSubset policy, which is still used by certain CH streams; as long as it remains in use, harvesters MUST support it.

Note: This recommendation is motivated by performance. When the full event history is retained, the algorithms defined by LDES for traversing a stream’s tree structure can become highly inefficient for CH streams that represent large cultural heritage datasets updated daily.

This retention policy SHOULD be stated in all tree:Node instances of the stream, with the property ldes:retentionPolicy. Its value SHOULD be a resource that has the property ldes:versionAmount with the value 1.
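The effect of a retention policy with ldes:versionAmount 1 can be sketched as follows: for each entity (the value of as:object), only its most recent activity is retained. The function name and the simplified dict representation of activities, with plain-string IRIs, are assumptions. When two activities of the same entity share a timestamp, the one appearing later in the stream is kept, since the stream is append-only.

```python
from datetime import datetime


def latest_versions(activities):
    """Keep only the newest activity per as:object (a versionAmount of 1).

    `activities` is an iterable of dicts with the keys "@id", "as:object" and
    "as:published" (a simplified, already-parsed form of the stream members).
    """
    newest = {}
    for act in activities:
        entity = act["as:object"]
        ts = datetime.fromisoformat(act["as:published"].replace("Z", "+00:00"))
        # '>=' lets a later activity in the stream win a timestamp tie.
        if entity not in newest or ts >= newest[entity][0]:
            newest[entity] = (ts, act)
    # Return the retained activities in chronological order, as in the stream.
    return [act for _, act in sorted(newest.values(), key=lambda pair: pair[0])]
```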

Note: The source may also not keep track of deleted entities, in which case the publisher cannot provide delete activities. Although this behaviour is not recommended, we will propose adding a new “implicit remove” retention policy to the LDES specification. It would make explicit that any entity previously found in the stream that no longer appears in its latest version is to be considered removed.
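Since this policy does not exist yet, the following Python sketch only illustrates how a harvester might interpret it. It assumes that the harvester has just read the complete current stream. An incremental harvest cannot be used this way, because entities that did not change would look removed. The function name is hypothetical.

```python
def implicitly_removed(previous_entities, current_activities):
    """Entities seen in an earlier harvest but absent from the current stream.

    Only meaningful when `current_activities` is the result of a complete traversal
    of the stream (hypothetical 'implicit remove' interpretation).
    """
    current = {act["as:object"] for act in current_activities}
    return set(previous_entities) - current
```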

Note: Even if deletions are tracked in the source, keeping delete activities indefinitely becomes difficult over a long period of time. A third retention policy should therefore be offered to express that delete activities are removed from the stream after a certain period of time. The LDES specification does not currently support this either, but we will propose it as an addition.
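On the publisher side, the proposed time-bounded policy for deletions could look as follows. This is a hypothetical sketch. The function name, the max_age parameter and the simplified dict representation of activities are assumptions and are not part of LDES.

```python
from datetime import datetime, timedelta, timezone


def prune_old_deletes(activities, max_age: timedelta, now: datetime = None):
    """Drop as:Delete activities older than `max_age`; keep every other activity.

    Sketch of the proposed (not yet standardised) policy under which deletions
    are kept in the stream only for a limited period.
    """
    now = now or datetime.now(timezone.utc)
    kept = []
    for act in activities:
        ts = datetime.fromisoformat(act["as:published"].replace("Z", "+00:00"))
        if act["@type"] == "as:Delete" and now - ts > max_age:
            continue  # this deletion has been in the stream long enough
        kept.append(act)
    return kept
```

A consumer that does not harvest at least once within max_age may miss deletions and would then need a full re-harvest. Such a policy would therefore need to communicate this period to consumers.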

The configuration of the retention policy of a CH stream
{
    "@context" : {
      "ldes": "https://w3id.org/ldes#",
      "tree": "https://w3id.org/tree#",
      "as": "https://www.w3.org/ns/activitystreams#",
      "dct": "http://purl.org/dc/terms/"
    },
    "@id": "#myStream",
    "@type": "ldes:EventStream",
    "tree:view": {
      "@id": "",
      "@type": "ldes:EventSource",
      "ldes:retentionPolicy": {
        "ldes:versionAmount": 1
      }
    }
}

1.6. Versioning and transactions

CH streams must also define how consumers should use the LDES versioning mechanism. Since the members of CH streams are Activity Streams activities, the following properties should be set on the ldes:EventStream entity (consult the LDES vocabulary [LDES-VOCABULARY] for an explanation of the properties):

The minimum configuration of the versioning properties for CH streams
{
    "@context" : {
      "ldes": "https://w3id.org/ldes#",
      "tree": "https://w3id.org/tree#",
      "as": "https://www.w3.org/ns/activitystreams#",
      "dct": "http://purl.org/dc/terms/"
    },
    "@id": "#stream",
    "@type": "ldes:EventStream",
    "ldes:timestampPath": { "@id": "as:published" },
    "ldes:versionOfPath": { "@id": "as:object" },
    "ldes:versionCreateObject": { "@id": "as:Create" },
    "ldes:versionUpdateObject": { "@id": "as:Update" },
    "ldes:versionDeleteObject": { "@id": "as:Delete" }
}
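With this configuration, a harvester can derive the change to apply to its replica from the type of each activity. The Python sketch below makes several assumptions: the replica is a dict from entity IRI to payload triples, activities use the simplified dict form with plain-string IRIs, and each payload is a complete description of the entity (a new version rather than a diff). The function apply_activity is hypothetical.

```python
def apply_activity(replica: dict, activity: dict) -> None:
    """Update a harvester's replica (entity IRI -> payload triples) with one activity.

    as:object identifies the entity; as:Create and as:Update carry its new state
    in the named graph, and as:Delete removes it from the replica.
    """
    entity = activity["as:object"]
    kind = activity["@type"]
    if kind in ("as:Create", "as:Update"):
        # Assumed to be the complete current description, so it replaces the old one.
        replica[entity] = list(activity.get("@graph", []))
    elif kind == "as:Delete":
        replica.pop(entity, None)  # deleting an unknown entity is a no-op
    else:
        raise ValueError(f"unsupported activity type: {kind}")
```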

For typical cultural heritage applications of LDES, transactions are generally unnecessary. Nonetheless, publishers MAY choose to apply transactions on their streams in particular cases, as defined in the LDES specification [LDES].

1.7. Traversing the search tree

A CH stream MUST follow the chronological search tree described in the LDES Server Primer [LDES-SERVER-PRIMER]. Consequently, LDES clients SHOULD traverse a CH stream using the chronological ascending order mode specified in the LDES specification [LDES].
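The tree traversal itself is not shown here. The following Python sketch illustrates only the bookkeeping a harvester may do once members have been retrieved: process them in ascending timestamp order and remember where it stopped. Because the stream only forbids timestamps earlier than the latest one, several activities may share a timestamp. The sketch therefore remembers the activity IRIs seen at the last timestamp instead of using a strict "newer than" comparison. All names are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an as:published value; 'Z' is rewritten for Python versions before 3.11."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new(members, last_ts=None, seen_at_last_ts=frozenset()):
    """Return the members not processed yet, in chronological ascending order.

    last_ts: timestamp of the newest activity processed in an earlier run (None at first).
    seen_at_last_ts: IRIs of the activities already processed with exactly last_ts.
    """
    fresh = []
    for act in members:
        ts = parse_ts(act["as:published"])
        if last_ts is None or ts > last_ts or (
            ts == last_ts and act["@id"] not in seen_at_last_ts
        ):
            fresh.append((ts, act))
    fresh.sort(key=lambda pair: pair[0])
    return [act for _, act in fresh]


def advance(checkpoint, processed):
    """Update the (last_ts, seen_at_last_ts) checkpoint after processing `processed`.

    `processed` must come from select_new, so its timestamps never go below last_ts.
    """
    last_ts, seen = checkpoint
    for act in processed:
        ts = parse_ts(act["as:published"])
        if last_ts is None or ts > last_ts:
            last_ts, seen = ts, frozenset({act["@id"]})
        else:  # ts == last_ts
            seen = seen | {act["@id"]}
    return last_ts, frozenset(seen)
```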

2. Implementation report

2.1. Publisher implementations

2.2. Consumer implementations

Europeana can harvest CH streams that use the Europeana Data Model (EDM) in the payload of the cultural heritage entity.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[ACTIVITYSTREAMS-VOCABULARY]
James Snell; Evan Prodromou. Activity Vocabulary. URL: https://w3c.github.io/activitystreams/vocabulary/
[LDES]
Pieter Colpaert. Linked Data Event Streams. 2025-10-07. URL: https://semiceu.github.io/LinkedDataEventStreams/
[LDES-VOCABULARY]
Pieter Colpaert. Linked Data Event Streams Vocabulary. 2025-10-07. URL: https://semiceu.github.io/LinkedDataEventStreams/vocabulary.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[LDES-SERVER-PRIMER]
Pieter Colpaert. LDES Server Primer. 2025-10-07. URL: https://semiceu.github.io/LinkedDataEventStreams/server-primer