Linked Data Event Streams

Living Standard,

This version:
https://w3id.org/ldes/specification
Issue Tracking:
GitHub
Inline In Spec
Editor:
Pieter Colpaert

Abstract

A Linked Data Event Stream (LDES) is an append-only collection of members described using the Resource Description Framework (RDF). The specification explains how a client can replicate the history of an event stream, and how it can then remain synchronized as new members are published.

1. Introduction

An LDES client is a piece of software used by a consumer that accepts the URL to an entry point, and returns a stream of members of the corresponding ldes:EventStream. The data stream emits the history that is available from this entry point, and once the consumer has caught up with the stream, it remains synchronized as new members are published.

The client does this by building on a subset of the W3C TREE hypermedia specification. The Linked Data Event Streams specification extends it with specialized terms for dealing with append-only collections, or event streams. For example, one can indicate which time-based property of a member defines the order of the event stream, indicate the retention policy as a promise from the producer to the consumer, or detail how to deal with version-based members.

More extensions should be specified with respect to HTTP status codes, or keeping state. This should be further detailed in a chapter after the overview.

2. Overview

An overview of the LDES specification.

A Linked Data Event Stream (LDES) (ldes:EventStream) is a collection (rdfs:subClassOf tree:Collection) of members that cannot be updated or removed once they are published, with each member being a set of RDF quads ([rdf-primer]). This way, the collection of members becomes an append-only log or event stream.

Following the TREE specification, this event stream is published using one or more HTTP resources. When more than one resource is used, these pages, or tree:Nodes, are structured according to a search tree. We therefore use the term root node for the first page, and subsequent node for every following page in the structure.

In the root node, the client will expect these properties to be described on the ldes:EventStream entity:

In the root node, the current node identified by the URL of the page (a provider can achieve this simply by using a relative IRI <>) will be further described using these properties:

The client MUST implement the initialization of the TREE specification, although it is not required to extract the tree:search form.

In any tree:Node – root node or subsequent node – the client expects to find 0 or more members of the ldes:EventStream using the tree:member property. The subject is the event stream instance, and the object is the root focus node of a member. An LDES client MUST implement the Member Extraction Algorithm of the TREE specification to retrieve the full set of quads of the member. In an ldes:EventStream, the object of the tree:member triple can only be an IRI as this IRI will be used in the state to check whether the member has already been emitted or not.
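
The role of the member IRI in the client state can be sketched as follows. This is a minimal Python illustration and not part of the specification: the `EmittedMembers` class is a hypothetical helper, and the in-memory set is an assumption (a real client would likely persist this state between runs).

```python
class EmittedMembers:
    """Tracks which member IRIs were already emitted (hypothetical helper)."""

    def __init__(self):
        self._seen = set()

    def should_emit(self, member_iri: str) -> bool:
        """Return True the first time an IRI is offered, False afterwards."""
        if member_iri in self._seen:
            return False
        self._seen.add(member_iri)
        return True


state = EmittedMembers()
first = state.should_emit("https://example.org/Observation1")
again = state.should_emit("https://example.org/Observation1")
```

Because the check only relies on the IRI, a member that appears on several pages is emitted once.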

An example record from a sensor observation dataset in the [turtle] format:

ex:Observations a ldes:EventStream ;
                ldes:timestampPath sosa:resultTime ;
                tree:shape ex:shape1.shacl ;
                tree:view <> ;
                tree:member ex:Observation1 .

ex:Observation1 a sosa:Observation ;
                sosa:resultTime "2026-01-01T00:00:00Z"^^xsd:dateTime ;
                sosa:hasSimpleResult "..." .

An example record from a base registry of addresses in the [trig] format:

ex:AddressRecords a ldes:EventStream ;
                  ldes:timestampPath dcterms:created ;
                  ldes:versionOfPath dcterms:isVersionOf ;
                  tree:shape ex:shape2.shacl ;
                  tree:view <> ;
                  tree:member ex:AddressRecord1-activity1 .

ex:AddressRecord1-activity1 dcterms:created "2026-01-01T00:00:00Z"^^xsd:dateTime ;
                            adms:versionNotes "First version of this address" ;
                            dcterms:isVersionOf ex:AddressRecord1 .

ex:AddressRecord1-activity1 {
    ex:AddressRecord1 dcterms:title "Streetname X, ZIP Municipality, Country" .
}

In any tree:Node, root node or subsequent node, the client expects to find zero or more tree:relation properties, containing a description of the tree:Relations from this node to subsequent nodes. A client MUST traverse the relations as described in the TREE chapters on traversing the search tree. A client MUST keep its own state to know when to refetch certain tree:Nodes.
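
A traversal that follows relations and emits each member once could look roughly like the sketch below. It is only an illustration under stated assumptions: the `PAGES` dictionary is a hypothetical in-memory stand-in for dereferencing nodes over HTTP and parsing their RDF, `replicate` is a hypothetical function name, and the state needed to decide when to refetch a node is deliberately left out.

```python
from collections import deque

# Hypothetical stand-in for HTTP pages: node URL -> (member IRIs, relation targets)
PAGES = {
    "https://example.org/root": (["ex:m1"], ["https://example.org/page2"]),
    "https://example.org/page2": (["ex:m2", "ex:m1"], ["https://example.org/root"]),
}


def replicate(root_url, pages):
    """Visit every reachable node once; return each member IRI once, in discovery order."""
    visited, emitted, out = set(), set(), []
    queue = deque([root_url])
    while queue:
        url = queue.popleft()
        if url in visited:
            continue  # already processed, also guards against relation cycles
        visited.add(url)
        members, relations = pages[url]
        for iri in members:
            if iri not in emitted:
                emitted.add(iri)
                out.append(iri)
        queue.extend(relations)
    return out


result = replicate("https://example.org/root", PAGES)
```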

We should refer here to a new chapter on how to gracefully iterate over the pages and how to keep the state in more detail (see the extensions in Issue 1). We can then also indicate that a client MAY implement the text on pruning branches related to interpreting comparators for xsd:dateTime literals if it wants to detect immutable pages via the timestampPath.

More specific server documentation should be found in a Server Primer (to do), such as containing a link to the JSON-LD context, official SHACL shapes for LDES to validate your pages, best practices for publishing an LDES for reaching an optimal performance, best practices for enveloping your data using named graphs, how to build a status log for the use case of an aggregator or harvester, etc.

3. Retention policies

The goal of the retention policies is to indicate that a client should not rely on finding members that fall outside the retention policies. This can help a consumer in the discovery phase to pick the right LDES, or help the consumer to detect non-viable synchronization set-ups.

An overview of the existing retention policies in LDES

When no retention policy is provided in the root node, the client MUST assume all members that have been added to the ldes:EventStream are still available from this root node. When one or more retention policies are provided, a client MUST assume it will not be able to find members outside of the retention policy, or outside of the union of the retention policies when multiple are provided.
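
The union rule can be sketched in Python as follows. This is an illustration under assumptions, not normative text: members are modelled as plain dictionaries, each policy is modelled as a predicate function, and the names `may_still_be_available`, `since_2026` and `is_latest_version` (including the precomputed `is_latest` flag) are hypothetical.

```python
from datetime import datetime, timezone


def may_still_be_available(member, policies):
    """No policies: everything is assumed available.
    Otherwise a member is expected only if at least one policy retains it."""
    if not policies:
        return True
    return any(policy(member) for policy in policies)


# Hypothetical policies, modelled as predicates over a member dictionary.
def since_2026(member):
    return member["timestamp"] >= datetime(2026, 1, 1, tzinfo=timezone.utc)


def is_latest_version(member):
    return member["is_latest"]


old_but_latest = {"timestamp": datetime(2025, 5, 1, tzinfo=timezone.utc), "is_latest": True}
old_and_superseded = {"timestamp": datetime(2025, 5, 1, tzinfo=timezone.utc), "is_latest": False}
policies = [since_2026, is_latest_version]
```

Here `old_but_latest` falls outside the time-based predicate but inside the version-based one, so it is still inside the union.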

In the LDES specification, three types of retention policies are defined, which can be used with the ldes:retentionPolicy property:

  1. ldes:DurationAgoPolicy: a time-based retention policy in which data generated longer ago than a specified duration is removed

  2. ldes:LatestVersionSubset: a version subset based on the latest versions of an entity in the stream

  3. ldes:PointInTimePolicy: a point-in-time retention policy in which data generated before a specific time is removed

A consumer MUST interpret the retention policies to understand whether the goal of the client can be reached. For example, an LDES client can work with a configured polling interval, or can be executed on a schedule. This polling interval, or interval between executions, cannot be longer than the period covered by the retention policy of the data in the LDES. Or, when an LDES client is asked for a full replication, it MUST check the retention policies to determine whether this is possible.
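
The polling-interval check can be expressed as a simple comparison. The sketch below assumes the retention window has already been converted to a `datetime.timedelta`; the function name `polling_is_viable` is hypothetical, and representing one year as 365 days is an approximation.

```python
from datetime import timedelta


def polling_is_viable(polling_interval: timedelta, retention_window: timedelta) -> bool:
    """True when the interval between two polls does not exceed the retention window."""
    return polling_interval <= retention_window


daily = timedelta(days=1)
one_year = timedelta(days=365)  # e.g. a "P1Y" policy, approximated as 365 days
viable = polling_is_viable(daily, one_year)
```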

3.1. Time-based retention policies

A time-based retention policy can be introduced as follows:

ex:C3 a ldes:EventStream ;
      ldes:timestampPath prov:generatedAtTime ;
      tree:view <> .

<> ldes:retentionPolicy ex:P1 .

ex:P1 a ldes:DurationAgoPolicy ;
      tree:value "P1Y"^^xsd:duration . # Keep 1 year of data

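An ldes:DurationAgoPolicy uses a tree:value with an xsd:duration-typed literal to indicate how long ago, relative to the current time, the timestamp of a member may lie for that member to still be available. The timestamp is the one indicated by the ldes:timestampPath of the ldes:EventStream. When the ldes:timestampPath is redefined on the policy itself, a client MUST interpret the retention policy on this path instead.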
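
A client-side check against such a sliding window might be sketched as below. Several things here are assumptions rather than part of the specification: the helper names `parse_duration` and `retained_by_duration` are hypothetical, only non-negative durations are handled, and years and months are approximated as 365 and 30 days because `datetime.timedelta` has no calendar-aware units.

```python
import re
from datetime import datetime, timedelta, timezone

_DURATION = re.compile(
    r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Parse a non-negative xsd:duration lexical form (approximate for Y and M)."""
    match = _DURATION.match(text)
    if not match or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    years, months, days, hours, minutes, seconds = (
        float(group) if group else 0.0 for group in match.groups()
    )
    # Assumption: 1 year = 365 days, 1 month = 30 days.
    return timedelta(days=years * 365 + months * 30 + days,
                     hours=hours, minutes=minutes, seconds=seconds)


def retained_by_duration(member_ts: datetime, window: timedelta, now: datetime) -> bool:
    """True when the member timestamp is not older than `window` relative to `now`."""
    return member_ts >= now - window


now = datetime(2027, 1, 1, tzinfo=timezone.utc)
window = parse_duration("P1Y")
recent = datetime(2026, 6, 1, tzinfo=timezone.utc)
expired = datetime(2025, 12, 31, tzinfo=timezone.utc)
```

Passing `now` explicitly keeps the check deterministic; a client would typically pass the current time.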

3.2. Version-based retention policies

To indicate that only the 2 latest versions of an object, referred to using dcterms:isVersionOf, are kept:

ex:C2 a ldes:EventStream ;
      ldes:timestampPath dcterms:created ;
      ldes:versionOfPath dcterms:isVersionOf ;
      tree:view <> .

<> ldes:retentionPolicy ex:P2 .

ex:P2 a ldes:LatestVersionSubset ;
      ldes:amount 2 ;
      # If different from the event stream, these paths can optionally be overridden here
      ldes:timestampPath dcterms:created ;
      ldes:versionOfPath dcterms:isVersionOf .

An ldes:LatestVersionSubset has the property ldes:amount and MAY redefine the ldes:timestampPath and/or ldes:versionOfPath. It MAY also define a compound version key using ldes:versionKey (see example below) instead of ldes:versionOfPath. The ldes:amount has an xsd:integer datatype, indicates how many versions to keep, and defaults to 1. The ldes:versionKey is an rdf:List of SHACL property paths indicating objects that MUST be concatenated together to find the key on which versions are matched. When the ldes:versionKey is set to an empty path (), all members MUST be seen as a version of the same thing.

For sensor datasets the version key may get more complex, grouping observations by both the observed property and the sensor that made the observation:

ex:C1 a ldes:EventStream ;
      tree:view <> .

<> ldes:retentionPolicy ex:P3 .

ex:P3 a ldes:LatestVersionSubset ;
      ldes:amount 2 ;
      ldes:versionKey ( ( sosa:observedProperty ) ( sosa:madeBySensor ) ) .
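
How a client could reason about which members such a policy retains is sketched below in Python. This is an illustration under assumptions: members are modelled as plain dictionaries, each SHACL property path is reduced to a simple dictionary key, timestamps are ISO 8601 strings in the same UTC format so that they sort lexicographically, and the function name `latest_versions` is hypothetical.

```python
from collections import defaultdict


def latest_versions(members, key_paths, amount=1):
    """Group members by their compound version key and keep the newest `amount` per group."""
    groups = defaultdict(list)
    for member in members:
        # An empty key_paths list yields the key () for every member,
        # so all members count as versions of the same thing.
        key = tuple(member.get(path) for path in key_paths)
        groups[key].append(member)
    kept = []
    for versions in groups.values():
        newest_first = sorted(versions, key=lambda m: m["timestamp"], reverse=True)
        kept.extend(newest_first[:amount])
    return kept


observations = [
    {"id": "ex:o1", "sosa:observedProperty": "ex:temp", "sosa:madeBySensor": "ex:s1",
     "timestamp": "2026-01-01T00:00:00Z"},
    {"id": "ex:o2", "sosa:observedProperty": "ex:temp", "sosa:madeBySensor": "ex:s1",
     "timestamp": "2026-01-02T00:00:00Z"},
    {"id": "ex:o3", "sosa:observedProperty": "ex:temp", "sosa:madeBySensor": "ex:s1",
     "timestamp": "2026-01-03T00:00:00Z"},
    {"id": "ex:o4", "sosa:observedProperty": "ex:temp", "sosa:madeBySensor": "ex:s2",
     "timestamp": "2026-01-01T00:00:00Z"},
]
kept = latest_versions(observations, ["sosa:observedProperty", "sosa:madeBySensor"], amount=2)
kept_ids = sorted(member["id"] for member in kept)
```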

3.3. Point-in-time retention policies

A point-in-time retention policy can be introduced as follows:

ex:C4 a ldes:EventStream ;
      ldes:timestampPath prov:generatedAtTime ;
      tree:view <> .

<> ldes:retentionPolicy ex:P4 .

ex:P4 a ldes:PointInTimePolicy ;
      ldes:pointInTime "2026-04-12T00:00:00Z"^^xsd:dateTime . # Keep data from April 12th, 2026 (00:00 UTC) onwards

An ldes:PointInTimePolicy uses an ldes:pointInTime with an xsd:dateTime-typed literal to indicate the point in time on or after which data is kept when compared to a member’s timestamp, indicated by the ldes:timestampPath that MAY be redefined in the policy itself.
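
The comparison can be sketched in Python as follows. The helper names `parse_datetime` and `retained_by_point_in_time` are hypothetical, and the sketch assumes every literal carries an explicit timezone so that the resulting values can be compared across offsets.

```python
from datetime import datetime


def parse_datetime(lexical: str) -> datetime:
    """Parse an xsd:dateTime lexical form that carries an explicit timezone."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(lexical.replace("Z", "+00:00"))


def retained_by_point_in_time(member_ts: datetime, point_in_time: datetime) -> bool:
    """True when the member timestamp is on or after the point in time."""
    return member_ts >= point_in_time


point = parse_datetime("2026-04-12T00:00:00Z")
on_boundary = parse_datetime("2026-04-12T00:00:00Z")
# 01:00 at UTC+02:00 is 23:00 UTC on April 11th, so this member falls before the boundary.
other_offset = parse_datetime("2026-04-12T01:00:00+02:00")
```

The second value shows why an explicit timezone matters: its local date matches the boundary date, yet it lies before the boundary once both are compared as instants.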

4. Vocabulary

Next to re-using terms from the tree: vocabulary, the ldes: namespace introduced in this document provides a couple of new terms. The base IRI for LDES is https://w3id.org/ldes#, and the preferred prefix is ldes:. There is a Turtle version available at https://w3id.org/ldes#Vocabulary

4.1. ldes:EventStream

The class ldes:EventStream is a subclass of tree:Collection, the specialization being that all members are immutable, and thus that this tree:Collection is append-only.

4.2. ldes:timestampPath

The path to the xsd:dateTime literal in each member that defines the order of the event stream.

Domain: ldes:EventStream or ldes:RetentionPolicy

Range: a SHACL property path

4.3. ldes:versionOfPath

The path to the IRI in each member that defines the entity of which this member is a version.

Domain: ldes:EventStream or ldes:RetentionPolicy

Range: a SHACL property path

4.4. ldes:EventSource

The class ldes:EventSource is a subclass of dcat:Distribution, the specialization being that this is a feed that uses a chronological search tree to make available a Linked Data Event Stream in order.

An ldes:EventSource can only be published on LDESs that have a ldes:timestampPath set, and thus will publish their entities in order.

4.5. ldes:retentionPolicy

Links to one or more retention policies.

Domain: Preferably the root node. Alternatively it can occur on any type of entity that is linked from the root node using tree:viewDescription.

Range: ldes:RetentionPolicy

4.6. ldes:RetentionPolicy

An abstract class for a retention policy.

4.7. ldes:DurationAgoPolicy

A retention policy that uses an xsd:duration literal to document a sliding window of data.

4.8. ldes:LatestVersionSubset

A retention policy that selects a number of versions based on the versionOfPath, or on a more advanced version key that can be configured.

4.8.1. ldes:versionKey

An advanced property for setting a compound key for defining the latest version of an entity.

Domain: ldes:LatestVersionSubset

Range: an rdf:List, with each item of the list a SHACL property path.

4.8.2. ldes:amount

The number of versions to keep. This MUST be a number greater than 0.

Domain: ldes:LatestVersionSubset

Range: xsd:integer

4.9. ldes:PointInTimePolicy

A retention policy that indicates members are kept from a certain point in time onwards.

4.9.1. ldes:pointInTime

The point in time from which members will be available starting from this root node.

Domain: ldes:PointInTimePolicy

Range: xsd:dateTime including an explicit timezone

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[RDF-PRIMER]
Frank Manola; Eric Miller. RDF Primer. 10 February 2004. REC. URL: https://www.w3.org/TR/rdf-primer/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SHACL]
Holger Knublauch; Dimitris Kontokostas. Shapes Constraint Language (SHACL). URL: https://w3c.github.io/data-shapes/shacl/
[TRIG]
Gavin Carothers; Andy Seaborne. RDF 1.1 TriG. 25 February 2014. REC. URL: https://www.w3.org/TR/trig/
[TURTLE]
Eric Prud'hommeaux; Gavin Carothers. RDF 1.1 Turtle. 25 February 2014. REC. URL: https://www.w3.org/TR/turtle/
