LDES Server Primer

Living Document

This version:
https://w3id.org/ldes/server-primer
Issue Tracking:
GitHub
Inline In Spec
Editor:
Pieter Colpaert

Abstract

This Server Primer for Linked Data Event Streams (LDES) provides practical guidance for data publishers on implementing and hosting an LDES server. LDES aims to help publishers balance between offering rich querying APIs and simple data dumps by proposing an event stream as the base API. This primer focuses on lightweight, scalable approaches and best practices for setting up and maintaining an LDES server.

1. Introduction

This server primer is a living document. It derives normative rules for servers from the main consumer-oriented LDES specification and the W3C TREE hypermedia specification.

A Linked Data Event Stream (LDES) is an append-only log consisting of immutable members. The term “member” could also be interpreted as “event”, “activity”, “observation”, “record”, or “immutable entity”. For example, “an observation states that at this timestamp a specific sensor observed 5°C”. However, since LDES extends the W3C TREE hypermedia specification, we use the term “member” for consistency.

Figure: An overview of the LDES specification.

2. Serializations and HTTP Responses

A server MUST provide data in either [n-quads], [n-triples], [trig], [turtle] or [json-ld]. It MAY also provide multiple serializations using content negotiation.

Note: When using content negotiation, set Vary: Accept.

A server SHOULD provide an ETag header on responses. If the page is immutable, the server SHOULD also provide a Cache-Control: immutable header.

If [json-ld] is used, there is an example context at https://w3id.org/ldes/context. Do not reference this URL directly in production; copy it into your project. If you host an external context yourself, ensure robust caching with the ETag and/or Cache-Control max-age headers.

A provider SHOULD implement the TREE Profile specification for performance. In this case, you MUST order members chronologically in the page (i.e., append the members to the file as you go).

Issue: We will try to generalize this in the future so that we can also integrate with Jelly. This binary serialization has the potential to improve performance drastically.

If the server is overloaded, it MUST respond with the status code 429 Too Many Requests. Clients are expected to retry later.

3. Context Information

On the first page (root node), you MUST include context information about the LDES and this particular root node of the LDES. For features and how a client would interpret them, see the main spec.

Using tree:viewDescription on the root node, you MAY also link to an entity (embedded in the same page) that contains the retention policy, or other context data about this view of the LDES (e.g., the dcat:Distribution, the tree:SearchTree, or the ldes:EventSource) as a named entity. This is useful, for example, if a producer would like to disambiguate the IRI for the ldes:EventSource from the root tree:Node.
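As a minimal sketch of this pattern, the root node below uses tree:viewDescription to link to a named ldes:EventSource embedded in the same page. The IRIs ex:eventstream, <#source> and <#policy>, and the one-year duration, are assumptions chosen for illustration only.

```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

# The event stream points to this page as its root node.
ex:eventstream a ldes:EventStream ;
    tree:view <> .

# The root node links to a separately named description of this view.
<> a tree:Node ;
    tree:viewDescription <#source> .

# Hypothetical named entity that carries the context data for this view.
# Its IRI differs from the root node IRI, which disambiguates the two.
<#source> a ldes:EventSource ;
    ldes:retentionPolicy <#policy> .

# Hypothetical retention policy description.
<#policy> ldes:fullLogDuration "P1Y"^^xsd:duration .
```

Giving the event source its own IRI keeps statements about the view separate from statements about the page that happens to serve it.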

Recommended context properties on the ldes:EventStream:

4. Paginating Your Event Stream

Instead of a one- or two-dimensional pagination scheme, TREE/LDES lets you describe the relations you want and build the search tree you need. We recommend the following:

Issue: We should still add examples of how to paginate here.

Every tree:Node MAY contain zero or more members and MAY contain zero or more relations.

4.1. Entry Points and Discovery

Publish a stable entry point for clients. Expose either:

Avoid ambiguity by ensuring there is exactly one tree:view for the entry point. If you rotate the root node over time, keep R stable or use redirects.

Issue: Discovery is yet to be explained further. Input from existing implementations is welcome through the issue tracker.

4.2. Members

Members MUST be linked from the event stream identifier using tree:member. For example:

@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix ex: <http://example.org/> .

ex:eventstream a ldes:EventStream ;
    tree:view <> ;
    tree:member ex:member1, ex:member2 ;
    ldes:timestampPath ex:createdAt ;
    ldes:versionOfPath ex:versionOf .

The object of tree:member MUST be an IRI that identifies an immutable concept.

Note: To ensure immutability, the IRI should reference a resource that cannot change over time. A common approach is to include a timestamp, hash or version identifier in the IRI, so that each IRI corresponds to a specific, unalterable state or event.
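One possible way to mint such an IRI is sketched below: the timestamp of the observation is part of the member IRI, so a later observation of the same sensor gets a new IRI instead of changing the existing one. The sensor IRI, the IRI layout and the ex:temperature property are hypothetical. ex:createdAt and ex:versionOf are the example paths used earlier in this section.

```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

ex:eventstream a ldes:EventStream ;
    ldes:timestampPath ex:createdAt ;
    ldes:versionOfPath ex:versionOf ;
    tree:member <http://example.org/sensor1/2024-01-01T12:00:00Z> .

# The timestamp in the IRI pins this member to one unalterable state.
<http://example.org/sensor1/2024-01-01T12:00:00Z>
    ex:versionOf ex:sensor1 ;
    ex:createdAt "2024-01-01T12:00:00Z"^^xsd:dateTime ;
    ex:temperature 5 .
```

A hash or a version counter in place of the timestamp would serve the same purpose, as long as the IRI is never reused for different content.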

If you add a member to multiple pages, this MUST be done atomically. This ensures that a client’s synchronization run is reliable: members emitted in the current run will not be newly encountered in future runs. This atomicity is a precondition for clients to safely forget parts of the log, as those members cannot be encountered again once the pages the members were encountered in become immutable.

If you reuse the member IRI as a named graph, clients MAY assume the payload of the upsert is in that named graph. Publish consistently so consumers can locate the triples for updates and deletions.

4.3. Transactions

If you want to flag that certain members must be processed together (e.g., a large deletion operation), you can model transactions:

Producers SHOULD ensure the member that finalizes the transaction has an equal or later ldes:timestampPath/ldes:sequencePath than preceding transaction members so ordered clients can emit it last.

5. Scaling

Besides optimizations such as using a binary format like Jelly, or manually creating an aggregated summary LDES as a derived LDES, two other tools can help with scaling up.

5.1. Compacting your log with a retention policy

Retention policies enable servers to compact their logs while keeping client expectations clear. By declaring a retention policy on the root node (or via a tree:viewDescription entity linked from the root), producers communicate what portion of the event history is still available from this view. Clients will assume they cannot retrieve members outside the declared policy window.

Where to publish and cardinality

Supported policy properties

Computation and time base

Publishing changes and server behavior

Sliding full history for one year, plus version constraints

@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<> a ldes:EventSource ;
   ldes:retentionPolicy [
     ldes:fullLogDuration "P1Y"^^xsd:duration ;
     ldes:versionAmount 1 ;
     ldes:versionDeleteDuration "P1Y"^^xsd:duration
   ] .

Point-in-time start and version window

@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<> a ldes:EventSource ;
   ldes:retentionPolicy [
     ldes:startingFrom "2026-01-01T00:00:00Z"^^xsd:dateTime ;
     ldes:versionAmount 3 ;
     ldes:versionDuration "P90D"^^xsd:duration
   ] .

Notes

5.2. Rebalancing the search tree

Rebalancing the search tree of an LDES is worthwhile for old immutable pages, as most clients are interested in the full history anyway. Compression is much more efficient on bigger pages, so less data needs to be transferred over the wire, which saves bandwidth.

Rebalancing is tricky, though. A client that is replicating the dataset while the rebalancing happens might get stuck in edge cases, and a server cache might still hold a copy of some or all of your immutable pages.

As a running example, imagine a client is synchronizing the day page 2022-05-02 while all pages under 2022 are being merged into one.

In that case, the server MUST redirect all old pages, including the page 2022 itself, to the new IRI, such as 2022-rebalanced.

Note: the semantics of ldes:immutable are that the members on this page and the relations should not be processed again. The page MAY still be rebalanced later on, or the page can become unavailable on disk (410 Gone).

6. Validating the pages

This section includes the rules to validate an implementation of a root node and any subsequent node.

Issue: We still need to build the SHACL shapes here.

Issue: We still need a UML image here.

6.1. For the Root Node

A root node MUST link the event stream to the view using the tree:view property.

A root node MUST contain context information about the LDES. All these properties in the domain of the event stream have a cardinality of 0 or 1:

A root node MUST contain context information about this particular entry point:

6.1.1. For the Retention policies

A root node MUST contain at most one ldes:retentionPolicy property (cardinality: 0..1). The value of ldes:retentionPolicy MUST be an IRI referring to a retention policy description.

A retention policy description MAY contain the following properties, each with cardinality 0 or 1:

6.2. Root Node and Subsequent Nodes

On the event stream, 0 or more tree:member triples are provided. The objects MUST be IRIs.

The event stream MAY have a tree:view triple that points to the current page <>.

The current page <> has 0 or more tree:relation properties, each pointing to a relation.

This page MAY also have ldes:immutable true attached to it. The default value is false. If it is not immutable, this SHOULD NOT be made explicit using a false value.
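The rules in this section can be read together in the following sketch of a single page. The member IRIs, the target page <page2>, the ex:createdAt path and the boundary value are assumptions made for illustration.

```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

# Zero or more members, each identified by an IRI,
# plus an optional tree:view to the current page.
ex:eventstream a ldes:EventStream ;
    tree:view <> ;
    tree:member ex:member1, ex:member2 .

# The current page is flagged as immutable and has one relation.
# A page that can still change simply omits ldes:immutable.
<> a tree:Node ;
    ldes:immutable true ;
    tree:relation [
        a tree:GreaterThanOrEqualToRelation ;
        tree:node <page2> ;
        tree:path ex:createdAt ;
        tree:value "2024-01-02T00:00:00Z"^^xsd:dateTime
    ] .
```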

6.3. Relations

Relations in LDES are used to describe how pages or nodes are connected within the event stream. Each relation is represented using the tree:relation property and SHOULD specify its type and relevant properties.

On all relations, exactly one tree:node MUST be present. The object MUST be an IRI.

If a relation is typed as a tree:GreaterThanRelation, tree:LessThanRelation, tree:EqualToRelation, tree:LessThanOrEqualToRelation, or tree:GreaterThanOrEqualToRelation, it MUST specify exactly one tree:path (a [SHACL] path) and exactly one tree:value.

For chronological views, you SHOULD use the same tree:path as the ldes:timestampPath. For time windows, publish both lower- and upper-bound relations to the same node; clients combine relations to the same node using logical AND. Avoid orphan relations and overlapping intervals that cause ambiguous traversal.
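A time window can be sketched as two relations to the same node, which clients combine with a logical AND. In the sketch below the day page <2022-05-02>, the ex:createdAt path and the boundary values are assumptions. The lower bound is inclusive and the upper bound exclusive, so that adjacent windows do not overlap.

```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/> .

# The relations below reuse the timestamp path of the event stream.
ex:eventstream a ldes:EventStream ;
    ldes:timestampPath ex:createdAt ;
    tree:view <> .

<> a tree:Node ;
    tree:relation [
        # Lower bound: members created on or after the start of the day.
        a tree:GreaterThanOrEqualToRelation ;
        tree:node <2022-05-02> ;
        tree:path ex:createdAt ;
        tree:value "2022-05-02T00:00:00Z"^^xsd:dateTime
    ], [
        # Upper bound: members created before the start of the next day.
        a tree:LessThanRelation ;
        tree:node <2022-05-02> ;
        tree:path ex:createdAt ;
        tree:value "2022-05-03T00:00:00Z"^^xsd:dateTime
    ] .
```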

If a relation type isn’t understood by clients (e.g., a geospatial relation), provide an ordering-compatible path elsewhere so ordered clients can still discover early members.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. [RFC2119]

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes.

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[JSON-LD]
Manu Sporny; Gregg Kellogg; Markus Lanthaler. JSON-LD 1.0. 3 November 2020. REC. URL: https://www.w3.org/TR/json-ld/
[N-QUADS]
Gavin Carothers. RDF 1.1 N-Quads. URL: https://w3c.github.io/rdf-n-quads/spec/
[N-TRIPLES]
Gavin Carothers; Andy Seaborne. RDF 1.1 N-Triples. URL: https://w3c.github.io/rdf-n-triples/spec/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SHACL]
Holger Knublauch; Dimitris Kontokostas. Shapes Constraint Language (SHACL). URL: https://w3c.github.io/data-shapes/shacl/
[TRIG]
Gavin Carothers; Andy Seaborne. RDF 1.1 TriG. URL: https://w3c.github.io/rdf-trig/spec/
[TURTLE]
Eric Prud'hommeaux; Gavin Carothers. RDF 1.1 Turtle. URL: https://w3c.github.io/rdf-turtle/spec/
