​Toolchain for Publishing Data Specifications - User Manual by SEMIC

Release:
1.0.2
Published at:
29/08/2024
Feedback at:
https://github.com/SEMICeu/toolchain-manual/issues

Background

Over the years of developing the Core Vocabularies, SEMIC has gained considerable experience in managing the lifecycle of data specifications and their publication. After facing recurring update cycles, managing the propagation of updates through the dependencies between specifications, and orchestrating the input of multiple teams of experts into the final versions, it became clear that systematic support would suit the long-term maintenance of the data specifications better than purely manual effort.

SEMIC data specifications are published as HTML pages to be consulted by experts when building their data models. In addition, SEMIC has extended the publication workflow so that it also generates, coherently with the HTML pages, models with explicit semantics.

Because every manifestation, whether HTML or semantic model, is produced by a sequence of components that each perform an atomic transformation, SEMIC calls this support the “Toolchain”.

The expected benefits of applying systematic support to the lifecycle of data specifications are:

  1. A harmonised and coherent experience when browsing data specifications, which in turn should increase adoption
  2. Promotion of SEMIC data modelling best practices by embedding them in the publication workflows
  3. Support for scaling up the editorial capacity through automation.

Introduction

This manual describes the tooling that supports the editorial workflow for managing data specifications. It provides a hands-on guide on how to start reusing the SEMIC toolchain and how editors can use it to generate data specification artefacts.

The target audience for this manual is:

  1. Editors operating or extending an existing SEMIC data specification;
  2. Any user who wishes to customise the SEMIC publication process to create new data specifications.

Roles, Tasks, Use Cases, Repositories and Tools

The publication process of a data specification involves 2 roles: the editor and the toolchain developer.

During the publication process, multiple tasks are performed, each covered by one or more use cases:

Task Use cases
Extend an existing data model UC1
Updating the UML data model UC2
Managing Persistent URIs UC3
Editing HTML specifications UC4, UC5, UC6
Deploy new software releases UC7
Customise the publication process UC8, UC9, UC10, UC11

The Toolchain presented in this manual relies on a set of GitHub repositories, which are presented in the table below:

Repository Description
SEMIC thema This repository mainly contains:
  • EAP files, to be opened by Enterprise Architect to change the data models
  • The template folder, including templates, per language, to change the specific layout of HTML specification
  • The site-skeleton folder, including the screenshot of each data model and the logo, to be included in the HTML specification
  • The config folder, including the JSON configuration file per data model to change various parameters for the publication process
SEMIC publication This repository mainly contains:
  • The template folder, including generic template that can be reused and customised by the template in the SEMIC thema repository
  • The config folder, including the main JSON publication file under the config/dev folder
  • .circleci folder, including the configuration file of the CircleCI pipeline
SEMIC generated This repository mainly contains:
  • The report folder which contains the logs of the execution of the CircleCI pipeline
  • The doc folder, which contains the artefacts generated for each specification including the HTML specification, JSON-LD context, SHACL shapes and XSD
SEMIC puri This repository mainly contains:
  • The release folder which contains, for each namespace, the RDF associated with the respective URI
SEMIC proxy This repository mainly contains:
  • The configurations for the PURI service, in particular the htmlmap.lua file, which performs the HTML redirection

The table below summarises the repositories used, the roles involved and the tools needed per use case:

Repositories UC1 UC2 UC3 UC4 UC5 UC6 UC7 UC8 UC9 UC10 UC11
SEMIC thema X X X X X X
SEMIC publication X X X X X X X X X X
SEMIC generated X X X X X X X X X
SEMIC puri X
SEMIC proxy X
Roles
Editor X X X X X X X
Toolchain developer X X X X X X
Tools
Git client X X X X X X X X X X X
Text editor X X X X X X X X X X X
Web browser X X X X X X X X X
Enterprise Architect X
HTTP client X
SSH client X
Public / Private key gen X
Linux Terminal X
DockerHub X
CircleCI X X X X

As can be seen, the editor mostly works with the SEMIC thema, publication and generated repositories. The editor and the toolchain developer collaborate in UC3 to create and enable persistent URIs, and in UC9 to create a new SEMIC thema repository.

Tasks Execution

To execute the tasks described in this section, the reader is referred to the existing SEMIC data specification Core Person. The first tasks in the editorial and publication process are described mainly using Core Person, while the last tasks, covered by UC7, UC8 and UC9, focus on customising the publication process.

Task: Extend an existing data model

UC1: Create a new Core Person

Objective

Set up a new custom data specification from an existing one in order to add new properties (UC2), create its own URI (UC3), edit metadata (UC4), update sections (UC5) and change the style of the specification (UC6). As an example, the objective of this use case is to create a copy of the existing Core Person, reusing the current configuration as much as possible.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the latest code from the SEMIC thema, publication and generated repositories

  2. In the SEMIC publication repository, modify the publication.json file inside the config/dev folder by adding the following section at the end (just before the “]” character):

    ,{
      "dummy": "1",
      "urlref": "/doc/core-vocabulary/core-person-test",
      "repository": "git@uri.semic.eu-thema:SEMICeu/uri.semic.eu-thema.git",
      "branchtag": "main",
      "name": "core-person-ap",
      "filename": "config/core-person-test.json",
      "navigation": {}
    }

    Note the leading “,” character, which separates the new section from the previous one. Save the file.

  3. Step 2 set the filename of the JSON configuration of the new Core Person to core-person-test.json in the config folder of the SEMIC thema repository. Create that file by duplicating core-person-2.json and renaming the copy to core-person-test.json

  4. Edit core-person-test.json and modify only the “eap” property to

    "eap": "CorePerson-test.EAP",

    Be careful to keep the “,” at the end, which separates the property from the next one, and save the file.

  5. Duplicate CorePerson2.EAP by creating a copy named CorePerson-test.EAP

  6. Commit and push the changes to the SEMIC thema repository first, and then to the SEMIC publication repository.
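
Step 2 can also be scripted instead of edited by hand. The following is a minimal sketch, assuming that publication.json is a top-level JSON array (as the closing “]” suggests) and that urlref identifies the output folder. The function name and the one-entry sample content are hypothetical.

```python
import json


def add_publication_entry(publication_text: str, entry: dict) -> str:
    """Return publication.json content with `entry` appended to the array."""
    entries = json.loads(publication_text)
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array")
    # Assumption: urlref identifies the output folder, so duplicates are rejected.
    if any(e.get("urlref") == entry["urlref"] for e in entries):
        raise ValueError(f"urlref already present: {entry['urlref']}")
    entries.append(entry)
    return json.dumps(entries, indent=2) + "\n"


new_entry = {
    "dummy": "1",
    "urlref": "/doc/core-vocabulary/core-person-test",
    "repository": "git@uri.semic.eu-thema:SEMICeu/uri.semic.eu-thema.git",
    "branchtag": "main",
    "name": "core-person-ap",
    "filename": "config/core-person-test.json",
    "navigation": {},
}

# Hypothetical existing content with a single entry.
existing = '[{"dummy": "1", "urlref": "/doc/core-vocabulary/core-person"}]'
updated = add_publication_entry(existing, new_entry)
print(updated)
```

Writing the array back with a JSON serialiser avoids the missing or misplaced “,” that manual editing can introduce.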

Test the result

Once the publication process has ended, pull the latest code from the SEMIC generated repository and verify that, under the doc/core-vocabulary folder, there is a core-person-test folder (as indicated in the publication.json file) including:

Task: Updating the UML data model

UC2: Adding a new property in an existing class

Objective

In the Core Person Test, a “baptismal name” property needs to be added to the Person class. It has been decided to:

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the latest code from the SEMIC thema repository

  2. Open the CorePerson-test.EAP file with Enterprise Architect

  3. In the “Project Browser” ribbon, expand the Core Person package, double click on the diagram called “Core_Person_publication” and select the class Person

  4. In the “Feature & Properties” ribbon, add a new attribute with the Name “baptismalName”

  5. Next, set the Type to “Text”, selected from the drop-down menu “Select Type…” under the “Core Vocabulary” package

  6. Select the Scope “Public”

  7. In the “Attribute Properties” ribbon, look for the Multiplicity property, which shows the default value “[1]”. Double click on it, set the Lower bound to 0 and the Upper bound to *, and click OK

  8. In the “Tagged Values” ribbon, select the now empty “Attribute (baptismalName)” section, click on the “Add new tagged value” icon, in Tag type “label-en” with Value “baptismal name”, click OK

  9. Click again on the “Add new tagged value” icon, in Tag type “definition-en” with Value “the name given by Christians”, click OK

  10. Click again on “Add new tagged value” icon, in Tag type “uri” with Value “http://data.europa.eu/m8g/baptismalName”, click OK

  11. Because the UML diagram has changed and the new specification would otherwise reuse the site folder of the original Core Person, edit the core-person-test.json file and modify only the “site” property to

    "site": "site-skeleton/core-person-test",

    and save the file.

  12. Under the “site-skeleton” folder, duplicate the core-person folder and rename the copy to “core-person-test”

  13. From Enterprise Architect, save the picture of the UML diagram using the menu “Publish -> Image -> Save to File”. Save it under the “site-skeleton/core-person-test” folder with the name “overview.jpg”, overwriting the current one.

  14. Commit the 4 changed files (CorePerson-test.EAP, core-person-test.json, overview.jpg and semic-icon.png) and push them to the SEMIC thema repository

  15. Update the publication.json file in the SEMIC publication repository by changing the dummy value to

    "dummy": "2",

    and commit and push to the SEMIC publication repository
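
Changing the dummy value is the mechanism this manual uses to trigger a new build of a specification. A minimal sketch of that bump is given below, assuming again that publication.json is a top-level JSON array and that each entry can be found by its urlref. The function name and the sample content are hypothetical.

```python
import json


def bump_dummy(publication_text: str, urlref: str) -> str:
    """Increase the "dummy" counter of the entry with the given urlref."""
    entries = json.loads(publication_text)
    for entry in entries:
        if entry.get("urlref") == urlref:
            # The manual stores the counter as a string, so keep that type.
            entry["dummy"] = str(int(entry["dummy"]) + 1)
            break
    else:
        raise KeyError(f"no entry with urlref {urlref!r}")
    return json.dumps(entries, indent=2) + "\n"


# Hypothetical content holding only the Core Person test entry.
sample = '[{"dummy": "1", "urlref": "/doc/core-vocabulary/core-person-test"}]'
bumped = bump_dummy(sample, "/doc/core-vocabulary/core-person-test")
print(bumped)
```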

Test the result

Once the publication process has ended, pull the latest code from the SEMIC generated repository and verify that, under the doc/core-vocabulary folder, there is a core-person-test folder including:

Task: Managing Persistent URIs

UC3: Create a persistent URI for a new property

Objective

Having created the new property “baptismal name” in use case UC2, a new persistent URI needs to be created for the property.

When a specification is officially released, the persistent URI is maintained as an RDF redirection and an HTML redirection. Both will be set up in this use case.

Roles involved

Prior Knowledge

Tools

Steps

  1. To set up the RDF redirection, pull the latest code from the SEMIC puri repository

    1. Under the releases/m8g folder, create 3 empty files: baptismalName.nt, baptismalName.rdf and baptismalName.ttl

    2. Open baptismalName.ttl with a text editor and type the following:

      @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

      <http://data.europa.eu/m8g/baptismalName> a rdf:Property, <http://www.w3.org/2002/07/owl#DatatypeProperty> ;
        <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> <http://data.europa.eu/m8g> ;
        <http://www.w3.org/2004/02/skos/core#scopeNote> "the name given by Christians."@en ;
        <http://www.w3.org/2000/01/rdf-schema#label> "baptismal name"@en .

      And save the file.

    3. The other 2 formats can either be typed manually or be generated with automatic tools such as EasyRDF, RDF translator or RDF validator

    4. Commit and push the changes to the SEMIC puri repository

  2. To set up the HTML redirection, pull the latest code from the SEMIC proxy repository

    1. Open the file “htmlmap.lua” with a text editor and add, at the end of the file (before the “}” character), a line for the new URI:

      ["/m8g/baptismalName"] = "https://semiceu.github.io/Core-Person-Vocabulary/releases/2.00/#Person%3Abaptismal%20name"

      and save the file.

      Be careful to add a “,” character at the end of the previous line so that the entries can be read correctly.

      Note the last part of the line, “Person%3Abaptismal%20name”: this is the HTML id generated in the index.html in UC2 by concatenating the class name with the property name.

    2. Commit and push the changes to the SEMIC proxy repository

  3. The toolchain developer will enable the persistent URI by logging in to the Persistence Service machine with an SSH client and running the following commands:

    Command Description
    cd uri.semic.eu-proxy/ Enter the folder of the SEMIC proxy repository
    git pull Pull the latest changes, entering the GitHub username and personal access token when prompted
    make nginx Create a new version of the nginx web server
    make run Run the nginx web server
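
For the N-Triples file mentioned in step 1, the triples can also be written out with a few lines of code rather than an online converter. The sketch below uses only the Python standard library and is an illustration, not the tool used by SEMIC. The helper names are hypothetical, and the property URI follows the “uri” tagged value defined in UC2.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Serialise a language-tagged string, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def property_ntriples(subject: str, namespace: str, label: str, note: str) -> str:
    """Return the same statements as the Turtle file, one triple per line."""
    s = iri(subject)
    triples = [
        (s, iri(RDF + "type"), iri(RDF + "Property")),
        (s, iri(RDF + "type"), iri(OWL + "DatatypeProperty")),
        (s, iri(RDFS + "isDefinedBy"), iri(namespace)),
        (s, iri(SKOS + "scopeNote"), literal(note, "en")),
        (s, iri(RDFS + "label"), literal(label, "en")),
    ]
    return "".join(f"{a} {b} {c} .\n" for a, b, c in triples)


nt = property_ntriples(
    "http://data.europa.eu/m8g/baptismalName",
    "http://data.europa.eu/m8g",
    "baptismal name",
    "the name given by Christians.",
)
print(nt, end="")
```

The Turtle shorthand “a rdf:Property, owl:DatatypeProperty” expands to two rdf:type triples, which is why the output has five lines.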

Test the result
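
Before testing the redirection in a browser, one offline sanity check is to recompute the entry expected in htmlmap.lua. The sketch below assumes, as described in step 2, that the HTML id is the class name and the property label joined by “:” and then percent-encoded. The function names are hypothetical.

```python
from urllib.parse import quote


def html_fragment_id(class_name: str, property_label: str) -> str:
    """Percent-encode "<class>:<label>" the way it appears in the URL."""
    # quote() encodes ":" as %3A and a space as %20 by default.
    return quote(f"{class_name}:{property_label}")


def htmlmap_line(path: str, base_url: str, class_name: str, label: str) -> str:
    """Build the Lua table entry that maps a persistent path to the HTML page."""
    fragment = html_fragment_id(class_name, label)
    return f'["{path}"] = "{base_url}#{fragment}"'


line = htmlmap_line(
    "/m8g/baptismalName",
    "https://semiceu.github.io/Core-Person-Vocabulary/releases/2.00/",
    "Person",
    "baptismal name",
)
print(line)
```

If the printed line differs from the one added to htmlmap.lua, the redirection will most likely land on the page but not on the property.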

Task: Editing HTML specifications

UC4: Update the publication metadata of the specification

Objective

The metadata of the specification needs to be updated. To create a draft of the new release, the following properties will be updated:

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the latest code from the SEMIC thema repository

  2. Open the core-person-test.json file under the config folder with a text editor

  3. Change the respective lines:

    "publication-state": "Semic Draft",
    "publication-date": "2023-01-01",

    Be careful to keep a “,” character after each line and save the file.

  4. Commit and push the changed file to the SEMIC thema repository

  5. Pull the latest code from the SEMIC publication repository

  6. Update the publication.json file in the SEMIC publication repository by changing the dummy value to

    "dummy": "3",

    and commit and push to the SEMIC publication repository
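
The metadata change in step 3 can be sketched in code as follows, assuming that core-person-test.json is a JSON object with the two properties at its top level. The function name and the placeholder values in the sample are hypothetical.

```python
import json
from datetime import datetime


def set_publication_metadata(config_text: str, state: str, publication_date: str) -> str:
    """Return the configuration with the publication state and date replaced."""
    config = json.loads(config_text)
    # Raises ValueError if the value is not a valid year-month-day date.
    datetime.strptime(publication_date, "%Y-%m-%d")
    config["publication-state"] = state
    config["publication-date"] = publication_date
    return json.dumps(config, indent=2) + "\n"


# Hypothetical configuration content with placeholder values.
sample_config = json.dumps(
    {
        "eap": "CorePerson-test.EAP",
        "publication-state": "old-state",
        "publication-date": "2000-01-01",
    }
)
new_config = set_publication_metadata(sample_config, "Semic Draft", "2023-01-01")
print(new_config)
```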

Test the result

Once the publication process ends, pull the latest code from the SEMIC generated repository and verify under the doc/core-vocabulary folder, that there is a core-person-test folder including the index.html file.

Open the index.html with a browser and verify that the 2 properties have been correctly updated.

UC5: Adding a changelog section in the specification

Objective

Having created a new property in UC2, there is a need to inform readers of the new specification about what has been changed via a change log. The change log will be a custom section in the specification.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the latest code from the SEMIC thema repository

  2. Open the core-person-test.json file under the config folder with a text editor

  3. Change the following line:

    "template": "core-person-ap-test_en.j2",

    Be careful to keep a “,” character at the end of the line and save the file.

  4. Go to the “template” folder and duplicate the core-person-ap_en.j2 file into core-person-ap-test_en.j2

  5. Open the core-person-ap-test_en.j2 file, replace the change log section with the following text and save the file:

    {% block changelog %}
    <p>
    The new property "baptismal name" has been added.
    </p>
    {% endblock %}

    Notice the {% block changelog %} at the beginning and the {% endblock %} at the end, which enclose the changelog block that will be used by a generic template.

  6. Commit and push the changed files to the SEMIC thema repository

  7. Pull the latest code from the SEMIC publication repository

  8. Update the publication.json file in the SEMIC publication repository by changing the dummy value to

    "dummy": "4",

    and commit and push to the SEMIC publication repository

Test the result

Once the publication process ends, pull the latest code from the SEMIC generated repository and verify under the doc/core-vocabulary folder, that there is a core-person-test folder including the index.html file.

Open the index.html with a browser and verify that the change log section has been correctly updated.

UC6: Change the style of the specification

Objective

To reflect the style of the organisation creating the new specification, the appearance of the HTML specification can be changed. In this use case, the colour of the hyperlinks is changed to green.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the latest code from the SEMIC thema repository

  2. Open the core-person-ap-test_en.j2 in the “template” folder, change the extends header to the following and save the file:

    {% extends "semic_core_voc_test.j2" %}

  3. Commit and push the changed files to the SEMIC thema repository

  4. Pull the latest code from the SEMIC publication repository

  5. Go to the “templates” folder and duplicate the generic template file semic_core_voc.j2 into semic_core_voc_test.j2.

  6. Open the semic_core_voc_test.j2 file with a text editor and add, towards the end and just before the </style> tag, the following lines:

    a, a:hover { color: #00cc23; }

  7. Update the publication.json file in the SEMIC publication repository by changing the dummy value to

    "dummy": "5",

    and commit and push to the SEMIC publication repository

Test the result

Once the publication process ends, pull the latest code from the SEMIC generated repository and verify under the doc/core-vocabulary folder, that there is a core-person-test folder including the index.html file.

Open the index.html with a browser and verify that the colour of the hyperlinks has been correctly updated.

Task: Deploy new software releases

UC7: Activate a new release of transformation software in the toolchain

Objective

The toolchain is based upon open source software components, which are distributed as Docker images.

As software evolves, new versions become available. Each new version results in the release of a new Docker image with a new version number.

Within this use case, the objective is to change the toolchain so that it will use a new software release. We will upgrade the software that extracts the information from the UML model.

(NOTE) How software is being developed, maintained and released is the responsibility of the software component itself. Each (open source) software component will have its own methodology, approach and lifecycle.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the latest code from the SEMIC publication repository
  2. Open the CircleCI configuration (file .circleci/config.yml) with a text editor
  3. Find the image declaration of the software that is to be upgraded: https://github.com/SEMICeu/uri.semic.eu-publication/blob/master/.circleci/config.yml#L115
  4. Log in to DockerHub and search for the image informatievlaanderen/oslo-ea-to-rdf
  5. Find the tag that corresponds to the release to be deployed, in this example json-ld-format-m1.1.3
  6. Change the Docker image tag at line 115 of the CircleCI configuration file to the selected release
  7. Commit and push the change to the SEMIC publication repository
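
The tag change in step 6 can be sketched as a text substitution. The example assumes that the image is declared on a line of the form “image: <name>:<tag>”. The surrounding YAML in the sample and the tag “old-tag” are hypothetical, so check the real config.yml before relying on the pattern.

```python
import re


def set_image_tag(config_text: str, image: str, new_tag: str) -> str:
    """Replace the tag of every "image: <image>:<tag>" declaration."""
    pattern = re.compile(rf"(image:\s*{re.escape(image)}):\S+")
    updated, count = pattern.subn(lambda m: f"{m.group(1)}:{new_tag}", config_text)
    if count == 0:
        raise ValueError(f"no declaration found for image {image!r}")
    return updated


# Hypothetical fragment of .circleci/config.yml.
sample_yaml = """\
    docker:
      - image: informatievlaanderen/oslo-ea-to-rdf:old-tag
"""
upgraded = set_image_tag(
    sample_yaml, "informatievlaanderen/oslo-ea-to-rdf", "json-ld-format-m1.1.3"
)
print(upgraded)
```

Raising an error when nothing matches guards against silently committing an unchanged file.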

Test the result

In the CircleCI web interface one can follow the execution of the toolchain. If the Docker image was found by CircleCI, and the new release of the software had no breaking (API) changes compared to the previous version, the execution will be successful and the output will be visible in the SEMIC generated repository.

Otherwise the execution will halt with an error at the failing step. Resolving errors can happen in numerous ways:

(NOTE) Although the update may be needed to resolve an issue for one data specification, updating software releases is a global change for the toolchain. It impacts all current and future builds of the data specifications listed in the SEMIC publication repository. Therefore such changes should be communicated to all editors.

Task: Customise the publication process

UC8: Using CircleCI to build data specifications

Objective

To automate the building of data specifications, the toolchain uses the CI/CD solution CircleCI.

Consider the case in which the generation of examples is not required for data specifications. The objective is then to adapt the CircleCI workflow to reflect that decision and exclude the generation of examples.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

Part 1 - enable examples being generated for a specification

  1. Pull the latest code from the SEMIC publication repository

  2. Select in the SEMIC generated repository a data specification to create examples for. Let's consider this specification: Core Person test.

  3. Trigger the build process for the selected data specification by updating the publication.json file in the SEMIC publication repository by setting the configuration property “examples” to “true”

    "examples": "true",

    Commit and push to the SEMIC publication repository

  4. Verify in the SEMIC generated repository that the examples are created (see https://github.com/SEMICeu/uri.semic.eu-generated/tree/master/examples/core-person-test)

  5. In the web interface of CircleCI the latest execution trace should mention a step in the visualised workflow called “render-example-templates”.

Part 2 - adapt the workflow to not generate the examples.

  1. Open the CircleCI configuration (file .circleci/config.yml) within the SEMIC publication repository with a text editor

  2. Comment out lines 566-568 (https://github.com/SEMICeu/uri.semic.eu-publication/blob/master/.circleci/config.yml#L566) by putting a # symbol as the first character of each line

  3. Comment out line 601 (#- render-example-templates)

  4. Commit and push the change to the SEMIC publication repository

  5. Select in the SEMIC generated repository a data specification that has NO examples yet

  6. Trigger the build process for an existing specification by updating the publication.json file in the SEMIC publication repository by changing the dummy value to

    "dummy": "6",

    and commit and push to the SEMIC publication repository
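
Commenting out a line range, as done in Part 2, can be sketched as below. The function is a generic illustration and its name is hypothetical. The sample text is not the real config.yml, and the real line numbers (566-568 and 601) should be checked against the current file, since they shift whenever the configuration changes.

```python
def comment_out(text: str, first: int, last: int) -> str:
    """Prefix lines first..last (1-indexed, inclusive) with a # symbol."""
    lines = text.splitlines(keepends=True)
    if not 1 <= first <= last <= len(lines):
        raise IndexError(f"line range {first}-{last} is outside the file")
    for i in range(first - 1, last):
        # Leave lines that are already commented out untouched.
        if not lines[i].lstrip().startswith("#"):
            lines[i] = "#" + lines[i]
    return "".join(lines)


# Hypothetical workflow fragment; only the step names come from this manual.
sample_workflow = (
    "jobs:\n"
    "  - other-step\n"
    "  - render-example-templates\n"
    "  - create-artifact\n"
)
result = comment_out(sample_workflow, 3, 3)
print(result)
```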

Test the result

Once the last change has been made, CircleCI will no longer create examples, even when the property in the publication JSON is present. This is because the code that would react to this property (being true or false) is no longer executed; it has become dormant in the updated CircleCI configuration.

To verify the effect of the change, select another data specification that has no examples present and apply the steps of part 1 for that specification. No examples will be created in the SEMIC generated repository.

The change is also visible in the CircleCI web interface. Because the lines are commented out, the step has become dormant and is no longer visible in the workflow.

UC9: Initiating a new SEMIC thema repository

Objective

In use cases 1 - 6, the current setup of the SEMIC toolchain is assumed: a single thema repository containing all Core Vocabularies. However, not all SEMIC data specifications follow the same life cycle as the Core Vocabularies, for instance DCAT-AP. In that case it is recommended to manage the data specification in its own thema repository.

This use case will initiate a new thema repository for a data specification.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

Part 1 - Initialise a new thema repository with boilerplate information

  1. The toolchain developer logs in to GitHub using a web browser
  2. Then select the thema template on GitHub (https://github.com/Informatievlaanderen/OSLOthema-template)
  3. Use the template to create a new GitHub repository in the target organisation space. In this case, the organisation is SEMICeu and the repository name will be Semicthema-DCAT-AP. This will result in the new GitHub repository https://github.com/SEMICeu/Semicthema-DCAT-AP. Note that the visibility of the new repository is a policy decision; the toolchain is capable of handling both public and private repositories
  4. Enable the access rights for the editors according to the organisation's policy
  5. As the template contains dummy placeholder files, the new thema repository is ready to be used

Part 2 - Initialise the new thema repository with concrete information of the new data specification

  1. Configure with the editor the new data specification content in the repository
    1. Adapt the config file that describes the data specification and change the file name to reflect the data specification that is configured in that file
    2. Adapt the UML file to hold an initial version of the data specification
    3. Adapt the stakeholders file to describe all initial collaborators to the data specification
    4. Adapt the overview image to hold the initial version of the data specification
    5. Adapt the template files to hold the initial summary and supportive texts
  2. Clean up and remove any unused boilerplate information
  3. Document all these changes in the CHANGELOG file (in the root of the thema repository) and commit and push the changes

Part 3 - Trigger a build of the data specification in SEMIC publication repository

These steps are similar to the explanation in UC1, and are thus summarised here in short.

  1. Pull the latest code from the SEMIC publication repository

  2. In the SEMIC publication repository, modify the publication.json file inside the config/dev folder by adding the following section at the end (just before the “]” character):

    ,{
      "dummy": "1",
      "urlref": "/doc/applicationprofile/DCAT-AP",
      "repository": "git@uri.semic.eu-thema:SEMICeu/Semicthema-DCAT-AP.git",
      "branchtag": "main",
      "name": "dcat-ap",
      "filename": "config/dcat-ap.json",
      "navigation": {}
    }

    This defines a new publication based on the content of the newly created thema repository

  3. Commit and push the change to the SEMIC publication repository

Test the result

When the new thema repository is correctly configured then the last step will result in an update to the SEMIC generated repository. A new directory will appear: /doc/applicationprofile/DCAT-AP.

In case of errors, consult the CircleCI web interface to find the step that caused the error. The most frequent sources of problems are:

UC10: Creating a new full Toolchain

Objective

Fork the current toolchain into separate repositories on GitHub and configure CircleCI to run the new toolchain.

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Fork the SEMIC Thema repository under the name “MyThemaRepo”.

  2. Fork the SEMIC Publication repository under the name “MyPublicationRepo”.

  3. Create a new repository “MyGenRepo”, initialise it and create a branch “master”.

  4. Open Puttygen and create 3 public/private key pairs, one for each repository, saving the public key (e.g. mythema.pub) and the private key (e.g. mythema.ppk), and converting the private key to OPENSSH format (e.g. mythema.ssh).

  5. Deploy the public keys in OPENSSH format in the MyThemaRepo and MyGenRepo.

  6. Log in to CircleCI with the GitHub account, select MyPublicationRepo in Projects and press the “Set Up Project” button. In the new window, select the repository and the branch “master” so that it can be found by CircleCI.

  7. Open the Project Settings of MyPublicationRepo, click on the SSH keys menu and scroll down to Additional SSH Keys. Upload the content of the 3 private OPENSSH keys using “github.com” as hostname.

    When uploading the keys, take note of the fingerprint associated with each key. Alternatively, load a private key in Puttygen and, in the “Key” menu, select “show fingerprint as MD5” to display the fingerprint associated with the key.

  8. In GitHub Desktop, clone the MyPublicationRepo locally and open the “config.yml” file under the “.circleci” folder with a text editor.

  9. Replace the 3 fingerprints in the config.yml.

  10. Replace the fingerprint of the MyGenRepo OPENSSH key further down in the file, under the create-artifact task.

  11. Update the GitHub repository for the create-artifact task inside the config.json file within the config folder.

  12. Open the file “publication.json” under the folder “config/dev” and simplify it, leaving just the configuration for Core Person test and updating its repository property to reference MyThemaRepo.

  13. Commit and push the changed files (config.yml and publication.json) to the MyPublicationRepo repository.
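
The MD5 fingerprints used in the steps above can also be recomputed from a public key in OpenSSH format, which helps when matching keys to the values in config.yml. The sketch below relies on the fact that an MD5 fingerprint is the MD5 digest of the base64-decoded key blob, written as colon-separated hex pairs. The function name is hypothetical and the key in the example is a dummy value, not a real key.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line: str) -> str:
    """Return the MD5 fingerprint of an OpenSSH public key line."""
    # OpenSSH public key lines look like: "<type> <base64 blob> [comment]"
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Dummy blob for illustration only; use the content of e.g. mythema.pub instead.
dummy_blob = base64.b64encode(b"not a real key").decode("ascii")
fingerprint = md5_fingerprint(f"ssh-rsa {dummy_blob} mythema")
print(fingerprint)
```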

Test the result

Verify that the CircleCI execution succeeded and that the files are generated in the MyGenRepo.

UC11: Change the script to modify the XML namespace

Objective

Change a script to modify the XML namespace of the generated XSD

Roles involved

Prior Knowledge

Repositories

Tools

Steps

  1. Pull the forked SEMIC publication repository

  2. In the scripts folder, open the render-details.sh file with a text editor

  3. Search for the following line:

    XSDDOMAIN="https://data.europa.eu/m8g/xml/"

  4. Modify the line as follows:

    XSDDOMAIN="https://data.europa.eu/m8g/myxml/"

  5. Commit and push the changed file

Test the result

Pull the generated repository, open the xsd folder of the generated model and verify that the XSD file includes the following line (note that both the xmlns and the targetNamespace have changed):

<xs:schema xmlns="https://data.europa.eu/m8g/myxml/" xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="https://data.europa.eu/m8g/myxml/" ...
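
The same check can be automated. The sketch below parses an XSD with the Python standard library and reads the targetNamespace attribute of the root element. The function name is hypothetical, and the sample is a shortened, self-closing version of the schema element shown above rather than a real generated file.

```python
import xml.etree.ElementTree as ET


def target_namespace(xsd_text: str):
    """Return the targetNamespace of the schema root, or None if absent."""
    root = ET.fromstring(xsd_text)
    return root.get("targetNamespace")


# Shortened sample; a real XSD contains element and type declarations.
sample_xsd = (
    '<xs:schema xmlns="https://data.europa.eu/m8g/myxml/" '
    'xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="https://data.europa.eu/m8g/myxml/"/>'
)
print(target_namespace(sample_xsd))
```

Note that ElementTree does not expose the xmlns declaration as an attribute, so this sketch only verifies the targetNamespace.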