General conventions

Reuse existing concepts as much as possible

Title: Reuse existing concepts as much as possible

Identifier GC-R1

Statement:

Reuse existing concepts as much as possible, respecting the original semantics and lexicalisation.

Description:

Only use constructs with semantics (human and machine-readable) that support the use case or domain. A similar reflex must be considered by reusing properties and classes from other vocabularies. As a general rule, it is safe to reuse annotation properties, as they carry no logical semantics. Conversely, no concepts shall be redefined unless they are significant variations of the existent ones. In order to achieve a common approach to reuse, apply the reuse guidelines as specified above [Clarification on the reuse].

See more [ ld-bp, ldp-bp, dwbp, epo-arch, oslo-rules ]

Prefer maintained vocabularies

Title: Prefer maintained vocabularies

Identifier GC-R2

Statement: Quality of maintenance and governance should be reviewed before reuse. Reuse mostly vocabularies that are well maintained and governed.

Description:

Some projects are implemented and closed, while some other ones keep being maintained. The maintenance usually aims at (a) fixing errors, (b) making gradual improvements, and (c) supporting users in using the released data specification, to mention the least. Maintenance of the project can also include URI dereferencing and handling open issues in a timely fashion. Some projects may foresee an evolution roadmap, defined in a change and release policy, with a series of evolution stages planned.

Concept reference and terms

Title: Concept reference and terms

Identifier GC-R3

Statement:

Each concept shall be (a) referenceable by a URI, (b) formally defined and (c) described by a precise, unambiguous human-readable label and definition.

Description:

Terms and concepts are not the same things. A term is a word, compound word, or multi-word expression that, in specific contexts, is given specific meanings – these may deviate from the meanings the same words have in other contexts and in everyday language. A concept can be viewed as an idea or notion, a unit of thought [skos]. However, what constitutes a unit of thought is subjective, and this definition is meant to be suggestive rather than restrictive. That is why each concept needs to be well-named (termed) by providing preferred and alternative labels and should have a clear and precise definition supported by examples and explanatory notes. So, we need a precise way to refer to concepts. The mechanism offering such a possibility is using URIs to establish the identity of a concept’s "meaning". URIs also enable referencing, an essential feature of meaning exchange (communication). Such a reference allows for establishing a common meaning, not only between humans and machines but also among humans. Finally, the meaning of the term must be clear. Its definition is associated with the term via the URI. To make the term reusable, it should be published as part of a data specification.

See more [ ld-bp, ldp-bp, dwbp, epo-arch, oslo-rules ]

Vocabulary terminology style

Title: Vocabulary terminology style

Identifier GC-R4

Statement:

The terminology style shall be consistent across the vocabulary.

Description:

In practice, some variation in terminology style (concept naming conventions) can be noticed across a wide range of semantic data specifications. Whatever the adopted style may be, the author shall consistently apply it to the whole model. These names will be reflected in all artefacts and in the URIs.

Next, we provide a set of loose recommendations:

  • British English is the lingua franca in the model.

  • Avoid acronyms and abbreviations in concept names

  • Classes are names with nouns and singular

  • Names are case-sensitive.

  • Relations and Properties use verbs at a present tense

    • Some practices suggest adding the prefix "has" for quality attributes like "weight", "height", etc. or the prefix or "is" for boolean indicators, and some other practices recommend avoiding such prefixes as they add no additional information to the name.

  • Most often, the name of the inverse relation should not be a semantically inverted verb, such as in the case of "buys/sells" or "opens/closes". It should be done by changing the voice from active to passive, e.g. "plays/playedBy".

Vocabulary definition styling

Title: Vocabulary definition styling

Identifier GC-R5

Statement:

The concept definitions shall be elaborated consistently across the vocabulary.

Description:

In practice, different styles of defining concepts can be noticed across a range of semantic data specifications.

The SEMIC Principles for creating good definitions [semic-defs] are a basis for writing definitions. They are based on advice found in the literature and are the following:

  • Be concise but complete,

    • Avoid over-generalisations. Adapt and contextualise the definition to the surrounding/connected concepts.

    • Make sure that every concept that occurs in the model is directly (or indirectly) defined

  • Describe only one term

  • Structure the definition in a standardised way:

    • Use the singular form to phrase the definition [GC-R4]

    • State what the term is, and don’t enumerate what it is NOT (no negative definition)

    • Use only commonly understood abbreviations

    • Use similar terminology for related definitions

  • Don’t use circular definitions, i.e. the term defined should not be part of the definition,

  • Don’t add secondary information such as additional explanation, scoping, examples, etc. these are to be documented in usage notes.

  • Form the definition in one or more sentences that start with a capital letter and end with a period.

  • Do not start a definition with a repetition of the name of the concept.

  • Rich standard encodings such as UTF-8 and UTF-16 are supported in notes and definitions. In the element names, however, we recommend avoiding any character encodings and using plain ASCII [epo-cmc, sec 4.2].

Reuse compliance

Title: Reuse compliance

Identifier GC-R6

Statement:

Compliance with a semantic data specification is satisfied by appropriate usage of terms that is in accordance with definitions and constraints.

Compliance checking with an application semantic data specification shall be permissive. This means that what is not forbidden is permitted. If the context requires more restrictions,then an application profile needs to be established for a narrow(er) scenario.

Technically, we envisage compliance checking limited to correct referencing of the concept URIs and respecting the cardinality constraints and value constraints in the case of properties. This falls within the scope of data shape definitions. Additionally, more specific compliance requirements and constraints can be added as necessary.

Compliance checking may involve multiple levels of severity. For example in the SHACL specifications three levels are defined: Violation, Warning, Info. We assume by default the SHACL severity specifications unless other denotations systems are provided (i.e. different labels and delimitation of severity). Also in absense of specifications, any unfulfilled compliance check is considered a Violation.

The semantic data specifications may provide such severity levels. How it is realised in the conceptual model is open at the moment. The main place to provide such specifications is the data shape artefact. In the future we can return to this aspect and provide more guidance.

Deontic modals

Title: Deontic modals

Identifier GC-R7

Statement:

Indicators of deontic modalities for classes and properties do not have semantic or normative value. Still they may be used as editorial annotations.

Deontic modalities indicate levels obligation, permission, necessity and related concepts.

As a general recommendation, to use deontic indicators in the semantic data specifications is discouraged. Such indicators could be of editorial or guiding role for the users and adopters of the data specifications. However, it should be noted that they are not considered as part of any compliance validation or semantic interpretation of the data model.

In the standardisation community a common practice is to indicate levels of obligation or permission for concepts in a semantic data specification. Obligation indicator signals whether a statement using a class or a property is required in an instantiation; while, Permission indicator signals whether a statement using a class or a property is allowed or forbidden in an instantiation.

The common deontic indicator values are:

  • mandatory signifying that a statement using a class or property is required,

  • recommended signifying that a statement using a class or property is optional but recommended,

  • optional signifying that a statement using a class or property is optional,

  • forbidden signifying that a statement using a class or property is not permitted.

Still, there are ways to achieve the same effect as the these indicators through other means. For properties, the main instrument is employment of cardinality constraints (per property per class). To make a property mandatory set the minimum cardinality to one or more [1..*], otherwise relax the minimum cardinality constraint to keep the property optional [0..*].

For classes, it is possible to mark a class as abstract, which means that it cannot be directly instantiated, therefore achieving the effect of forbidden. However, deontic indicators shall be avoided for classes because a class may be mandatory and optional in different instantiation or exchange scenarios within the same application profile.

For example consider the DCAT-AP and two mandatory classes: dcat:Catalog and dcat:Dataset. When metadata of a catalogue (and its records) is exchanged, then both classes dcat:Catalog and dcat:Dataset must be instantiated; however when a single dataset metadata is being exchange then only dcat:Dataset instance shall be provided. Moreover, in the second scenario, providing an instance of dcat:Catalog will be counterproductive and possible leading to errors.

Descriptions of what classes can or shall be bundled together when participating in information exchange belong in "data exchange contracts", "API endpoint scheme definitions" or the likes of these. Such specifications belong in the Technical Interoperability layer of the European Interoperability Framework (EIF) [eif], and are (currently) out of scope of this style guide, which aims primarily at addressing the semantic interoperability.

If semantic engineers prefer or are compelled to employ deontic indicators, then deontic indicators must be precisely defined and those definitions must be published. No reliance on common sense understanding shall be assumed as the meaning of such deontic indicators may (and certainly) differ not only among readers of data specifications but also in different data specifications.