The SEMIC Style Guide for Semantic Engineers
1. Introduction
1.1. What is interoperability?
The term "interoperability" comprising of ‘inter’ (Latin for between), ‘opera’ (Latin for work), and ‘ability’, refers to the intrinsic nature of systems or entities working together to achieve shared goals.
Interoperability in the EU context refers to the capacity of systems or organisations, including public administrations, businesses, and citizens, to collaborate effectively and pursue common objectives across borders. This capability is crucial for providing efficient digital public services, facilitating economic transactions, and supporting the free movement of goods, services, people, and data. The European Interoperability Framework (EIF)[eif], [eif2] and Interoperable Europe Act (IEA) [reg24-903] emphasise that interoperability involves the seamless exchange of information and trusted data sharing across sectors and administrative layers, which is essential for improving policy-making and public service delivery.
1.2. Interoperability through semantic specifications
Semantic interoperability ensures that the precise meaning of exchanged data is maintained throughout its transmission, adhering to the principle that "what is sent is what is understood", encompassing both the semantic and syntactic aspects of data. The semantic aspect focuses on the meaning of data elements and their relationships, whereas the syntactic aspect deals with the structure or format of the data as it is exchanged. On the other hand, technical interoperability covers the infrastructures and applications that facilitate the linkage between systems and services. This includes aspects such as data representation, transmission methods, API design, access rights management, security, and overall system performance.
Semantic data specifications are detailed, standardised data modelling descriptions that help manage how data is defined, represented, and communicated across different systems. They comprise various artefacts that are both machine-readable and human-understandable, thus supporting consistent interpretation and utilisation across diverse IT environments and stakeholders (e.g. developers, business experts, end users, administrators, etc.).
The SEMIC Style Guide [sem-sg] provides essential guidelines for creating and managing such specifications, covering naming conventions, syntax, and the organisation of artefacts into two critical types of semantic data specifications: Core Vocabularies and Application Profiles.
The Core Vocabularies are semantic data specifications that enable public administrations to standardise data exchange processes, thus enhancing the clarity and consistency of data across different systems and sectors. By leveraging these standards, administrations can effectively bridge the gap between differing data practices, ensuring seamless service delivery that meets the needs of citizens and businesses alike.
1.3. What are the Core Vocabularies?
Core Vocabularies are simplified, reusable and extensible data models that capture the fundamental characteristics of a data entity in a context-neutral and syntax-neutral fashion [cv-hb]. The SEMIC Style Guide explains how the Core Vocabularies [sem-sg-cvs] are context-neutral semantic building blocks that can be extended into context-specific semantic data specifications to ensure semantic consistency. When the Core Vocabularies are extended to create domain specifications and information exchange models, additional meaning (semantics) is added to the specifications through this contextualisation.
2. SEMIC Core Vocabularies
This section contains a brief overview of the Core Vocabularies, indicating how they were developed and how they are maintained.
Since 2011, the European Commission has facilitated international working groups to forge consensus on and maintain the SEMIC Core Vocabularies. A short description of these vocabularies is included in the table below. The latest release of the Core Vocabularies can be retrieved via the SEMIC Support Center [semic], or directly from the GitHub repository [semic-gh].
Vocabulary | Description |
---|---|
Core Person Vocabulary | A simplified, reusable and extensible data model that captures the fundamental characteristics of a person, e.g. the name, the gender, the date of birth, the location, etc. This specification enables interoperability among registers and any other ICT-based solutions exchanging and processing person-related information. |
Core Business Vocabulary | A simplified, reusable and extensible data model that captures the fundamental characteristics of a legal entity, e.g. the legal name, the activity, the address, etc. It includes a minimal number of classes and properties modelled to capture the typical details recorded by business registers, and facilitates information exchange between business registers despite differences in what they record and publish. |
Core Location Vocabulary | A simplified, reusable and extensible data model that provides a minimum set of classes and properties for describing a location, represented as an address, a geographic name, or a geometry. This specification enables interoperability among land registers and any other ICT-based solutions exchanging and processing location information. |
Core Criterion and Core Evidence Vocabulary (CCCEV) | Supports the exchange of information between organisations that define criteria and organisations that respond to these criteria by means of evidences. CCCEV addresses specific needs of businesses, public administrations and citizens across the European Union. |
Core Public Organisation Vocabulary (CPOV) | Provides a common data model for describing public organisations in the European Union. CPOV addresses specific needs of businesses, public administrations and citizens across the European Union. |
Core Public Event Vocabulary | A simplified, reusable and extensible data model that captures the fundamental characteristics of a public event, e.g. the title, the date, the location, the organiser, etc. It aspires to become a common data model for describing public events (conferences, summits, etc.) in the European Union, and enables interoperability among registers and any other ICT-based solutions exchanging and processing information related to public events. |
2.1. Representation formats
The Core Vocabularies are semantic data specifications that are disseminated as the following artefacts:
- lightweight ontology [sem-sg-wio] for vocabulary definition expressed in OWL [owl2],
- loose data shape specification [sem-sg-wds] expressed in SHACL [shacl],
- human-readable reference documentation [sem-sg-wdsd] in HTML (based on ReSpec [respec]),
- conceptual model specification [sem-sg-wcm] expressed in UML [uml].
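To make the relationship between the first two artefacts concrete, the sketch below shows how a single term might appear in the lightweight ontology and in the loose data shape. This is an illustrative sketch only, not an official artefact: the PublicOrganisation class is assumed to live in the http://data.europa.eu/m8g/ namespace used by the Core Vocabularies, and the dct:title constraint is an assumption.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix cv:   <http://data.europa.eu/m8g/> .

# lightweight ontology: declares the term and its meaning
cv:PublicOrganisation a owl:Class ;
    rdfs:label "Public Organisation"@en .

# loose data shape: constrains how instances of the term are used
cv:PublicOrganisationShape a sh:NodeShape ;
    sh:targetClass cv:PublicOrganisation ;
    sh:property [
        sh:path dct:title ;     # assumed property, for illustration
        sh:minCount 1
    ] .
```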
2.2. Licensing conditions
The Core Vocabularies are published under the CC-BY 4.0 licence [cc-by].
2.3. Core Vocabularies lifecycle
The Core Vocabularies have been developed following the ‘Process and methodology for developing Core Vocabularies’ [ec11a]. The Core Vocabularies have an open change and release management process [cv-met], supported by SEMIC, that ensures continuous improvement and relevance to evolving user needs.
This process begins with the identification of needs from stakeholders or issues raised in existing implementations. The Working Group members, SEMIC team or community of users propose changes that are thoroughly assessed for their impact and feasibility. Once a change is deemed necessary, it undergoes a drafting phase where the technical details are fleshed out, followed by public consultations to gather wider input and ensure transparency.
Following consultations, the changes are refined and prepared for implementation. This stage may involve further iteration based on feedback or additional insights from ongoing discussions. The finalised changes are then formally approved and documented, ensuring they are well-understood and agreed upon by all relevant parties.
The release management of Core Vocabularies follows a structured timeline that includes pre-announced releases and public consultation periods to allow users to prepare for changes. Each release includes detailed documentation to support implementation, ensuring users can integrate new versions with minimal disruption. This process not only maintains the quality and relevance of the Core Vocabularies, but also supports a dynamic and responsive framework for semantic interoperability within digital public services.
2.4. Claiming conformance
Claiming conformance to the Core Vocabularies is an integral part of validating (a) how well a new or a mapped data model or semantic data specification aligns with the principles and practices established in the SEMIC Style Guide [sem-sg] and (b) to what degree the Core Vocabularies are reused (fully or partially) [sem-sg-reuse]. The conformance assessment is voluntary and takes the form of a published self-conformance statement. This statement must assert which requirements are met by the data model or semantic specification.
The conformance statement highlights various levels of adherence, ranging from basic implementation to more complex semantic representations. At the basic level, conformance might simply involve ensuring that data usage is consistent with the terms (and structure, but no formal semantics) defined by the Core Vocabularies. Moving to a more advanced level of conformance, data may be easily transformed into formats like RDF or JSON-LD, which are conducive to richer semantic processing and integration. This level of conformance signifies a deeper integration of the Core Vocabularies, facilitating a more robust semantic interoperability across systems. Ultimately, the highest level of conformance is achieved when the data is represented in RDF and fully leverages the semantic capabilities of the Core Vocabularies. This includes using a range of semantic technologies, adhering to the SEMIC Style Guide, fully reusing the Core Vocabularies, and respecting the associated data shapes.
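As a hedged illustration of the intermediate conformance level described above, the sketch below shows plain JSON that uses Core Vocabulary term names being lifted to an RDF-ready representation simply by attaching a JSON-LD context. The property URIs follow namespaces used elsewhere in this handbook, while the Person class URI and the overall context are assumptions for illustration.

```json
{
  "@context": {
    "givenName": {"@id": "http://xmlns.com/foaf/0.1/givenName"},
    "contactPoint": {"@id": "http://data.europa.eu/m8g/contactPoint", "@type": "@id"}
  },
  "@type": "http://www.w3.org/ns/person#Person",
  "givenName": "Maria",
  "contactPoint": "https://example.org/contact/123"
}
```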
3. Conceptual framework
This section delves into the conceptual framework of semantic data specifications. Understanding this framework allows stakeholders to use the semantic data specifications effectively and align expectations and practices ensuring consistent and effective communication.
The structure of the section is methodically organised into several subsections, each focusing on a different element of semantic data specifications. It begins with a broad overview of the specifications and establishes what the artefacts are. It then progressively narrows down to specific artefact types, namely data models and documentation. Further subsections explore how data models interact across the different layers of data interoperability, and the types of semantic data specifications. This sequential approach helps readers build a comprehensive understanding, from general concepts to specific explanations.
3.1. Semantic data specifications
Semantic data specifications are composite standards designed to facilitate data exchange and interoperability among diverse systems, characterised by their descriptive and prescriptive nature. These specifications are realised through a suite of artefacts that are harmoniously interrelated and address different interoperability scopes and use cases—ranging from semantic to technical concerns. The artefacts are fashioned to be both machine-readable and human-understandable, ensuring consistent interpretation and utilisation.
Figure 1 depicts a conceptualisation of how various components that make up a complete semantic data specification interconnect. At the top of the diagram is the "Semantic data specification" indicating its overarching role. It serves a "Purpose/Goal" which frames various specific "Concern/Need". The semantic data specification comprises various "Artefacts" denoting the different elements that make up the specification.
Figure 1
Beneath this, the framework branches into two main types of artefacts: "Data models" and "Documentation". The most relevant data models are "Vocabulary", "Ontology", "Data shape" and "UML Class model". Each data model is expressed in a "Modelling language" appropriate for the concern or the need addressed. The next section introduces the relevant artefact types.
3.2. Artefacts
Integral to semantic data specifications is the intrinsic consistency and coherence among the artefacts. Each represents a facet of the same domain knowledge, but is tailored to address specific concerns—such as human understandability, semantic underpinning, formal definition, and data serialisation (addressed in the next section). This alignment ensures that each artefact, while distinct in function, contributes to a unified view of the domain, making the entire specification accessible and actionable. Such consistency is pivotal in maintaining semantic integrity, leading to robust technical interoperability and seamless information exchange.
Each artefact, while unique in its form and function, represents different facets of the same domain. They are harmonised, yet distinct, with each created to address specific concerns, such as:
- Semantic Underpinning: The semantic data specification needs to formally encapsulate the domain knowledge, capturing the essence of its concepts and the possible relationships between them. Ontologies play a key role here, offering a structured and logical framework that lays out the domain knowledge in a way that is both comprehensive and actionable.
- Formal Definition: Using formal languages such as OWL or RDFS for semantic representation enables precise interpretation and inference over the ontologies and instance data. Moreover, data shapes facilitate data structuring on top of the ontology, defining precise constraints and conditions under which the data can be instantiated. This formalisation ensures that the data adheres to the standard, facilitating automated validation and processing.
- Human Understandability: This aspect ensures that individuals, regardless of their technical expertise, can comprehend and engage with the semantic data specifications. The reference documentation, along with the visual representation of UML class diagrams, brings clarity and guidance for human users to grasp the meaning of the semantic data specification and its intended use.
- Visual Representation: The semantic data specification is much easier to understand once it is presented in a visual format. Typically, class diagrams are the most suitable for encapsulating the concepts and the relations between them, significantly boosting comprehension.
- Data Serialisation: The technical artefacts of the specification, such as information exchange data models for various serialisation formats (e.g. JSON-LD, XML), ensure that the data can be correctly serialised, deserialised, and exchanged across systems and platforms. They cater to the technical requirements of data transport (and storage).
The coherence among these artefacts ensures that despite their different purposes and audiences, they all align in their representation of the domain knowledge. This alignment guarantees that whether a stakeholder is interpreting the model conceptually, engaging with it through documentation, or implementing it technically, they are presented with a unified and consistent view of the semantic data specification. This cohesive approach is pivotal for maintaining semantic integrity across various applications and systems.
3.3. Data models
The key data model artefacts of a specification include:
- Vocabulary: An established list of preferred terms that signify concepts or relationships within a domain of discourse. All terms must have an unambiguous and non-redundant definition. Optionally, it may include synonyms, notes and translations into multiple languages. It is represented informally, for example as a spreadsheet or a SKOS thesaurus.
- Ontology: An ontology is a formal, machine-readable specification of a conceptual model [Harpring2016]. It encompasses a representation, formal naming using URIs, and the definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse, effectively enabling a shared and common understanding of data [wiki-onto]. It is usually expressed in OWL and RDFS.
- Data Shape: Constraints or patterns that describe how instantiations of an ontology should be structured. Data shape artefacts can be used not only to ensure that RDF data adheres to a predefined structure and validation rules, but also as a blueprint for information exchange data models, preserving semantics and ensuring consistency in data exchange. It is usually expressed in SHACL.
- Controlled list (of values): A value vocabulary used to express concepts or values in instance data. It defines resources (such as instances of topics, languages, countries, or authors) that are used as values for elements in metadata records. Typically, value vocabularies serve as reference data and constitute "building blocks" with which metadata records can be populated.
- UML class model: A static structure UML model and associated diagrams that describe the structure of data by showing the classes, their attributes, and the relationships among objects. It may include documentation, descriptions and various annotations. Such a data model shall conform to the SEMIC Style Guide in order to be fit for purpose as part of a semantic data specification.
- Human-readable documentation: This artefact elucidates the specification for stakeholders of varying business and technical backgrounds, detailing the structure, intent, and practical application of the semantic data specification.
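To illustrate the "controlled list" artefact above, a minimal SKOS sketch is shown below. The concept scheme, concept and URIs are invented for illustration and do not correspond to an official EU authority table.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/authority/country/> .

# an illustrative value vocabulary: a tiny country authority table
ex:CountryScheme a skos:ConceptScheme ;
    skos:prefLabel "Country authority table (illustrative)"@en .

ex:BEL a skos:Concept ;
    skos:inScheme ex:CountryScheme ;
    skos:prefLabel "Belgium"@en , "Belgique"@fr ;
    skos:notation "BEL" .
```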
Beyond these foundational elements, semantic data specifications may also incorporate artefacts designed for the technical interoperability layer, called information exchange data models. They define and describe, in a technology-specific manner, the structure and content of information that is exchanged between organisations in a specific information exchange context. They detail the syntax, structure, data types, and constraints necessary for effective data communication between systems. These artefacts are necessary to realise technical interoperability. If the chosen technology for exchange is of a semantic nature (e.g. RDF), then a perfect syntax-semantics conflation is readily available through ontologies and data shapes. Otherwise, if a more traditional technology is selected due to popularity or legacy reasons, such as XML/XSD or JSON, then a mapping that acts as a syntax-semantics interface needs to be established, binding the physical model to the semantic specification. These can include various information exchange data models such as:
- JSON-LD context definitions: Facilitating the mapping of JSON to RDF linked data representation.
- XML Schemas (XSD): Defining the structure and validating the constraints of XML documents.
- API (Component) Specifications (REST, WSDL or GraphQL): Outlining the request, response and parameters for web-based data access and manipulation [swagger]. Such components are generally reusable blocks to facilitate reusability and maintenance of APIs.
3.4. Artefacts across interoperability layers
In the framework depicted below, the artefacts are strategically organised within the semantic and technical interoperability layers, with each layer focusing on different but complementary aspects of data interoperability. As can be seen, some artefacts belong to the semantic layer and others to the technical layer, while the data shapes are present in both, reflecting their multipurpose nature.
The Semantic Layer encapsulates artefacts associated with the conceptual understanding of data. It is focused on defining the vocabulary and ontology that provide the foundational elements for data interoperability. These artefacts ensure that the meaning of data is clearly defined and shared across different systems, establishing the semantic rules that govern data exchange.
The Technical Layer is concerned with the practical aspects of data handling, such as data representation formats, communication protocols, and interface specifications. Artefacts in this layer address the technical requirements necessary for data to be physically exchanged and processed by information systems.
Documentation and UML Class models are depicted as orthogonal to these layers, as they facilitate human understanding and transcend the semantic-technical divide. These artefacts provide clarity and guidance, helping stakeholders visualise and comprehend the data structures and relationships without being confined to the constraints of either layer.
3.5. Data specification types
We can discern three interconnected layers each representing a different level of abstraction in semantic data specifications. The arrangement signifies the gradation from the abstract to the specific.
The Upper Layer accommodates the most abstract form of semantic data specifications. These specifications are context-free, meaning that they are not tied to any particular domain or application and can be universally applied across various fields. These semantic data specifications provide the broadest concepts that can be reused in numerous contexts. Here we generally find upper-level ontologies (defining highly abstract foundational concepts such as “object”, “property”, “event”, etc.), but also the core semantic data specifications, which, although more specific, can also be applied across multiple domains. The main objective of the Core Vocabularies is to provide terms to be reused in the broadest possible context [sem-sg-wsds].
Upper ontologies and core semantic data specifications serve as a scaffolding for domain ontologies, offering a hierarchy where the more general terms of the upper ontology act as superclasses (in some cases even as metaclasses) to the more specific classes of domain ontologies. This arrangement supports the structuring and integration of knowledge by providing common reference points that enhance understanding and data processing across different systems.
Notable examples:
- Upper ontologies: DOLCE, Gist, BFO, etc.
- Core Vocabularies: Dublin Core Terms, Data Catalog Vocabulary (DCAT), The Organization Ontology (ORG), European Legislation Identifier (ELI)
The Domain Layer sits at the intersection of the upper and application layers. It contains specifications that are more specific than the upper layer, but not as narrowly focused as the application layer. The semantic data specifications in this layer incorporate concepts relevant to a domain or sector (e.g. the justice domain, the public procurement domain, the healthcare domain) and represent the most specific knowledge from the perspective of that domain.
The domain layer is visually overlapped by both the upper and application layers, symbolising that some domain-specific semantic data specifications can inherit traits from, or lend characteristics to, both the more abstract upper layer and the more concrete application layer.
Notable examples:
- DCAT-AP
- eProcurement Ontology
The Application Layer is the most concrete and context-specific, containing semantic data specifications tailored for particular applications or families of applications. Application Profiles are detailed in constraints and data shapes, addressing explicit needs and constraints of a specific system or use case, and generally provide precise technical artefacts that can be used in data exchange.
Notable examples:
- GeoDCAT-AP
- BRegDCAT-AP
- Stat-DCAT-AP
Terminological Clarification: The level of abstraction pertaining to a semantic data specification—be it core, domain, or application—can be applied as an adjective to describe its constituent artefacts. Thus, for a "core semantic data specification" the included components would be referred to as "core vocabulary", "core ontology", "core data shape" and "core exchange data model" and so on. Similarly, for a "domain semantic data specification," the elements would be denoted as "domain vocabulary", "domain ontology", "domain data shape" and "domain exchange data model", respectively.
3.6. Documentation
In the semantic data specification framework depicted [below], the documentation artefacts are organised into three distinct types, as illustrated in Figure 2, each catering to different aspects of user engagement with the data model. For effective documentation practices, we recommend principles laid out in the Diátaxis framework, which is a systematic approach to understanding the needs of documentation users [dtx]. It identifies four distinct needs (learning, understanding, consulting reference, achieving goals), and four corresponding forms of documentation - tutorials, handbooks, reference documentation and textbooks. It places them in a systematic relationship, and proposes that documentation should itself be organised around the structures of those needs.
In addition, we mention diagrams, which are usually embedded into documents, to underline the importance of the visual depiction of models and to recognise them as artefacts distinct from the models themselves (e.g. UML class diagrams). In the context of semantic data specifications, the following documentation kinds are relevant.
Figure 2
The Handbook (or usage manual) is a how-to guide and serves as an introductory reading to users new to the semantic data specification. It can also take the form of a tutorial to achieve predefined goals. It typically comprises use-case descriptions, examples and practical, step-by-step instructions designed to help users acquire the necessary skills to effectively use the semantic data specification.
Examples: This document
The Textbook (or explanatory manual) is an explanatory type of documentation and focuses on deepening the users’ understanding of the underlying concepts and principles incorporated into the semantic data specification. It aims to inform cognition, enhancing the user’s theoretical knowledge and conceptual insight, which is critical for those looking to gain a more profound grasp of the specification’s rationale, decisions, strengths and limitations.
Examples: SEMIC Style Guide [sem-sg]
The Reference document is a technical type of documentation and provides concise, detailed information about various elements of the semantic data specification. It serves users who are already familiar with the theoretical framework and need to apply their knowledge to specific tasks. This artefact is a go-to resource for factual and objective data about the semantic specifications, such as semantics, syntax, entities, properties, relationships, and constraints within the data model.
These documentation artefacts are designed to collectively support the user’s journey from novice to expert within the semantic data specification domain. The Usage Manual aids in initial skill acquisition, the Explanatory Textbook supports deeper learning and understanding, and the Reference Documentation acts as a reliable resource for informed application and use. Together, they ensure that users at different stages of learning and practice have access to the appropriate materials to meet their needs.
4. Use cases
This handbook serves as a practical guide for using Core Vocabularies in various common situations. To provide clear and actionable insights, we have categorised potential use cases into two groups:
- Primary Use Cases: the most common, interesting, and/or challenging scenarios, all thoroughly covered within this handbook.
- Additional Use Cases: other relevant scenarios that are briefly introduced but, for the sake of brevity, not elaborated on in detail.
Within both groups, we differentiate between use cases focused on the creation of NEW artefacts and those involving the mapping of EXISTING artefacts to Core Vocabularies.
For a better overview, we numbered the use cases and organised them into two tables, followed by the description of these use cases in two separate subsections, one dedicated to the addressed use cases and one to the use cases that are not addressed in this handbook.
ID | Goal | Data specification / Artefact |
---|---|---|
UC1 | Create a NEW | Information exchange data model |
UC1.1 | Create a NEW | XSD schema |
UC1.2 | Create a NEW | JSON-LD context definition |
UC2 | Map to a Core Vocabulary an EXISTING | Data model |
UC2.1 | Map to a Core Vocabulary an EXISTING | Ontology |
UC2.2 | Map to a Core Vocabulary an EXISTING | XSD schema |
Table: Listing of addressed use cases
ID | Goal | Data specification / Artefact |
---|---|---|
UC3 | Create a NEW | Semantic data specification |
UC3.1 | Create a NEW | Core Vocabulary |
UC3.2 | Create a NEW | Application Profile |
UC4 | Create a NEW | Data model |
UC4.1 | Create a NEW | Ontology |
UC4.2 | Create a NEW | Data shape |
UC2.3 | Map to a Core Vocabulary an EXISTING | JSON schema |
Table: Listing of unaddressed use cases
The use cases provided in this handbook are written in a white-box style, oriented towards user goals [weuc].
We will use the following template to describe the relevant use cases that were listed above:
Use Case <UC>: Title of the use case |
Goal: A succinct sentence describing the goal of the use case |
Primary Actor: The primary actor or actors of this use case |
Actors: (Optional) Other actors involved in the use case |
Description: Short description of the use case providing relevant information for its understanding |
Example: An example to illustrate the application of this use case |
Note: (Optional) notes about this use case, especially related to its coverage in this handbook |
4.1. Addressed use cases
Use Case UC1: Create a new information exchange data model |
Goal: Create a new standalone data schema that uses terms from Core Vocabularies. |
Primary Actors: Semantic Engineer, Software Engineer |
Description: The goal is to design and create a new data schema or information exchange data model that is not part of a more comprehensive semantic data specification, relying on terms from existing CVs as much as possible. |
Note: As this is a more generic use case it will be broken down into concrete use cases that focus on specific data formats. |
Use Case UC1.1: Create a new XSD schema |
Goal: Create a new standalone XSD schema that uses terms from Core Vocabularies. |
Primary Actors: Semantic Engineer, Software Engineer |
Description: The goal is to design and create a new XSD schema that is not part of a more comprehensive semantic data specification, relying on terms from existing CVs as much as possible. As an information exchange data model, an XSD Schema can be used to create and validate XML data to be exchanged between information systems. |
Example: OOTS XML schema mappings [oots] |
Note: A detailed methodology to be applied for this use case will be provided in the Create a new XSD schema section. |
Use Case UC1.2: Create a new JSON-LD context definition |
Goal: Create a new standalone JSON-LD context definition that uses terms from Core Vocabularies. |
Primary Actors: Semantic Engineer, Software Engineer |
Description: The goal is to design and create a new JSON-LD context definition that is not part of a more comprehensive semantic data specification, relying on terms from existing CVs as much as possible. As an information exchange data model, a JSON-LD context definition can be integrated in describing data, building APIs, and other operations involved in information exchange. |
Example: Core Person Vocabulary [cpv-json-ld], Core Business Vocabulary [cbv-json-ld] |
Note: A detailed methodology to be applied for this use case will be provided in the Create a new JSON-LD context definition section. |
Use Case UC2: Map an existing data model to a Core Vocabulary |
Goal: Create a mapping of an existing (information exchange) data model, to terms from Core Vocabularies. |
Primary Actors: Semantic Engineer |
Actors: Domain Expert, Software Engineer |
Description: The goal is to design and create a mapping of an ontology, vocabulary, or some kind of data schema or information exchange data model that is not part of a more comprehensive semantic data specification, to terms from CVs. Such a mapping can be done at a conceptual level, or formally, e.g. in the form of transformation rules, and most often will include both. |
Note: Since this is a more generic use case it will be broken down into concrete use cases that focus on specific data models and/or data formats. Some of those use cases will be described in detail below, while others will be included in the next section, which is dedicated to the unaddressed use cases. |
Use Case UC2.1: Map an existing Ontology to a Core Vocabulary |
Goal: Create a mapping between the terms of an existing ontology and the terms of Core Vocabularies. |
Primary Actors: Semantic Engineer |
Actors: Domain Expert, Business Analyst, Software Engineer |
Description: The goal is to create a formal mapping expressed in Semantic Web terminology (for example using the rdfs:subClassOf, rdfs:subPropertyOf, owl:equivalentClass, owl:equivalentProperty and owl:sameAs properties), associating the terms of an existing ontology that defines relevant concepts in a given domain with terms defined in one or more CVs. This activity is usually performed by a semantic engineer based on input received from domain experts and/or business analysts, who can assist with the creation of a conceptual mapping that associates the terms of the existing ontology with terms defined in one or more SEMIC Core Vocabularies. The result of the formal mapping can be used later by software engineers to build information exchange systems. |
Example: Mapping Core Person to Schema.org [map-cp2org], Core Business to Schema.org [map-cb2org], etc. |
Note: A detailed methodology to be applied for this use case will be provided in the Map an existing Ontology section. |
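To give a feel for the output of this use case, below is a hedged sketch of the kind of formal mapping statements it produces, expressed in Turtle. The class and property correspondences shown are illustrative assumptions; the published SEMIC alignments [map-cp2org], [map-cb2org] remain the authoritative mappings.

```turtle
@prefix owl:    <http://www.w3.org/2002/07/owl#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix person: <http://www.w3.org/ns/person#> .
@prefix locn:   <http://www.w3.org/ns/locn#> .
@prefix schema: <https://schema.org/> .

# illustrative equivalences between Core Vocabulary terms and Schema.org terms
person:Person  owl:equivalentClass  schema:Person .
locn:Address   owl:equivalentClass  schema:PostalAddress .

# a narrower correspondence would use rdfs:subClassOf / rdfs:subPropertyOf instead
locn:postCode  rdfs:subPropertyOf   schema:postalCode .
```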
Use Case UC2.2: Map an existing XSD Schema to a Core Vocabulary |
Goal: Define the data transformation rules for the mapping of an XSD schema to terms from Core Vocabularies. Create a mapping of XML data that conforms to an existing XSD schema to an RDF representation that conforms to a Core Vocabulary for formal data transformation. |
Primary Actors: Semantic Engineer |
Actors: Domain Expert, Business Analyst, Software Engineer |
Description: The goal is to create a formal mapping using Semantic Web technologies (e.g. RML or other languages), to allow automated translation of XML data conforming to a certain XSD schema into RDF data expressed in terms defined in one or more SEMIC Core Vocabularies. This use case requires the definition of an Application Profile for a Core Vocabulary, because the CV alone does not specify sufficient instantiation constraints to be precisely mappable. |
Example: ISA2core SAWSDL mapping [isa2-map] |
Note: A detailed methodology to be applied for this use case will be provided in the Map an existing XSD schema section. |
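To illustrate what such a formal mapping could look like, below is a minimal RML sketch that maps XML elements to Core Vocabulary and FOAF terms. The source file name, XPath expressions, identifier template and chosen target properties are assumptions made purely for illustration.

```turtle
@prefix rr:     <http://www.w3.org/ns/r2rml#> .
@prefix rml:    <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:     <http://semweb.mmlab.be/ns/ql#> .
@prefix person: <http://www.w3.org/ns/person#> .
@prefix foaf:   <http://xmlns.com/foaf/0.1/> .

<#PersonMapping> a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "persons.xml" ;                      # assumed input document
        rml:referenceFormulation ql:XPath ;
        rml:iterator "/Persons/Person"                  # assumed XML structure
    ] ;
    rr:subjectMap [
        rr:template "http://example.org/person/{Id}" ;  # assumed identifier element
        rr:class person:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate foaf:givenName ;
        rr:objectMap [ rml:reference "GivenName" ]      # assumed XML element
    ] .
```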
4.2. Unaddressed use cases
Use Case UC3: Create a new Semantic data specification |
Goal: Create a new semantic data specification that reuses terms from Core Vocabularies. |
Primary Actor: Semantic Engineer |
Description: The goal is to design and create a semantic data specification that represents the concepts in a particular domain, while reusing terms from existing CVs as much as possible for concepts that are already covered by CVs. Creating semantic data specifications using this approach will support better interoperability. |
Example: The eProcurement Ontology [epo] is a domain-specific semantic data specification built by reusing terms from multiple Core Vocabularies. |
Note: Recommendation on how to address this use case can be found in the Clarification on “reuse” section of the SEMIC Style Guide, and therefore will not be addressed in this handbook. |
Use Case UC3.1: Create a new Core Vocabulary |
Goal: Create a new Core Vocabulary that reuses terms from other Core Vocabularies. |
Primary Actor: Semantic Engineer |
Description: The goal is to design and create a new Core Vocabulary that represents the concepts of a generic domain of high potential reusability, while reusing terms from existing CVs as much as possible for concepts that are already covered by those CVs. |
Example: The Core Business Vocabulary (CBV) [cbv] is built reusing terms from the Core Location Vocabulary (CLV) [clv] and Core Public Organisation Vocabulary (CPOV) [cpov]. |
Note: Recommendation on how to address this use case can be found in the Clarification on “reuse” section of the SEMIC Style Guide, and therefore will not be addressed in this handbook. |
Use Case UC3.2: Create a new Application Profile |
Goal: Create a new Application Profile that reuses terms from other Core Vocabularies and specifies how they should be used. |
Primary Actor: Semantic Engineer |
Description: The goal is to design and create a new Application Profile that represents all the concepts and restrictions on those concepts that are relevant in a particular application domain, while reusing terms from existing CVs as much as possible. |
Example: The Core Public Service Vocabulary Application Profile (CPSV-AP) [cpsv-ap] is built reusing terms from the Core Location Vocabulary (CLV) [clv] and Core Public Organisation Vocabulary (CPOV) [cpov]. |
Note: Recommendation on how to address this use case can be found in the Clarification on “reuse” section of the SEMIC Style Guide, and therefore will not be addressed in this handbook. |
Use Case UC4: Create a new data model |
Goal: Create a new standalone data model artefact that reuses terms from Core Vocabularies. |
Primary Actor: Semantic Engineer |
Description: The goal is to design and create a new data model artefact that is not part of a more comprehensive semantic data specification, describing the concepts that are relevant in a particular domain or application context, while reusing terms from existing CVs as much as possible. Such artefacts can be of a different nature both according to their interoperability layer (ranging from vocabulary and ontology, to data shape and data schema) and according to their abstraction level (ranging from the upper layer, through the domain layer, to the application layer). |
Note: Since this is a more generic use case it will be broken down into more concrete use cases that focus on specific data models. See also some related use cases (UC1, UC1.1 and UC1.2) discussed in the Addressed use cases section. |
Use Case UC4.1: Create a new ontology |
Goal: Create a new standalone ontology that reuses terms from Core Vocabularies. |
Primary Actor: Semantic Engineer |
Description: The goal is to design and create a new ontology that is not part of a more comprehensive semantic data specification, describing the concepts that are relevant in a particular domain or application context, while reusing terms from existing CVs as much as possible. |
Example: The eProcurement Ontology (ePO) [epo] is built reusing terms from multiple CVs, including the Core Location Vocabulary (CLV) [clv], Core Public Organisation Vocabulary (CPOV) [cpov] and Core Criterion and Core Evidence Vocabulary (CCCEV) [cccev]. |
Note: Recommendation on how to address this use case can be found in the SEMIC Style Guide (more specifically in the Clarification on “reuse” section and the various Guidelines and conventions subsections), and therefore will not be addressed in this handbook. |
Use Case UC4.2: Create a new data shape |
Goal: Create a new standalone data shape that specifies restrictions on the use of terms from Core Vocabularies. |
Primary Actor: Semantic Engineer |
Description: The goal is to design and create a new data shape that is not part of a more comprehensive semantic data specification, describing the expected use of concepts that are relevant in a particular domain or application context, including the use of terms from existing CVs. |
Note: Recommendation on how to address this use case can be found in the SEMIC Style Guide (more specifically in the Clarification on “reuse” and Data shape conventions sections), and therefore will not be addressed in this handbook. |
Use Case UC2.3: Map an existing JSON Schema to a Core Vocabulary |
Goal: Define data transformation rules from a JSON schema to terms from Core Vocabularies. Create a mapping of JSON data that was created according to an existing JSON schema to an RDF representation that conforms to a Core Vocabulary for formal data transformation. |
Primary Actors: Semantic Engineer |
Actors: Domain Expert, Business Analyst, Software Engineer |
Description: The goal is to create a formal mapping using Semantic Web technology (e.g. RML or other languages), to allow automated translation of JSON data conforming to a certain JSON schema into RDF data expressed in terms defined in one or more SEMIC Core Vocabularies. Such activity can be done by semantic engineers, based on input from domain experts and/or business analysts, who can assist with the creation of a conceptual mapping. The conceptual mapping is usually used as the basis for the formal mapping, and can be a simple correspondence table associating the JSON data model elements defined in a JSON schema with terms defined in one or more SEMIC Core Vocabularies. In some cases the creation of the conceptual mapping can be done by the semantic engineers themselves, or even by the software engineers building information exchange systems. |
5. How to create new data models
5.1. Create a new XSD schema
This section provides detailed instructions for addressing use case UC1.1.
To create a new XSD schema, the following steps need to be observed:
- Import or define elements
- Shape structure with patterns
Import or define elements
When working with XML schemas, particularly in relation to semantic artefacts like ontologies or data shapes, managing imports and namespaces is a vital consideration that ensures clarity, reusability, and the proper integration of various data models.
When a core vocabulary has defined an associated XSD schema, it is not only easy but also advisable to directly import this schema using the xsd:import statement. This enables seamless reuse and guarantees that any complex types or elements defined within the core vocabulary are integrated correctly and transparently within new schemas.
The imported elements are then employed in the definition of a specific document structure. For example, the Core Vocabularies are based on DCTERMS [ref], which provides an XML schema, so Core Person could import the DCTERMS XML schema in order to use AgentType:
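Since no concrete schema is reproduced here, the following is a minimal, hedged sketch of what such an import could look like. The target namespace, the schemaLocation and the element that reuses AgentType are assumptions for illustration only.

```xml
<!-- Illustrative sketch: importing a DCTERMS XSD and reusing its AgentType.
     The schemaLocation and element names are placeholders. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dct="http://purl.org/dc/terms/"
           xmlns:cp="http://example.org/core-person"
           targetNamespace="http://example.org/core-person"
           elementFormDefault="qualified">

  <xs:import namespace="http://purl.org/dc/terms/"
             schemaLocation="dcterms.xsd"/>

  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <!-- hypothetical element that reuses the imported complex type -->
        <xs:element name="registeredBy" type="dct:AgentType" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```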
In cases where the Core Vocabulary does not provide an XSD schema, it is necessary to create for the reused URIs the corresponding XML element definitions in the new XSD schema. Crucially, these new elements must adhere to the namespace defined by the Core Vocabulary to maintain consistency. For example “AgentType” must be defined within the “http://data.europa.eu/m8g/” namespace of the Core Vocabularies.
Furthermore, when integrating these elements into a new schema, it is essential to reflect the constraints from the Core Vocabulary’s data shape (specifically, which properties are optional and which are mandatory) within the XSD schema element definitions.
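The sketch below illustrates this case: an AgentType complex type defined directly in the Core Vocabularies namespace, with cardinalities mirroring hypothetical data shape constraints. The element names and cardinalities are assumptions for illustration, not the official definitions.

```xml
<!-- Illustrative sketch: defining AgentType ourselves, in the Core Vocabularies
     namespace, when no official XSD is provided. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cv="http://data.europa.eu/m8g/"
           targetNamespace="http://data.europa.eu/m8g/"
           elementFormDefault="qualified">

  <xs:complexType name="AgentType">
    <xs:sequence>
      <!-- mandatory here, mirroring an assumed sh:minCount 1 constraint -->
      <xs:element name="name" type="xs:string" maxOccurs="unbounded"/>
      <!-- optional here, mirroring an assumed unconstrained property -->
      <xs:element name="identifier" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```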
Shape XML document structure
In designing XML schemas, the selection of a design pattern has implications for the reusability and extension of the schema. The Venetian Blind and Garden of Eden patterns stand out as preferable for their ability to allow complex types to be reused by different elements [sem-map].
The Venetian Blind pattern is characterised by having a single global element that serves as the entry point for the XML document, from which all the elements can be reached. This pattern implies a certain directionality and starting point, analogous to choosing a primary class in an ontology that has direct relationships to other classes, and from which one can navigate to the rest of the classes.
For instance, in the Core Business Vocabulary, if one were to select the "Legal Entity" class as the starting point, it would shape the XML schema in such a way that all other classes could be reached from this entry point, reflecting its central role within the ontology. A possible Venetian Blind implementation with “Legal Entity” as the root element would be:
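A minimal, hedged sketch of such a schema is shown below: a single global LegalEntity element, with all other content reachable through globally defined, reusable complex types. The target namespace, element names and types are assumptions for illustration.

```xml
<!-- Illustrative Venetian Blind sketch: one global element, global named types,
     local elements. Names and namespace are placeholders. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cb="http://example.org/core-business"
           targetNamespace="http://example.org/core-business"
           elementFormDefault="qualified">

  <!-- the single global element: the entry point of every document -->
  <xs:element name="LegalEntity" type="cb:LegalEntityType"/>

  <xs:complexType name="LegalEntityType">
    <xs:sequence>
      <xs:element name="legalName" type="xs:string"/>
      <xs:element name="registeredAddress" type="cb:AddressType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="fullAddress" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```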
Adopting the Venetian Blind pattern reduces variability in its application and makes the schema usable in specific scenarios by providing not only well-defined elements, but also a rigid and predictable structure.
On the other hand, the Garden of Eden pattern allows for multiple global elements, providing various entry points into the XML document. This pattern accommodates ontologies where no single class is inherently central, mirroring the flexibility of graph representations in ontologies that do not have a strict hierarchical starting point.
Adopting the Garden of Eden pattern provides a less constrained approach, enabling users to represent information starting from different elements that may hold significance in different contexts. This approach has been adopted by standardisation initiatives such as NIEM and UBL, which recommend such flexibility for broader applicability and ease of information representation.
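For contrast, a hedged Garden of Eden sketch is shown below: every element and type is global, so an exchange may start from LegalEntity or directly from Address. As before, the namespace and names are assumptions for illustration.

```xml
<!-- Illustrative Garden of Eden sketch: all elements and types are global and
     referenced by name, providing multiple possible entry points. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:cb="http://example.org/core-business"
           targetNamespace="http://example.org/core-business"
           elementFormDefault="qualified">

  <xs:element name="legalName" type="xs:string"/>
  <xs:element name="fullAddress" type="xs:string"/>
  <xs:element name="Address" type="cb:AddressType"/>
  <xs:element name="LegalEntity" type="cb:LegalEntityType"/>

  <xs:complexType name="LegalEntityType">
    <xs:sequence>
      <xs:element ref="cb:legalName"/>
      <xs:element ref="cb:Address" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element ref="cb:fullAddress"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```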
However, the Garden of Eden pattern does not lead to a schema that can be used in final application scenarios, because it does not ensure a single stable document structure but leaves the possibility for variations. This schema pattern requires an additional composition specification. For example, if it is used in a SOAP API, the developers can decide on using multiple starting points to facilitate exchange of granular messages specific per API endpoint. This way the XSD schema remains reusable for different API endpoints and even API implementations.
Overall, the choice between these patterns should be informed by the intended use of the schema, the level of abstraction of the ontology it represents, and the needs of the end-users, aiming to strike a balance between structure and flexibility.
Recommendation: We consider the Garden of Eden pattern suitable for designing XSD schemas at the level of core or domain semantic data specifications, and the Venetian Blind pattern suitable for XSD schemas at the level of specific Application Profiles.
5.2. Create a new JSON-LD context definition
This section provides detailed instructions for addressing use case UC1.2.
JSON-LD combines the simplicity, power, and web ubiquity of JSON with the concepts of Linked Data. Creating JSON-LD context definitions facilitates this synergy. This ensures that when data is shared or integrated across systems, it maintains its meaning and can be understood in the same way across different contexts. Here’s a guide on how to create new JSON-LD contexts for existing CVs, using the Core Person Vocabulary as an example.
- Import or define elements
- Shape structure
Import or define elements
When a CV has defined an associated JSON-LD context, it is not only easy, but also advisable to directly import this context using the @import keyword. This enables seamless reuse and guarantees that any complex types or elements defined within the vocabulary are integrated correctly and transparently within new schemas.
"@context": {"@import": "https://json-ld.org/contexts/remote-context.jsonld"}
In cases where the CV does not provide a JSON-LD context, it is necessary to create the corresponding field definitions for the reused URIs. To start, gather all the terms from the Core Person Vocabulary that you want to include in your JSON-LD context. Terms can include properties like given name, family name, date of birth, and relationships like residency or contact point.
Then, decide the desired structure of the JSON-LD file by defining the corresponding keys, for example Person.givenName, Person.familyName, Person.dateOfBirth, Person.residency, Person.contactPoint. These new fields must adhere to the naming defined by the CV to maintain consistency.
Finally, assign URIs to keys. Each term in your JSON-LD context must be associated with a URI from an ontology that defines its meaning in a globally unambiguous way. Associate the URIs established in CVs to JSON keys using the same CV terms. For example:
"Person.contactPoint": {"@id": "http://data.europa.eu/m8g/contactPoint"}.
The terms that are imported by the CVs shall be used as originally defined, for example from FOAF:
"Person.givenName": {"@id": "http://xmlns.com/foaf/0.1/givenName"}.
Shape structure
Start defining the structure of the context by relating class terms with property terms and then, if necessary, property terms with other classes.
Commence by creating a JSON structure that starts with a @context field. This field will contain mappings from your vocabulary terms to their respective URIs. Continue by defining fields for Classes and subfields for their properties.
If the JSON-LD context is developed with the aim of being used directly in exchange specific to an application scenario, then aim to establish a complete tree structure that starts with a single root class. To do so, specify precise @type references linking to the specific Class. For example:
"Person.contactPoint" : {"@id": "http://data.europa.eu/m8g/contactPoint", "@type": "ContactPoint"}.
If the aim of the developed JSON-LD context is rather to ensure semantic correspondences, without any structural constraints, which is the case for core or domain semantic data specifications, then definitions of structures specific to each entity type and its properties suffice, using only loose references to other objects. For example:
"Person.contactPoint": {"@id": "http://data.europa.eu/m8g/contactPoint", "@type": "@id"}
6. How to map existing data models
In the vast and diverse landscape of knowledge representation and data modelling, sources of knowledge are articulated through various means, each adopting its distinct methodology for expression. This diversity manifests not only in the choice of technology and representation languages, but also in the vocabularies used and the specific models created. Such heterogeneity, while enriching, introduces significant challenges for semantic interoperability—the ability of different systems to understand and use the information seamlessly across various contexts.
The idea of unifying this rich spectrum of knowledge under a single model and a single representation language, though conceptually appealing, is pragmatically unfeasible and, arguably, undesirable. The diversity of knowledge sources is not merely a by-product of historical development, but a reflection of the varied domains, perspectives, and requirements these sources serve.
To navigate this complexity, a more nuanced approach is required—one that seeks to establish connections across models without imposing uniformity. This is where the concepts of ontology mapping and the broader spectrum of model alignment methodologies come into play. Moreover, the mapping endeavour encompasses not only ontological artefacts, but also various technical artefacts—ranging from data shapes defined in SHACL or ShEx, XSD schemas for XML, to JSON Schemas for JSON data. Each of these artefacts represents a different facet of knowledge modelling. Thus, mapping in this broader sense involves creating links between these semantic and technical artefacts and Core Vocabularies.
The past couple of decades have witnessed extensive efforts in ontology and data mapping, resulting in a plethora of tools, methods, and technologies aimed at enhancing semantic interoperability. These endeavours underscore the vast landscape of potential strategies available for mapping. These strategies range from conceptual methodologies that explore the semantic congruence and contextual relevance of entities and relationships, to formal methodologies that operationalise these conceptual mappings as technical data transformation rules. It is important to acknowledge that there is no one-size-fits-all method; instead, the field offers a spectrum of approaches suited to various needs and contexts.
The subsequent sections will delve into the specific methodologies of mapping —both conceptual and formal—, providing a blueprint for navigating and bridging the world of semantic and technical artefacts, empowering stakeholders to make informed decisions that best suit their interoperability needs.
6.1. Map an existing Ontology
This section provides detailed instructions for addressing use case UC2.1.
In this section we adopt the following definitions from the ontology matching literature:
- Ontology matching: the process of finding relationships or correspondences between entities of different ontologies.
- Ontology alignment: a set of correspondences between two or more ontologies, the outcome of the ontology matching process.
- Ontology mapping: the oriented, or directed, version of an alignment, i.e. it maps the entities of one ontology to at most one entity of another ontology.
To create an ontology alignment, the following steps need to be observed:
- Staging: defining the requirements
- Characterisation: defining source and target data and performing data analysis
- Reuse: discover, evaluate, and reuse existing alignments
- Matching: execute and evaluate matching
- Align and map: prepare, create the alignment, and render mappings
- Application: make the alignment available for use in applications
This methodology has been used in mapping the Core Vocabularies to Schema.org. This work is available on the SEMIC GitHub repository dedicated to Semantic Mappings, where it is also documented. The next sections describe this methodology in more detail.
Staging
This initial phase involves a comprehensive understanding of the project’s scope, identifying the specific goals of the mapping exercise and the key requirements it must fulfil. Stakeholders collaborate to articulate the purpose of the ontology or data model alignment, setting clear objectives that will guide the entire process. Defining these requirements upfront ensures that subsequent steps are aligned with the mapping exercise’s overarching goals and stakeholder expectations, and fit the use cases.
Inputs: Stakeholder knowledge, project goals, available resources, domain expertise.
Outputs: Mapping project specification document comprising a defined mapping project scope and comprehensive list of requirements.
Characterisation
In this stage, a thorough analysis of both source and target ontologies is conducted to ascertain their structures, vocabularies, and the semantics they encapsulate. This involves an in-depth examination of the conceptual frameworks, data representation languages, and any existing constraints within both models. Understanding the nuances of both the source and target is critical for identifying potential challenges and opportunities in the mapping process, ensuring that the alignment is both feasible and meaningful.
The following is an indicative, but not exhaustive, list of aspects to consider in this analysis: specifications documentation, representation language and representation formats, deprecation mechanism, inheritance policy (single inheritance only or multiple inheritance are also allowed), natural language(s) used, label specification, label conventions, definition specification, definition conventions, version management and release cycles, etc.
Inputs: Source and target ontologies, initial requirements, domain constraints.
Outputs: Analysis reports comprising a comparative characterisation table, identified difficulties, risks and amenability assessments, selected source and target for mapping.
Reuse
In the ontology mapping lifecycle, the reuse stage is pivotal, facilitating the integration of pre-existing alignments into the project’s workflow. Following the initial characterisation, this stage entails discovery and a rigorous evaluation of available alignments against the project’s defined requirements. These requirements are instrumental in appraising whether an existing alignment can be directly adopted, necessitates modifications for reuse, or if a new alignment should be constructed from the ground up.
Ontology alignments are often expressed in the Alignment Format (AF) or EDOAL. An example statement in a Turtle file representing an ontology alignment (taken from the Core Business Vocabulary to Schema.org alignment) could look something like this:
<http://mapping.semic.eu/business/sdo/cell/1> a align:Cell ;
    align:entity1 <http://www.w3.org/ns/locn#Address> ;
    align:entity2 <https://schema.org/PostalAddress> ;
    align:relation "=" ;
    align:measure "1"^^xsd:float ;
    owl:annotatedProperty owl:equivalentClass ;
    sssom:mapping_justification semapv:MappingReview .
The outcome of this stage splits into three distinct pathways:
- direct reuse of alignments that are immediately applicable,
- adaptive reuse where existing alignments provide a partial fit and serve as a basis for refinement (i.e. we can improve certain alignment statements, or add new statements in the alignment map), and
- the initiation of a new alignment when existing resources are not suitable.
This structured approach to reuse optimises resource utilisation, promotes efficiency, and tailors the mapping process to the project’s unique objectives.
Inputs: Repository of existing alignments (for the source and target ontologies), evaluation criteria based on requirements.
Outputs: Assessment report on existing alignments, decisions on reuse, adaptation, or creation of a new alignment.
Matching
This section delves into automatic and semi-automatic approaches to finding alignment candidates. For small vocabularies and ontologies, however, fully manual efforts are likely more efficient.
Utilising both automated tools and manual expertise, this phase focuses on identifying potential correspondences between entities in the source and target models. The matching process may employ various methodologies, including semantic similarity measures, pattern recognition, or lexical analysis, to propose candidate alignments. These candidates are then critically evaluated for their accuracy, relevance, and completeness, ensuring they meet the predefined requirements and are logically sound. This stage is delineated into three main activities: planning, execution, and evaluation.
In the planning activity, the approach to ontology matching is meticulously strategised. The planning encompasses selecting appropriate algorithms and methods, fine-tuning parameters, determining thresholds for similarity and identity functions, and setting evaluative criteria. These preparations are informed by a thorough understanding of the project’s requirements and the outcomes of previous reuse evaluations.
Numerous well-established ontology matching algorithms have been extensively reviewed in the literature (for in-depth discussions, see paper). The main classes of ontology matching techniques are listed below in the order of their relevance to this handbook:
- Terminological techniques draw on the textual content within ontologies, such as entity labels and comments, employing methods from natural language processing and information retrieval, including string distances and statistical text analysis.
- Structural techniques analyse the relationships and constraints between ontology entities, using methods like graph matching to explore the topology of ontology structures.
- Semantic techniques apply formal logic and inference to deduce the implications of proposed alignments, aiding in the expansion of alignments or detection of conflicts.
- Extensional techniques compare entity sets, or instances, potentially involving analysis of shared resources across ontologies to establish similarity measures.
Following planning, the execution activity implements the chosen matchers. Automated or semi-automated tools are deployed to carry out the matching process, resulting in a list of candidate correspondences. This list typically includes suggested links between elements of the source and target ontologies, each with an associated confidence level computed by the algorithms. EDOAL, a representation framework for expressing such correspondences, is commonly utilised to encapsulate these potential alignments.
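As an illustration, a candidate correspondence produced by a matcher could be recorded as an Alignment Format cell in Turtle, using the same prefixes as the example in the Reuse section; the cell URI and the confidence value below are hypothetical:

<http://example.org/alignment/cell/42> a align:Cell ;
    align:entity1 <http://www.w3.org/ns/org#Organization> ;
    align:entity2 <https://schema.org/Organization> ;
    align:relation "=" ;
    align:measure "0.87"^^xsd:float .

The measure expresses the matcher’s confidence; whether the correspondence is kept, refined, or discarded is decided only in the subsequent evaluation activity.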
Finally, in the evaluation activity, the candidate correspondences are rigorously assessed for their suitability. The evaluation measures the candidates against the project’s specific needs, scrutinising their accuracy, relevance, and alignment with the predefined requirements. This assessment ensures that only the most suitable correspondences are carried forward for the creation of an alignment, thereby upholding the integrity and logical soundness of the mapping process.
Tools: Silk, for example, can be used in this stage.
Inputs: Matcher configurations, additional resources (if any), candidate correspondences from previous matching iterations.
Outputs: Generated candidate correspondences, evaluation reports, finalised list of potential alignments.
Align and Map
Following the identification of suitable matches, this step involves the formal creation of the alignment and the rendering (generation) of specific mappings between the source and target models. This phase encompasses preparation, creation, and rendering activities that solidify the relationships between ontology entities into a coherent alignment and actionable mappings. The resulting alignment is then documented, detailing the rationale, methods used, and any assumptions made during the mapping process.
The alignment process should be considered as part of the governance of a vocabulary or ontology that would include engaging communication with third parties to validate the alignment. Furthermore, the process has technical implications that should be evaluated upfront such as the machine interpretation and execution of the mapping.
Preparation involves stakeholder consensus on the Alignment Plan. This plan guides stakeholders through the systematic refinement of candidate correspondences, considering not only the relevance of the matches, but also the type of relationship between the elements. This plan might include the removal of irrelevant correspondences or strategic amendments to existing relationships. The chosen candidate correspondences are those that have been determined to be an adequate starting point for the alignment. The type of asset—be it an ontology, controlled list, or data shape—dictates the nature of the relationship that can be rendered from the alignment. The table below elucidates potential relationship types that can be established:
Relation / Element type | Property | Concept | Class | Individual
---|---|---|---|---
= | owl:equivalentProperty; owl:sameAs | skos:exactMatch; skos:closeMatch | owl:equivalentClass; owl:sameAs | owl:sameAs
> | | skos:narrowMatch | |
< | rdfs:subPropertyOf | skos:broadMatch | rdfs:subClassOf |
% | owl:propertyDisjointWith | | owl:disjointWith | owl:differentFrom
instanceOf | rdf:type | skos:broadMatch; rdf:type | rdf:type | rdf:type
hasInstance | | skos:narrowMatch | |
This table is indicative of the variety of semantic connections that can be realised, ranging from equivalence and subclass relations to disjointness and type instantiation. This nuanced approach to the preparation stage is essential in ensuring that the eventual alignment and rendered mapping accurately represent the semantic intricacies of the relationships defined in the project scope, thereby fulfilling the project’s defined requirements.
The Creation step is the execution of the Alignment Plan: through human intervention, the candidate correspondences are selected and refined into a deliberate alignment. The selection is conducted manually, according to the project’s objectives.
Rendering translates the refined alignment into a mapping—a directed version that can be interpreted and executed by software agents. This process is straightforward, producing a machine-executable artefact. Most often this is a simple export of the alignment statements from the editing tool or the materialisation of the alignment in a triple store. Multiple renderings may be created from the same alignment, accommodating the need for various formalisms.
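For example, the alignment cell shown in the Reuse section, annotated with owl:equivalentClass, could be rendered as a single, directly usable OWL statement:

<http://www.w3.org/ns/locn#Address> owl:equivalentClass <https://schema.org/PostalAddress> .

Such a rendered triple can be loaded into a triple store alongside the instance data, or exported as a standalone mapping file.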
Tools: VocBench3 can be used in this stage, but more generic tools, such as MS Excel or Google Sheets spreadsheets, can be used as well.
Inputs: Evaluated candidate correspondences, stakeholders' amendment plans, requirements for the formalism of the mapping.
Outputs: Created alignment and mapping, alignment amendment strategy, stored versions in an alignment repository.
Application
The final stage focuses on operationalising the created alignment, ensuring it is accessible and usable by applications that require semantic interoperability between the mapped models. This involves publishing the alignment in a standardised, machine-readable format and integrating it within ontology management or data integration tools.
Additionally, mechanisms for maintaining, updating, and governing the alignment are established, facilitating its long-term utility and relevance.
Moreover, this stage involves the creation of maintenance protocols to preserve the alignment’s relevance over time. This includes procedures for regular updates in response to changes in ontology structures or evolving requirements, as well as governance mechanisms to oversee these adaptations. As the mapping is applied, new insights may emerge, prompting discussions within the stakeholder community about potential refinements or the development of a new iteration of the mapping. The dynamic nature of data sources means that the application stage is both an endpoint and a starting point for continuous improvement. Some processes may be automated to enhance efficiency, such as the monitoring of ontologies for changes that would necessitate updates to the mapping.
Inputs: Finalised mappings, application context, feedback mechanisms.
Outputs: Applied mappings in use, insights from application, triggers for potential updates, governance actions for lifecycle management.
6.2. Map an existing XSD schema
This section provides detailed instructions for addressing use case UC2.2.
To create an XSD schema mapping, one first needs to decide on its purpose and level of specificity, which can range from a lightweight alignment at the vocabulary level down to a fully fledged, executable set of rules for data transformation.
In this section we describe a methodology that covers both the conceptual mapping and the technical mapping for data transformation.
Figure 3 depicts a workflow for creating an XSD schema mapping, segmented into four distinct phases:
- Create a Conceptual Mapping, so that business and domain experts can validate the correspondences;
- Create a Technical Mapping, so that the data can be automatically transformed;
- Validate the mapping rules to ensure consistency and accuracy;
- Disseminate the mapping rules to be applied in the foreseen use cases.
Figure 3. XSD schema mapping workflow
Before initiating the mapping development process, it is crucial to construct a representative test dataset. This dataset should consist of a carefully selected set of XML files that cover the important scenarios and use cases encountered in the production data. It should be comprehensive yet sufficiently compact to facilitate rapid transformation cycles, enabling effective testing iterations.
Conceptual Mapping development
Conceptual Mapping in semantic data integration can be established at two distinct levels: the vocabulary level and the application profile level. These levels differ primarily in their complexity and specificity regarding the data context they address.
Vocabulary Level mapping is established using basic XML elements. This form of mapping aims for a terminological alignment, meaning that an XML element or attribute is directly mapped to an ontology class or property. For example, an XML element <PostalAddress> could be mapped to the locn:Address class, or an element <surname> could be mapped to the property foaf:familyName in the FOAF ontology. Such a mapping can be established as a simple spreadsheet. This approach results in a simplistic and direct alignment, which lacks contextual depth and specificity; for this reason, the subsequent steps of this methodology cannot be applied to it.
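For illustration, such a spreadsheet could be as minimal as a two-column table pairing XML elements with target terms, using the examples above:

XML element | Target term
---|---
PostalAddress | locn:Address
surname | foaf:familyName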
A more advanced approach would be to embed semantic annotations into XSD schemas using standards such as SAWSDL. Such an approach is appropriate in the context of WSDL services.
Application Profile Level of conceptual mapping utilises XPath to guide access to data in XML structures, enabling precise extraction and contextualization of data before mapping it to specific ontology fragments. An ontology fragment is usually expressed as a SPARQL Property Path (or simply Property Path). This Property Path facilitates the description of instantiation patterns specific to the Application Profile. This advanced approach allows for context-sensitive semantic representations, crucial for accurately reflecting the nuances in interpreting the meaning of data structures.
The tables below show two examples of mapping the organisation’s address: its city and postal code. They show where the data can be extracted from, and how it can be mapped to targeted ontology properties such as locn:postName and locn:postCode. To ensure that this address is not mapped in a vacuum, but is linked to an organisation instance (and not, for example, to a person), the mapping is anchored in an instance ?this of an org:Organization. Optionally, a class path can be provided to complement the property path and explicitly state the class sequence, which otherwise can be deduced from the Application Profile definition.
Source XPath | */efac:Company/cac:PostalAddress/cbc:PostalZone
---|---
Target Property Path | ?this cv:registeredAddress / locn:postCode ?value .
Target Class Path | org:Organization / locn:Address / rdf:PlainLiteral
Source XPath | */efac:Company/cac:PostalAddress/cbc:CityName
---|---
Target Property Path | ?this cv:registeredAddress / locn:postName ?value .
Target Class Path | org:Organization / locn:Address / rdf:PlainLiteral
Inputs: XSD Schemas, Ontologies, SHACL Data Shapes, Source and Target Documentation, Sample XML data
Outputs: Conceptual Mapping Spreadsheet
Technical Mapping development
The technical mapping step is a critical phase in the mapping process, serving as the bridge between conceptual design and practical, machine-executable implementation. This step takes as input the conceptual mapping, which has been crafted and validated by domain experts or data-savvy business stakeholders, and which establishes correspondences between XPath expressions and ontology fragments.
When it comes to representing these mappings technically, several technology options are available (see paper), such as XSLT, RML, SPARQLAnything, etc. Among these, the RDF Mapping Language (RML) stands out for its effectiveness and straightforward approach. RML allows for the representation of mappings from heterogeneous data formats like XML, JSON, relational databases and CSV into RDF, supporting the creation of semantically enriched data models. The mappings can be expressed in Turtle or in the YARRRML dialect, a user-friendly text-based format based on YAML, making them accessible to both machines and humans. RML is well supported by implementations such as RMLMapper and RMLStreamer, which provide robust platforms for executing these mappings. RMLMapper is adept at handling batch processing of data, transforming large datasets efficiently. RMLStreamer, on the other hand, excels in streaming data scenarios, where data needs to be processed in real time, providing flexibility and scalability in dynamic environments.
The development of the mapping rules is straightforward thanks to the conceptual mapping that is already available. The Conceptual Mapping (CM) clarifies to which class and property each XML element should be mapped, and how. RML mapping statements are then created for each class of the target ontology, coupled with the property-object mapping statements specific to that class. To implement the mappings effectively, it is essential to master RML along with XML technologies such as XSD, XPath, and XQuery (rml-gen).
An additional step involves deciding on a URI creation policy and designing a uniform scheme for use in the generated data, ensuring consistency and coherence in the data output.
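The following minimal sketch, in Turtle, illustrates how the address-related rules from the Conceptual Mapping above might be expressed in RML. The source file name, the iterator, and the URI template are illustrative assumptions; the URI template in particular would be derived from the URI creation policy just discussed:

@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
@prefix locn: <http://www.w3.org/ns/locn#> .

<#AddressMapping>
  rml:logicalSource [
    rml:source "sample-notice.xml" ;                    # illustrative file name
    rml:referenceFormulation ql:XPath ;
    rml:iterator "//efac:Company/cac:PostalAddress"     # matches the source XPath above
  ] ;
  rr:subjectMap [
    rr:template "http://example.org/address/{cbc:PostalZone}" ;   # illustrative URI policy
    rr:class locn:Address
  ] ;
  rr:predicateObjectMap [
    rr:predicate locn:postCode ;
    rr:objectMap [ rml:reference "cbc:PostalZone" ]     # postal code rule
  ] ;
  rr:predicateObjectMap [
    rr:predicate locn:postName ;
    rr:objectMap [ rml:reference "cbc:CityName" ]       # city name rule
  ] .

Linking each address to the corresponding organisation instance via cv:registeredAddress would be expressed in a second triples map following the same pattern.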
A viable alternative to RML is XSLT technology, which offers a powerful, but low-level method for defining technical mappings. While this method allows for high expressiveness and complex transformations, it also increases the potential for errors due to its intricate syntax and operational complexity. This technology excels in scenarios requiring detailed manipulation and parameterization of XML documents, surpassing the capabilities of RML in terms of flexibility and depth of transformation rules that can be implemented. However, the detailed control it affords means that developers must have a high level of expertise in semantic technologies and exercise caution and precision to avoid common pitfalls associated with its use.
A pertinent example of XSLT’s application is the tool for transforming ISO-19139 metadata to the DCAT-AP geospatial profile (GeoDCAT-AP) in the framework of INSPIRE and the EU ISA Programme. This XSLT script is configurable to accommodate transformation with various operational parameters such as the selection between core or extended GeoDCAT-AP profiles and specific spatial reference systems for geometry encoding, showcasing its utility in precise and tailored data manipulation tasks.
Inputs: Conceptual Mapping spreadsheet, sample XML data
Outputs: Technical Mapping source code, sample data transformed into RDF
Validation
After transforming the sample XML data into RDF, two primary methods of validation are employed to ensure the integrity and accuracy of the data transformation: SPARQL-based validation and SHACL-based validation. They offer two fundamental methodologies for ensuring data integrity and conformity within semantic technologies, each serving distinct but complementary functions.
The SPARQL-based validation method utilises SPARQL ASK queries, which are derived from the SPARQL Property Path expressions (and complementary Class paths) outlined in the conceptual mapping. These expressions serve as assertions that test specific conditions or patterns within the RDF data corresponding to each conceptual mapping rule. By executing these queries, it is possible to confirm whether certain data elements and relationships have been correctly instantiated according to the mapping rules. The ASK queries return a boolean value indicating whether the RDF data meets the conditions specified in the query, thus providing a straightforward mechanism for validation. This confirms that the conceptual mapping is implemented correctly in a technical mapping rule.
For example, for the mapping rules above the following assertions can be derived:
ASK {
  ?this a org:Organization .
  ?this cv:registeredAddress / locn:postName ?value .
}

ASK {
  ?this a org:Organization .
  ?this cv:registeredAddress / locn:postCode ?value .
}
The SHACL-based validation method provides a more comprehensive framework for validating RDF data. In this approach, data shapes are defined according to the constraints and structures expected in the RDF output, as specified by the mapped Application Profile. These shapes act as templates that the RDF data must conform to, covering various aspects such as data types, relationships, cardinality, and more. A SHACL validation engine processes the RDF data against these shapes, identifying any deviations or errors that indicate non-conformity with the expected data model.
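As a minimal sketch, and not taken from an actual Application Profile, a SHACL node shape constraining the address data produced by the mappings above could look like this:

@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix locn: <http://www.w3.org/ns/locn#> .

<#AddressShape>
  a sh:NodeShape ;
  sh:targetClass locn:Address ;          # applies to every instance of locn:Address
  sh:property [
    sh:path locn:postCode ;
    sh:nodeKind sh:Literal ;
    sh:maxCount 1                        # at most one postal code per address
  ] ;
  sh:property [
    sh:path locn:postName ;
    sh:nodeKind sh:Literal
  ] .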
SHACL is an ideal choice for ensuring adherence to broad data standards and interoperability requirements. This form of validation is independent of the manner in which data mappings are constructed, focusing instead on whether the data conforms to established semantic models at the end-state. It provides a high-level assurance that data structures and content meet the specifications designed to facilitate seamless data integration and interactions across various systems.
Conversely, SPARQL-based validation is tightly linked to the mapping process itself, offering a granular, rule-by-rule validation that ensures each data transformation aligns precisely with the expert-validated mappings. It is particularly effective in confirming the accuracy of complex mappings and ensuring that the implemented data transformations faithfully reflect the intended semantic interpretations, thus providing a comprehensive check on the fidelity of the mapping process.
Inputs: Sample data transformed into RDF, Conceptual Mapping, SHACL data shapes
Outputs: Validation reports
Dissemination
Once the conceptual and technical mappings have been completed and validated, they can be packaged for dissemination and deployment. The purpose of disseminating mapping packages is to facilitate their controlled use for data transformation, ensure the ability to trace the evolution of mapping rules, and standardise the exchange of such rules. This structured approach allows for efficient and reliable data transformation processes across different systems.
A comprehensive mapping package typically includes:
- Conceptual Mapping Files: Serve as the core documentation, outlining the rationale and structure behind the mappings to ensure transparency and ease of understanding.
- Technical Mapping Files: Contain all the mapping code files (XSLT[ref], RML[ref], SPARQLAnything[ref], etc., depending on the chosen mapping technology) for data transformation, allowing for the practical application of the conceptual designs.
- Additional Mapping Resources: Controlled lists, value mappings, or correspondence tables, which are crucial for the correct interpretation and application of the mapping code. These are stored in a dedicated resources subfolder.
- Test Data Sets: Carefully selected and representative XML files that cover various scenarios and cases. These test datasets are crucial for ensuring that the mappings perform as expected across a range of real-world data.
- Factory Acceptance Testing (FAT) Reports: Document the testing outcomes based on the SPARQL and SHACL validations to guarantee that the mappings meet the expected standards before deployment. The generation of these reports should be supported by automation, as manual generation would involve too much effort and cost.
- Tests Used for FAT Reports: The actual SPARQL assertions and SHACL shapes used in generating the FAT reports, providing a complete view of the validation process.
- Descriptive Metadata: Essential data about the mapping package, such as identification, title, description, and versions of the mapping, ontology, and source schemas. This metadata aids in the management and application of the package.
This package is designed to be self-contained, ensuring that it can be immediately integrated and operational within various data transformation pipelines. The included components support not only the application, but also the governance of the mappings, ensuring they are maintained and utilised correctly in diverse IT environments. This systematic packaging addresses critical needs for usability, maintainability, and standardisation, which are essential for widespread adoption and operational success in data transformation initiatives.
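One possible way to organise such a package on disk is shown below; the folder and file names are illustrative, not prescribed by this guide:

mapping-package/
  conceptual-mapping/    conceptual mapping spreadsheet
  technical-mapping/     RML or XSLT mapping files
  resources/             controlled lists, value mappings, correspondence tables
  test-data/             representative sample XML files
  validation/            SPARQL assertions and SHACL shapes used for FAT
  reports/               FAT reports
  metadata.json          descriptive metadata (identification, title, versions, etc.)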
Inputs: Conceptual Mapping spreadsheet, Technical Mapping source code, Ontologies, SHACL data shapes, Sample XML data, Sample data transformed into RDF, Validation reports
Outputs: Comprehensive Mapping Package
7. Glossary
7.1. Application Profile
Alternative names: AP, context-specific semantic data specification
Definition: Semantic data specification aimed to facilitate the data exchange in a well-defined application context.
Additional info: It re-uses concepts from one or more semantic data specifications, while adding more specificity, by identifying mandatory, recommended, and optional elements, addressing particular application needs, and providing recommendations for controlled vocabularies to be used.
Source/Reference: SEMIC Style Guide
7.2. Conceptual model
Alternative names: conceptual model specification
Definition: An abstract representation of a system that comprises well-defined concepts, their qualities or attributes, and their relationships to other concepts.
Additional info: A system is a group of interacting or interrelated elements that act according to a set of rules to form a unified whole.
Source/Reference: SEMIC Style Guide
7.3. Core Vocabulary
Alternative names: CV
Definition: A basic, reusable and extensible semantic data specification that captures the fundamental characteristics of an entity in a context-neutral fashion.
Additional info: Its main objective is to provide terms to be reused in the broadest possible context.
Source/Reference: SEMIC Style Guide
7.4. Data model
Definition: A structured representation of data elements and relationships used to facilitate semantic interoperability within and across domains.
Additional info: Data models represent common languages to facilitate semantic interoperability in a data space, including ontologies, data models, schema specifications, mappings and API specifications that can be used to annotate and describe data sets and data services. They are often domain-specific.
Source/Reference: Data Spaces Blueprint
7.5. Information exchange data model
Alternative names: data schema
Definition: A technology-specific framework for data exchange, detailing the syntax, structure, data types, and constraints necessary for effective data communication between systems. It serves as a practical blueprint for implementing an application profile in specific data exchange contexts.
Additional info: An ontology and an exchange data model serve distinct yet complementary roles across different abstraction levels within data management systems. While a Data Schema specifies the technical structure for storing and exchanging data, primarily concerned with the syntactical and structural aspects of data, it is typically articulated using metamodel standards such as JSON Schema and XML Schema.
In contrast, ontologies and data shapes operate at a higher conceptual level, outlining the knowledge and relational dynamics within a particular domain without delving into the specifics of data storage or structural implementations. Although a Data Schema can embody certain elements of an ontology or application profile—particularly attributes related to data structure and cardinalities necessary for data exchange—it does not encapsulate the complete semantics of the domain as expressed in an ontology.
Thus, while exchange data models are essential for the technical realisation of data storage and exchange, they do not replace the broader, semantic understanding provided by ontologies. The interplay between these layers ensures that data schemas contribute to a holistic data management strategy by providing the necessary structure and constraints for data exchange, while ontologies offer the overarching semantic framework that guides the meaningful interpretation and utilisation of data across systems. Together, they facilitate a structured yet semantically rich data ecosystem conducive to advanced data interoperability and effective communication.
Source/Reference: Data Spaces Blueprint
7.6. Data specification artefact
Alternative names: specification artefact, artefact
Definition: A materialisation of a semantic data specification in a concrete representation that is appropriate for addressing one or more concerns (e.g. use cases, requirements).
Source/Reference: SEMIC Style Guide
7.7. Data specification document
Alternative names: specification document
Definition: The human-readable representation of an ontology, a data shape, or a combination of both.
Additional info: A semantic data specification document is created with the objective of making it simple for the end-user to understand (a) how a model encodes knowledge of a particular domain, and (b) how this model can be technically adopted and used for a purpose. It is to serve as technical documentation for anyone interested in using (e.g. adopting or extending) a semantic data specification.
Source/Reference: SEMIC Style Guide
7.8. Data shape specification
Alternative names: data shape constraint specification, data shape constraint, data shape
Definition: A set of conditions on top of an ontology, limiting how the ontology can be instantiated.
Additional info: The conditions and constraints that apply to a given ontology are provided as shapes and other constructs expressed in the form of an RDF graph. We assume that the data shapes are expressed in SHACL language.
Source/Reference: SEMIC Style Guide
7.9. Ontology
Alternative names: ontology specification
Definition: A formal specification describing the concepts and relationships that can formally exist for an agent or a community of agents (e.g. domain experts).
Additional info: It encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse.
Source/Reference: SEMIC Style Guide
7.10. Semantic data specification
Alternative names: data specification
Definition: A union of machine- and human-readable artefacts addressing clearly defined concerns, interoperability scope and use cases.
Additional info: A semantic data specification comprises at least an ontology and a data shape (or either of them individually) accompanied by a human-readable data specification document.
Source/Reference: SEMIC Style Guide
7.11. Vocabulary
Definition: An established list of preferred terms that signify concepts or relationships within a domain of discourse. All terms must have an unambiguous and non-redundant definition. Optionally, it may include synonyms, notes, and translations into multiple languages.
7.12. Upper Ontology
Definition: An upper ontology is a highly generalised ontology that includes very abstract concepts applicable across all domains, such as "object," "property," and "relation." Its primary role is to facilitate broad semantic interoperability among numerous domain-specific ontologies by offering a standardised foundational framework. This framework assists in harmonising diverse domain ontologies, allowing for consistent data interpretation and efficient information exchange.