UC 2.1 Tutorial on Mapping an existing Model to a Core Vocabulary
Introduction
Imagine different organisations speaking different dialects of the same language. One might say “Organisation,” another says “LegalEntity.” They’re both talking about similar things—but how do we get them to understand each other?
Ontology alignment and mapping helps solve this problem by creating bridges between different models, allowing each organisation to keep their unique vocabulary while ensuring that the meaning is preserved.
This tutorial demonstrates how to map the Core Business Vocabulary (CBV) to Schema.org, addressing Use Case UC2.1. By following the methodology for mapping an existing model, you’ll learn how to align these two vocabularies step by step, covering staging, characterisation, reuse, matching, alignment, and application, to ensure interoperability between the CBV and Schema.org.
Phase 1: Staging (Defining the Requirements)
In this phase, the aim is to understand what needs to be mapped and why. For this tutorial, we aim to map the Core Business Vocabulary (CBV) to Schema.org, enabling data interoperability.
Steps:
-
Determine the purpose of the mapping: What are the key areas of business data that need to be interoperable between CBV and Schema.org? This is carried out in collaboration with stakeholders.
-
Define Scope: What parts of the Core Business Vocabulary need to be mapped to Schema.org? Are there specific concepts (e.g., Contact Point, Organization, etc.) that must be represented? Which version of each ontology or vocabulary will be mapped?
-
Set Mapping Goals: Define the intended outcomes of the mapping, such as ensuring semantic alignment between CBV and Schema.org entities. For example, one key goal may be to clarify the relationship between legal:LegalEntity and schema:Organization (Issue #38).
Procedure for CBV and Schema.org
The purpose, scope, and goals are determined by the stakeholders, including domain experts and knowledge engineers. First, the domain experts’ input is needed to demarcate the scope, in particular to indicate the (sub-)topic of interest, ideally augmented with key terms. For CBV and Schema.org, these may include terms such as: legal:LegalEntity vs. schema:Organization.
It may also need to take into account ‘externalities’, such as regulatory compliance that may dictate the use of one version of a schema over another for some business reason. For the current exercise, there are no regulatory compliance requirements in place. Therefore, the latest official releases of both vocabularies will be used (Schema.org version 29.1 and CBV version 2.2.0).
Next, clear mapping goals should be established.
For this exercise, the primary goal is to identify direct relationships between the two vocabularies. These relationships will then be expressed in a machine-readable format, enabling seamless data transformation from the Core Business Vocabulary (CBV) to Schema.org.
Phase 2: Characterisation (Defining Source and Target Data)
The aim of this phase is to analyse the structure, vocabulary, and semantics of the Core Business Vocabulary, which we take as the Source ontology, and of Schema.org, which is set as the Target ontology. The key steps and outputs are as follows.
Steps:
-
Examine both ontologies for:
-
Entity structures, definitions, and formats.
-
Deprecation policies and inheritance mechanisms, natural language(s) used, label specification, label conventions, definition specification, definition conventions, version management and release cycles, etc.
-
Output:
-
A comparison in the form of a table
-
Optionally: a brief report containing a list of obstacles that need to be overcome before model matching can take place
Procedure for CBV and Schema.org
First, list the features on which to compare the source and the target. These principally concern the ‘meta’ information, that is, information about the artefacts rather than their contents about the subject domain. This includes typical features such as the serialisation format(s) in which the artefacts are available, naming conventions of the terminology, and version management.
The features can have adverse or facilitating consequences for the mapping task. Let’s consider three of them here and relate them to our case. To begin with, are the files available in the same format? This is indeed the case for CBV and Schema.org, which even leaves a choice between their RDF and JSON-LD serialisations. The natural language of CBV and Schema.org, that is, the language in which the entities’ names and labels are rendered, is the same for both, so translating entities’ names is not needed. Regarding the frequency of version updates, there is a notable difference: CBV is relatively stable with two main releases, whereas Schema.org has frequent releases and is currently in its 29th main release cycle.
Second, we list the selected features in a table, and for each feature, find the answer for CBV and Schema.org. For the CBV and Schema.org metadata comparison, we had to consult the documentation, the developer release pages, and inspect the files to obtain the answers. The selected features and comparison with the values is shown in the table below.
Feature | Core Business Vocabulary | Schema.org |
---|---|---|
Specification | HTML document | HTML document |
Computer processable formats | UML, RDF, JSON-LD, SHACL | OWL, RDF (ttl, rdf/xml), CSV, JSON-LD, NQ, NT, SHACL, SHEXJ |
Inheritance | Single inheritance | Multiple inheritance |
Label | rdfs:label, shacl:name (within SHACL shapes) | rdfs:label |
Naming scheme | CamelCase for classes (e.g., LegalEntity) and lowerCamelCase for properties | CamelCase for classes (e.g., EducationalOrganization) and lowerCamelCase for properties |
Label formatting | With spaces (e.g., Legal Entity) | In CamelCase |
Language | English | English |
Deprecation | No | Yes |
Definitions | rdfs:comment, shacl:description within SHACL shapes | Written in rdfs:comment |
Latest version inspected | Latest (v 2.0.0, 6-5-2024). 1 or 2 releases per year | 29.0 (24-3-2025). 1 or 2 releases per year |
Developer location | | |
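To spot which features may hinder matching, the comparison can also be kept in a small machine-readable form. The following is a minimal sketch in Python, assuming an abbreviated copy of the table above; the `FEATURES` dictionary and the `differing_features` helper are illustrative names and not part of any SEMIC tooling.

```python
# Abbreviated, hand-copied subset of the comparison table (illustrative only).
# Each entry maps a feature name to (CBV value, Schema.org value).
FEATURES = {
    "Inheritance": ("Single inheritance", "Multiple inheritance"),
    "Label formatting": ("With spaces", "In CamelCase"),
    "Language": ("English", "English"),
    "Deprecation": ("No", "Yes"),
}


def differing_features(features):
    """Return the names of features whose source and target values differ."""
    return [name for name, (source, target) in features.items() if source != target]


if __name__ == "__main__":
    for name in differing_features(FEATURES):
        cbv, sdo = FEATURES[name]
        print(f"{name}: CBV = {cbv!r}, Schema.org = {sdo!r}")
```

Features that differ, such as label formatting, are the ones most likely to need attention before the matching phase.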
Phase 3: Reuse of Existing Mappings
The aim of this phase is to avoid doing duplicate work by checking if any existing mappings between CBV and Schema.org are available for reuse or if there are any alignments that can be adapted for this project.
Steps:
-
Search for Existing Alignments: Look for any pre-existing alignments that may have been created by others or as part of previous work, for example by consulting the SEMIC GitHub repository for relevant mappings.
-
Evaluate Reusability: Determine whether these existing alignments meet your project’s requirements. If they do, they can be reused directly.
-
Adapt Existing Alignments: If the existing alignments are close but need modification, adapt them to suit the specific project goals.
Output:
-
A document listing:
-
which type of alignment was chosen for which existing alignments.
-
The decisions: A new alignment needs to be created.
Procedure for CBV and Schema.org
There are three distinct pathways, being direct use, adaptive reuse, and creating a new alignment. Let’s look at each in turn.
For CBV and Schema.org, we first look for pre-existing alignments of related vocabularies. They may be in the files themselves, but we can also search a relevant repository with other files that may have relevant mappings, such as the SEMIC GitHub repository in this case, and look for alignments expressed with frameworks such as EDOAL or SSSOM.
From searching through the SEMIC repository, we found several vocabularies that have alignments to Schema.org already, which may be reusable. They are listed in the following table, together with their location and the versions against which we checked the mapping, as it may change with different versions (recall the Source and Target Characterisation, above).
Mapping From | Mapping To | Location | Version |
---|---|---|---|
CBV | Schema.org | https://github.com/SEMICeu/Semantic-Mappings/tree/main/Core%20Business | CBV v2.2.0 – Schema.org v29.1 |
We then check whether CBV concepts and relationships overlap with any of these. If they do, we check if there is already a mapping from that element to Schema.org. This takes us to the Evaluate Reusability step: if it is an agreeable mapping between the two entities, we can reuse that mapping.
Alternatively, it may be the case of adaptive reuse, which involves refinements to better suit the mapping objectives.
-
Example: An existing alignment for LegalEntity to Organization was evaluated. However, it was missing relationships for organizational properties like schema:legalName and schema:taxID. So, the original alignment was extended to include new mappings for these properties. New alignments may also be created when existing resources are not suitable, which is the case for this tutorial.
-
Example: Add alignment to answer the question: What is the relation between legal:LegalEntity and schema:Organization?
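The choice between the three pathways can be sketched as a simple coverage check. The snippet below is a hedged illustration in Python, not a prescribed procedure: it assumes that both the required and the existing mappings can be written as (source, target) pairs, and the function name `reuse_pathway` is hypothetical. In practice the decision also depends on whether the experts agree with each existing mapping.

```python
def reuse_pathway(required_pairs, existing_pairs):
    """Suggest a reuse pathway from how many required pairs are already mapped."""
    required = set(required_pairs)
    covered = required & set(existing_pairs)
    if covered == required:
        return "direct use"        # everything needed is already mapped
    if covered:
        return "adaptive reuse"    # partial coverage: extend the existing alignment
    return "new alignment"         # nothing usable: start from scratch


if __name__ == "__main__":
    # Illustrative pairs only; they do not reproduce an actual alignment file.
    required = {
        ("legal:LegalEntity", "schema:Organization"),
        ("legal:legalName", "schema:legalName"),
    }
    existing = {("legal:LegalEntity", "schema:Organization")}
    print(reuse_pathway(required, existing))
```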
Phase 4: Matching (Execute and Filter Matching Candidates)
In this phase, we perform the actual mapping, which we bootstrap by producing candidate mappings between classes and between properties, whether automatically, semi-automatically, or manually, and then assess the results.
Steps:
-
Select Matching Technique: Decide on a method for automatically or semi-automatically matching entities.
-
Perform Matching: Prepare the inputs and use the chosen tool to generate potential matches between CBV and Schema.org entities.
-
Candidate Evaluation: The knowledge engineer assesses the candidate correspondences for their consistency, accuracy, relevance, and alignment with the project’s requirements.
Procedure for CBV and Schema.org
In this tutorial, we use LIMES to automate link discovery in the mapping process between the CBV and Schema.org. While other tools could also be used, LIMES was selected for its simplicity and efficiency in performing lexical similarity-based alignments.
Set Up Data Sources
Preparing the data sources depends on the files and the alignment tool chosen. To determine this, the table of features compiled in Phase 2 is useful: it indicates whether alignment should be run on the class name or the label and which file format to use, and it helps settle other algorithmic parameters that may be asked for, such as a similarity threshold. For our use case with CBV and Schema.org and LIMES, we begin by configuring the SPARQL endpoints, which allow us to extract the relevant classes and properties for comparison. We focus on aligning entities by their rdfs:label, whose value is the name or description of the entity, which is the most straightforward way to identify potential mappings.
Apply Matching Algorithm
LIMES uses a similarity metric to compare the rdfs:label values of entities from both files. This metric generates similarity scores based on the string matching of the labels and, optionally, their descriptions.
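To give a feel for label-based matching, here is a minimal sketch in Python. It is not the metric LIMES applies: it uses `difflib.SequenceMatcher` from the standard library as a stand-in, and the `normalise` and `similarity` helpers are hypothetical. It also shows why the label formatting noted in Phase 2 matters, since labels written with spaces and labels in CamelCase only compare well after normalisation.

```python
import re
from difflib import SequenceMatcher


def normalise(label):
    """Split CamelCase into words, collapse whitespace, and lowercase."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return " ".join(spaced.split()).lower()


def similarity(a, b):
    """Similarity ratio in [0, 1]; a stand-in, not the metric used by LIMES."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


if __name__ == "__main__":
    pairs = [
        ("Legal Entity", "LegalEntity"),
        ("Legal Entity", "Organization"),
        ("Address", "PostalAddress"),
    ]
    for source_label, target_label in pairs:
        print(source_label, "|", target_label, "|", round(similarity(source_label, target_label), 2))
```

A purely lexical score of this kind cannot recognise that Legal Entity and Organization are related, which is consistent with the need for manual review described below.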
Analyse Results
Tools such as LIMES and Silk do not determine the semantic nature of the match (e.g., equivalence vs. subclass). They only suggest candidate pairs based on similarity metrics. It is up to the human expert to choose the appropriate relation, using knowledge of the domain and the ontology documentation.
Once the matching process is complete, we inspect the results. In the case of the CBV and Schema.org alignment, no matches were found with LIMES. This means that the automated tool did not identify any significant similarities between entities based on the chosen similarity metric. Consequently, we need to either try with another alignment tool or manually review and align the entities.
While we opt for the latter, let us first illustrate how the output would look if there had been potential matches. For instance, the Core Public Service Vocabulary (CPSV) as source against Schema.org does yield interesting results. LIMES output includes three key columns:
-
Source entity: a URI from the source ontology (e.g., CPSV).
-
Target entity: a URI from the target ontology (e.g., Schema.org).
-
Similarity score: a numerical value (typically from 0 to 1) indicating the strength of the lexical similarity between the two entities.
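Such a three-column output can be filtered before review. The sketch below assumes a tab-separated serialisation with exactly the three columns listed above; the rows in `SAMPLE` and their scores are made up for illustration and are not actual LIMES results, and `read_candidates` is a hypothetical helper.

```python
import csv
import io

# Made-up rows for illustration: source URI, target URI, similarity score.
SAMPLE = (
    "http://purl.org/vocab/cpsv#PublicService\thttps://schema.org/GovernmentService\t0.62\n"
    "http://purl.org/vocab/cpsv#PublicService\thttps://schema.org/Service\t0.48\n"
)


def read_candidates(text, threshold=0.5):
    """Parse tab-separated candidates and keep those at or above the threshold."""
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    candidates = [(source, target, float(score)) for source, target, score in rows]
    kept = [c for c in candidates if c[2] >= threshold]
    # Highest-scoring candidates first, so reviewers see the strongest ones on top.
    return sorted(kept, key=lambda c: c[2], reverse=True)


if __name__ == "__main__":
    for source, target, score in read_candidates(SAMPLE):
        print(f"{score:.2f}  {source}  ->  {target}")
```

The threshold only limits how many candidates a reviewer has to inspect; it does not decide which semantic relation holds.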
Manual Alignment Process
Even though the automated tool did not produce any alignments, we can use our domain knowledge to suggest potential mappings based on the descriptions and attributes of the entities. For example, CBV’s LegalEntity (that is an org:Organization) maps to Schema.org’s Organization based on their similar roles in representing business-related concepts, and likewise for CBV’s Address (imported from Core Location Vocabulary) with Schema.org PostalAddress. These kinds of alignments are made by examining the entity definitions and considering the context of their use in each ontology.
Phase 5: Validate Alignments
After alignments (i.e., candidate mappings) have been generated—whether by automated tools or through manual assessment—each proposed mapping must be validated. This process aims at checking whether the candidate links represent semantically meaningful relationships between classes or properties from the two ontologies.
Steps
-
Confirm candidate correspondences with domain experts.
-
Review proposed alignments.
-
Decide on the appropriate semantic relation. Common types that we focus on here include:
-
Equivalence: When two entities are conceptually and functionally the same → rendered as owl:equivalentClass or owl:equivalentProperty
-
Subsumption: When one entity is more specific than the other (i.e., subclass or subproperty) → rendered as rdfs:subClassOf or rdfs:subPropertyOf
-
Formalise the alignment into a mapping following conventions from the Alignment format or EDOAL.
-
Render Mappings in a Machine-Readable Format. The mapping file can be rendered in RDF/XML, Turtle, or other formats like JSON-LD, depending on the tool or system in use. The choice of format should align with the needs of the stakeholders and the technical requirements of the project. While RDF/XML is a standard format for machine-readable ontology representations, it may not be ideal for human consumption due to its complexity. However, RDF/XML (or other syntaxes such as Turtle) is often used in formal contexts for consistency and integration with other semantic web tools. If ease of use for human review is desired, a Graphical User Interface (GUI) or tools that visualize RDF data can provide a more intuitive way to view and edit mappings.
Procedure for CBV and Schema.org
Review Proposed Alignments
Each candidate correspondence is checked for correctness and relevance. For CBV and Schema.org, we first use a manual inspection by ontology engineers, which includes cross-referencing documentation or definitions. CBV’s LegalEntity is “A self-employed person, company, or organization that has legal rights and obligations.” and Schema.org’s Organization is “An organization such as a school, NGO, corporation, club, etc.”. For Address, the descriptions are as follows: “A spatial object that in a human-readable way identifies a fixed location.” with a usage note indicating it to be understood as a postal address, and “The mailing address.”, respectively.
Decide the Appropriate Semantic Relation
For each validated candidate pair, determine the type of semantic relation.
For our running example, CBV’s LegalEntity is almost the same as Schema.org’s Organization, but the latter does not have the “legal rights” constraint and is therefore the broader concept. The appropriate semantic relation is thus subsumption, with LegalEntity as the more specific class. For the respective addresses, the definition of CBV’s imported Address is broader than that of PostalAddress, which would suggest a subsumption alignment as well. Taking into account CBV’s usage note, however, it can be an equivalence alignment.
Formalise the Alignment
Once the relation is chosen, each alignment is encoded as a machine-readable RDF triple, typically in RDF/XML or Turtle format, suitable for integration and reuse.
The result is a validated alignment file, where each mapping is represented based on the Alignment Format or an extension thereof, such as EDOAL, as an align:Cell with:
-
The aligned entities (align:entity1 and align:entity2)
-
The chosen relation (align:relation), being either subsumption “<” or equivalence “=”
-
An optional confidence measure
-
Its corresponding meaning in the ontology (owl:annotatedProperty)
-
Further information on a mapping justification, which reuses the Simple Standard for Sharing Ontological Mappings (SSSOM) that, in turn, reuses the Semantic Mapping Vocabulary (SEMAPV) for the mapping justification values.
Example output of adding an alignment between legal:LegalEntity and schema:Organization:
<http://mapping.semic.eu/business/sdo/cell/21> a align:Cell;
  align:entity1 <http://www.w3.org/ns/legal#LegalEntity>;
  align:entity2 <https://schema.org/Organization>;
  align:relation "<";
  align:measure "1"^^xsd:float;
  owl:annotatedProperty rdfs:subClassOf;
  sssom:mapping_justification semapv:MappingReview .
where it can be seen that the “<” relation corresponds to rdfs:subClassOf and the mapping justification is MappingReview, which is as approved as it can be in SEMAPV terminology.
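When several cells need to be written, generating them from a small template helps keep the relation symbol and the annotated property in sync. The following is a minimal sketch in Python that reproduces the structure of the cell above for class-level mappings only; `render_cell` and `RELATION_TO_PROPERTY` are hypothetical names, prefix declarations are assumed to be added separately, and property-level mappings would need rdfs:subPropertyOf or owl:equivalentProperty instead.

```python
# Relation symbols of the Alignment Format and their class-level meaning.
RELATION_TO_PROPERTY = {
    "<": "rdfs:subClassOf",
    "=": "owl:equivalentClass",
}


def render_cell(cell_iri, entity1, entity2, relation, measure="1"):
    """Render one align:Cell as Turtle (prefix declarations not included)."""
    annotated_property = RELATION_TO_PROPERTY[relation]
    return "\n".join([
        f"<{cell_iri}> a align:Cell ;",
        f"    align:entity1 <{entity1}> ;",
        f"    align:entity2 <{entity2}> ;",
        f'    align:relation "{relation}" ;',
        f'    align:measure "{measure}"^^xsd:float ;',
        f"    owl:annotatedProperty {annotated_property} ;",
        "    sssom:mapping_justification semapv:MappingReview .",
    ])


if __name__ == "__main__":
    print(render_cell(
        "http://mapping.semic.eu/business/sdo/cell/21",
        "http://www.w3.org/ns/legal#LegalEntity",
        "https://schema.org/Organization",
        "<",
    ))
```

An unknown relation symbol raises a `KeyError`, which is intentional in this sketch: only the two relations discussed in this tutorial are supported.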
Other alignment examples and a complete file including prefixes can be viewed here.
Phase 6: Application (Operationalise the Mappings)
After alignments have been validated, the final step is to apply them in practice. This involves both technical integration and the establishment of a governance framework to ensure the mappings remain up to date and useful over time.
Steps:
-
Publish the Mappings: Share the mappings in a standard format via a repository.
-
Integrate and Test: Apply the mappings in semantic web tools or data integration workflows.
-
Establish Governance: Define a process for updating the mappings in response to changes in the source or target ontologies.
Procedure for CBV and Schema.org
The mapping file (in RDF/Turtle format) resulting from our mapping exercise is published on the SEMIC GitHub repository. This allows for integration of the mapping output into validation tools used by Member States and other implementers to check CBV-compliant data against Schema.org requirements. For testing, one could attempt to load the files in an editor that can read files of the chosen format, being Turtle for CBV, the Turtle version of Schema.org, and the alignment file, or run them through a syntax validator to verify that all is in order for deployment.
Governance of CBV and its mapping to Schema.org lies on the side of CBV, which reuses Schema.org. SEMIC is responsible for regularly checking if there are updates in either the CBV vocabulary or Schema.org (e.g., new classes, renamed terms, deprecated elements) and updating where needed, seeing to it that logged issues are addressed, and verifying that all links and namespaces are still in working order. It may also consider updates to CBV due to changing requirements (e.g., to make CBV multilingual) and plan any consequent updates to CBV and any potential effects these may have on the mappings with Schema.org.
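Part of this periodic check can be supported by a small script. The sketch below is a hedged illustration in Python, assuming that the IRIs used as mapping targets and the IRIs defined (and deprecated) in a new release have already been extracted into sets; how they are extracted is left open, and `stale_targets` is a hypothetical helper.

```python
def stale_targets(mapped_targets, current_terms, deprecated_terms=frozenset()):
    """Return mapped target IRIs that are missing from, or deprecated in, a release."""
    return sorted(
        target
        for target in mapped_targets
        if target not in current_terms or target in deprecated_terms
    )


if __name__ == "__main__":
    # Illustrative input only; the second IRI is a made-up term.
    mapped = {"https://schema.org/Organization", "https://schema.org/ExampleRemovedTerm"}
    current = {"https://schema.org/Organization", "https://schema.org/PostalAddress"}
    print(stale_targets(mapped, current))
```

Any IRI reported this way points to a cell in the alignment file that should be reviewed by a knowledge engineer before the next release of the mappings.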
Real Example Using SPARQL Anything
To demonstrate the practical value of the mapping between CBV and Schema.org, we use SPARQL Anything to query instance data alongside the alignment file.
Scenario
We assume:
-
A data file (core-business-ap.ttl) contains an instance of legal:LegalEntity
-
An alignment file (Alignment-CoreBusiness-2.2.0-schemaorg.ttl) defines a mapping from legal:LegalEntity to schema:Organization.
SPARQL Query
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
PREFIX legal: <http://www.w3.org/ns/legal#>
PREFIX align: <http://knowledgeweb.semanticweb.org/heterogeneity/alignment#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Re-type every legal:LegalEntity instance with the class it is aligned to.
CONSTRUCT {
  ?s a ?entity2 .
}
WHERE {
  # Instance data: subjects and their declared types.
  SERVICE <x-sparql-anything:app/core-business-ap.ttl> {
    ?s a ?entity1 .
  }
  # Alignment file: cells pairing a source entity with a target entity.
  SERVICE <x-sparql-anything:app/Alignment-CoreBusiness-2.2.0-schemaorg.ttl> {
    ?align a align:Cell ;
      align:entity1 ?entity1 ;
      align:entity2 ?entity2 .
    FILTER (?entity1 = legal:LegalEntity)
  }
}
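To make the effect of the query concrete, the join it performs can be mimicked in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: triples are plain tuples of IRI strings, alignment cells are reduced to (entity1, entity2) pairs, and the instance IRI `http://example.org/acme` is made up. It is not how SPARQL Anything evaluates the query internally.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LEGAL_ENTITY = "http://www.w3.org/ns/legal#LegalEntity"
SDO_ORGANIZATION = "https://schema.org/Organization"


def construct_types(instance_triples, cells, source_class=LEGAL_ENTITY):
    """Emit (subject, rdf:type, target) for instances of the aligned source class."""
    # Targets aligned to the source class (the FILTER in the query).
    targets = {entity2 for entity1, entity2 in cells if entity1 == source_class}
    # Join the typed instances with those targets (the CONSTRUCT template).
    return {
        (subject, RDF_TYPE, target)
        for subject, predicate, obj in instance_triples
        if predicate == RDF_TYPE and obj == source_class
        for target in targets
    }


if __name__ == "__main__":
    data = [("http://example.org/acme", RDF_TYPE, LEGAL_ENTITY)]  # made-up instance
    alignment = [(LEGAL_ENTITY, SDO_ORGANIZATION)]
    for triple in construct_types(data, alignment):
        print(triple)
```

Under these assumptions, each instance of legal:LegalEntity in the data file gains an additional type statement with schema:Organization, which is what the CONSTRUCT template expresses.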