How to create new data models
Create a new XSD schema
This section provides detailed instructions for addressing use case UC1.1.
To create a new XSD schema, the following steps need to be observed:
- Import or define elements
- Shape structure with patterns
Import or define elements
When working with XML schemas, particularly in relation to semantic artefacts like ontologies or data shapes, managing imports and namespaces is vital to ensure clarity, reusability, and proper integration of various data models.
When a core vocabulary has defined an associated XSD schema, it is not only easy but also advisable to directly import this schema using the xsd:import statement. This enables seamless reuse and guarantees that any complex types or elements defined within the core vocabulary are integrated correctly and transparently within new schemas.
The imported elements are then employed in the definition of a specific document structure. For example, Core Vocabularies are based on DCTERMS [ref], which provides an XML schema, so Core Person could import the DCTERMS XML schema in order to use AgentType.
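The import described above can be sketched as follows. This is a minimal illustration, not a published schema: the target namespace, the local schemaLocation `dcterms.xsd`, and the element name `agent` are assumptions, and the type name AgentType is taken from the text, so whether the published DCTERMS schema declares it under that name should be checked. The Python wrapper only verifies that the fragment is well-formed and lists the imported namespaces.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema: the target namespace, the local schemaLocation and the
# element name "agent" are assumptions, not taken from a published schema.
CORE_PERSON_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dct="http://purl.org/dc/terms/"
           targetNamespace="http://example.org/core-person"
           elementFormDefault="qualified">
  <!-- Bring the DCTERMS declarations into scope. -->
  <xs:import namespace="http://purl.org/dc/terms/"
             schemaLocation="dcterms.xsd"/>
  <!-- Reuse the imported type instead of redefining it. -->
  <xs:element name="agent" type="dct:AgentType"/>
</xs:schema>
"""

# Parsing checks well-formedness only; it does not resolve the import.
schema = ET.fromstring(CORE_PERSON_XSD)
imports = [node.get("namespace") for node in schema.findall(f"{{{XS}}}import")]
print(imports)
```

Note that `xsd:import` needs the namespace of the imported schema, and the same namespace must be bound to a prefix (here `dct`) before its types can be referenced.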
In cases where the Core Vocabulary does not provide an XSD schema, it is necessary to create the corresponding XML element definitions for the reused URIs in the new XSD schema. Crucially, these new elements must adhere to the namespace defined by the Core Vocabulary to maintain consistency. For example, “AgentType” must be defined within the “http://data.europa.eu/m8g/” namespace of the Core Vocabularies.
Furthermore, when integrating these elements into a new schema, it is essential to reflect the constraints from the core vocabulary’s data shape, specifically which properties are optional and which are mandatory, within the XSD schema element definitions.
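As a hedged sketch of the two points above, the fragment below declares AgentType in a schema document whose target namespace is the Core Vocabularies namespace, and expresses mandatory versus optional properties through `minOccurs`. The child elements `name` and `identifier` and their cardinalities are hypothetical; in practice they would be read from the vocabulary's data shape.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# The child elements "name" and "identifier" are hypothetical; only the
# namespace and the type name AgentType come from the text.
AGENT_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:m8g="http://data.europa.eu/m8g/"
           targetNamespace="http://data.europa.eu/m8g/"
           elementFormDefault="qualified">
  <xs:complexType name="AgentType">
    <xs:sequence>
      <!-- Mandatory: minOccurs defaults to 1. -->
      <xs:element name="name" type="xs:string"/>
      <!-- Optional and repeatable. -->
      <xs:element name="identifier" type="xs:string"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

schema = ET.fromstring(AGENT_XSD)


def cardinality(element):
    """Return (minOccurs, maxOccurs) as strings, applying the XSD default of 1."""
    return element.get("minOccurs", "1"), element.get("maxOccurs", "1")


for element in schema.iter(f"{{{XS}}}element"):
    print(element.get("name"), cardinality(element))
```

Because an XSD document has a single target namespace, definitions that must live in the Core Vocabulary namespace would typically sit in their own schema document and be imported by the new schema.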
Shape XML document structure
In designing XML schemas, the selection of a design pattern has implications for the reusability and extension of the schema. The Venetian Blind and Garden of Eden patterns stand out as preferable for their ability to allow complex types to be reused by different elements [sem-map].
The Venetian Blind pattern is characterised by having a single global element that serves as the entry point for the XML document, from which all the elements can be reached. This pattern implies a certain directionality and starting point, analogous to choosing a primary class in an ontology that has direct relationships to other classes, and from which one can navigate to the rest of the classes.
For instance, in the Core Business Vocabulary, if one were to select the "Legal Entity" class as the starting point, it would shape the XML schema in such a way that all other classes could be reached from this entry point, reflecting its central role within the ontology. A possible Venetian Blind implementation would therefore use “Legal Entity” as the root element.
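A Venetian Blind schema with “Legal Entity” as its root could look roughly like the sketch below. The namespace `http://example.org/core-business` and the property names are illustrative assumptions, not the published Core Business schema. The pattern shows in the shape of the global declarations: exactly one global element, with all complex types declared globally and their child elements declared locally.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Namespace and property names are assumptions made for illustration.
VENETIAN_BLIND_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.org/core-business"
           targetNamespace="http://example.org/core-business"
           elementFormDefault="qualified">
  <!-- The only global element: the document entry point. -->
  <xs:element name="LegalEntity" type="ex:LegalEntityType"/>

  <!-- Global, reusable types; their elements are declared locally. -->
  <xs:complexType name="LegalEntityType">
    <xs:sequence>
      <xs:element name="legalName" type="xs:string"/>
      <xs:element name="registeredAddress" type="ex:AddressType" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="fullAddress" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

schema = ET.fromstring(VENETIAN_BLIND_XSD)
# Direct children of xs:schema are the global declarations.
global_elements = [e.get("name") for e in schema.findall(f"{{{XS}}}element")]
global_types = [t.get("name") for t in schema.findall(f"{{{XS}}}complexType")]
print(global_elements, global_types)
```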
Adopting the Venetian Blind pattern reduces variability in how the schema is applied and makes it directly usable in specific scenarios, because it provides not only well-defined elements but also a rigid and predictable structure.
On the other hand, the Garden of Eden pattern allows for multiple global elements, providing various entry points into the XML document. This pattern accommodates ontologies where no single class is inherently central, mirroring the flexibility of graph representations in ontologies that do not have a strict hierarchical starting point.
Adopting the Garden of Eden pattern provides a less constrained approach, enabling users to represent information starting from different elements that may hold significance in different contexts. This approach has been adopted by standardisation initiatives such as NIEM and UBL, which recommend such flexibility for broader applicability and ease of information representation.
However, the Garden of Eden pattern does not by itself lead to a schema that can be used in final application scenarios, because it does not ensure a single stable document structure but leaves room for variations. This schema pattern requires an additional composition specification. For example, if it is used in a SOAP API, the developers can decide to use multiple starting points to facilitate the exchange of granular messages specific to each API endpoint. This way the XSD schema remains reusable across different API endpoints and even API implementations.
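For comparison, a Garden of Eden version of a small schema might be sketched as below. The namespace and names are again illustrative assumptions. Every element and every type is declared globally, and complex types reference elements with `ref`, so any global element can serve as a document entry point.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Namespace and names are assumptions made for illustration.
GARDEN_OF_EDEN_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:ex="http://example.org/core-business"
           targetNamespace="http://example.org/core-business"
           elementFormDefault="qualified">
  <!-- All elements are global, so each one is a possible entry point. -->
  <xs:element name="LegalEntity" type="ex:LegalEntityType"/>
  <xs:element name="Address" type="ex:AddressType"/>
  <xs:element name="legalName" type="xs:string"/>
  <xs:element name="fullAddress" type="xs:string"/>

  <!-- Types are global too and only reference the global elements. -->
  <xs:complexType name="LegalEntityType">
    <xs:sequence>
      <xs:element ref="ex:legalName"/>
      <xs:element ref="ex:Address" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element ref="ex:fullAddress"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

schema = ET.fromstring(GARDEN_OF_EDEN_XSD)
# Direct children of xs:schema are the global declarations.
entry_points = [e.get("name") for e in schema.findall(f"{{{XS}}}element")]
# Elements nested inside types carry a ref instead of a local name.
local_refs = [e.get("ref") for e in schema.iter(f"{{{XS}}}element") if e.get("ref")]
print(entry_points, local_refs)
```

An application that needs a single stable document structure would then fix one of these entry points in its own composition specification.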
Overall, the choice between these patterns should be informed by the intended use of the schema, the level of abstraction of the ontology it represents, and the needs of the end-users, aiming to strike a balance between structure and flexibility.
Recommendation: We consider the Garden of Eden pattern suitable for designing XSD schemas at the level of core or domain semantic data specifications, and the Venetian Blind pattern suitable for XSD schemas at the level of specific Application Profiles.
Create a new JSON-LD context definition
This section provides detailed instructions for addressing use case UC1.2.
JSON-LD combines the simplicity, power, and web ubiquity of JSON with the concepts of Linked Data. Creating JSON-LD context definitions facilitates this synergy. This ensures that when data is shared or integrated across systems, it maintains its meaning and can be understood in the same way across different contexts. Here’s a guide on how to create new JSON-LD contexts for existing CVs, using the Core Person Vocabulary as an example.
- Import or define elements
- Shape structure
Import or define elements
When a CV has defined an associated JSON-LD context, it is not only easy but also advisable to import this context directly using the @import keyword. This enables seamless reuse and guarantees that any terms defined within the vocabulary are integrated correctly and transparently within new context definitions.
"@context": {"@import": "https://json-ld.org/contexts/remote-context.jsonld"}
In cases where the CV does not provide a JSON-LD context, it is necessary to create the corresponding field definitions for the reused URIs. To start, gather all the terms from the Core Person Vocabulary that you want to include in your JSON-LD context. Terms can include properties like given name, family name, and date of birth, and relationships like residency or contact point.
Then, decide on the desired structure of the JSON-LD file by defining the corresponding keys, for example Person.givenName, Person.familyName, Person.dateOfBirth, Person.residency, and Person.contactPoint. These new fields must adhere to the naming defined by the CV to maintain consistency.
Finally, assign URIs to keys. Each term in your JSON-LD context must be associated with a URI from an ontology that defines its meaning in a globally unambiguous way. Associate the URIs established in CVs to JSON keys using the same CV terms. For example:
"Person.contactPoint": {"@id": "http://data.europa.eu/m8g/contactPoint"}.
Terms that the CVs import from other vocabularies shall be used as originally defined, for example from FOAF:
"Person.givenName": {"@id": "http://xmlns.com/foaf/0.1/givenName"}.
Shape structure
Start defining the structure of the context by relating class terms with property terms and then, if necessary, property terms with other classes.
Commence by creating a JSON structure that starts with a @context field. This field will contain mappings from your vocabulary terms to their respective URIs. Continue by defining fields for Classes and subfields for their properties.
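A minimal sketch of such a structure is shown below, written as a Python dictionary that serialises to the JSON-LD context document. The two property mappings are the ones given earlier in this section; the IRI used for the Person class is an assumption made for illustration. The helper functions are hypothetical conveniences that perform plain dictionary lookups; they are not a JSON-LD processor.

```python
import json

# Term-to-URI mappings. The two property mappings come from the text above;
# the IRI used for the Person class is an assumption made for illustration.
context_document = {
    "@context": {
        "Person": {"@id": "http://www.w3.org/ns/person#Person"},
        "Person.givenName": {"@id": "http://xmlns.com/foaf/0.1/givenName"},
        "Person.contactPoint": {"@id": "http://data.europa.eu/m8g/contactPoint"},
    }
}


def term_iri(document, term):
    """Return the IRI a term maps to (a plain lookup, not JSON-LD expansion)."""
    definition = document["@context"][term]
    return definition["@id"] if isinstance(definition, dict) else definition


def properties_of(document, class_term):
    """List the terms that follow the 'Class.property' naming used here."""
    prefix = class_term + "."
    return sorted(t for t in document["@context"] if t.startswith(prefix))


print(json.dumps(context_document, indent=2))
print(properties_of(context_document, "Person"))
```

Keeping the class name as a key prefix makes it easy to see which properties belong to which class, at the cost of longer keys in the exchanged JSON.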
If the JSON-LD context is developed with the aim of being used directly in a data exchange specific to an application scenario, then aim to establish a complete tree structure that starts with a single root class. To do so, specify precise @type references linking to the specific Class. For example:
"Person.contactPoint" : {"@id": "http://data.europa.eu/m8g/contactPoint", "@type": "ContactPoint"}.
If the aim of the developed JSON-LD context is instead to ensure semantic correspondences without any structural constraints, which is the case for core or domain semantic data specifications, then definitions of structures specific to each entity type and its properties suffice, using only loose references to other objects. For example:
"Person.contactPoint": {"@id": "http://data.europa.eu/m8g/contactPoint", "@type": "@id"}