UC1.1: Tutorial on creating an XSD schema for Core Business Vocabulary (CBV)

Creating an XSD schema for a Core Vocabulary involves defining the structure, data types, and relationships of its elements, so that systems exchanging the data can interoperate. This tutorial follows the methodology outlined for Use Case UC1.1, "Create a New XSD Schema", and shows how to design and create an XSD schema that integrates terms from the Core Business Vocabulary (CBV). The step-by-step guide covers the essential stages of schema creation: importing the CBV elements correctly, shaping the document structure, and applying all constraints.
In outline, we will first import or define the elements, then shape the document structure with patterns, define complex types, finalise the schema, and validate it.

Phase 1: Import or Define Elements

Managing Imports and Namespaces

In XML schema development, managing imports and namespaces is crucial for reusing elements from external vocabularies and integrating them consistently. This step gives the schema well-defined semantics, makes it reusable, and keeps it aligned with the Core Business Vocabulary (CBV).
The CBV, for example, comes with its own XSD schema, so a single import statement makes all definitions of CBV elements available. The following example shows how to import the CBV into your XSD schema; each part is explained afterwards:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://data.europa.eu/m8g/xsd"
    xmlns="http://data.europa.eu/m8g/xsd"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified"
    version="2.2.0">
    <!-- Importing the Core Business Vocabulary schema (raw file URL, so the processor receives XSD rather than an HTML page) -->
    <xs:import namespace="http://data.europa.eu/m8g/" schemaLocation="https://raw.githubusercontent.com/SEMICeu/XML-schema/main/models/CoreVoc_Business/CoreVoc_Business.xsd"/>
</xs:schema>

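<xs:import>: The element that imports the CBV schema, making its terms available in your schema.
namespace="http://data.europa.eu/m8g/": The namespace of the CBV. It must match the targetNamespace declared in the imported schema file, and it must differ from the target namespace of your own schema (a same-namespace schema is combined with xs:include instead).
schemaLocation="https://raw.githubusercontent.com/SEMICeu/XML-schema/main/models/CoreVoc_Business/CoreVoc_Business.xsd": Points to the location of the CBV schema file on the Web. Use the raw file URL rather than the GitHub page view, which returns HTML instead of XSD.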
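Once imported, CBV components are referenced through a prefix bound to the CBV namespace. The sketch below assumes that your own schema uses the placeholder namespace http://example.com/, that the CBV namespace is http://data.europa.eu/m8g/ as stated above, and that the CBV schema declares a global LegalEntity element; the RegistrationRecord wrapper element is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/"
    xmlns="http://example.com/"
    xmlns:cbv="http://data.europa.eu/m8g/"
    elementFormDefault="qualified">
    <!-- Make the CBV components available under the cbv: prefix -->
    <xs:import namespace="http://data.europa.eu/m8g/"
               schemaLocation="https://raw.githubusercontent.com/SEMICeu/XML-schema/main/models/CoreVoc_Business/CoreVoc_Business.xsd"/>
    <!-- Hypothetical wrapper element defined in your own namespace -->
    <xs:element name="RegistrationRecord">
        <xs:complexType>
            <xs:sequence>
                <!-- Reuse the CBV element as-is: its type and semantics come from the CBV -->
                <xs:element ref="cbv:LegalEntity" maxOccurs="unbounded"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
```

Because this schema has its own target namespace, the namespace attribute of xs:import differs from it, as XSD requires.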

Define Elements

If no XSD schema is provided by the Core Vocabulary, you must define these terms manually within your own schema. These new elements need to adhere to the Core Vocabulary’s namespace to maintain consistency.
For example, the LegalEntity element could be defined as follows if no XSD is provided for it:

<xs:element name="LegalEntity" type="LegalEntityType"/>

Make sure you declare the correct namespace for all these custom elements through the targetNamespace attribute of your schema (http://example.com/ is only a placeholder).
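A self-contained sketch of this approach follows. It assumes the placeholder namespace http://example.com/ and keeps LegalEntityType deliberately small: the single LegalName child typed as xs:string is an illustrative simplification, not the CBV definition.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/"
    xmlns="http://example.com/"
    elementFormDefault="qualified">
    <!-- Global element declared manually because no XSD is available -->
    <xs:element name="LegalEntity" type="LegalEntityType"/>
    <!-- The referenced type must be defined too, otherwise the schema does not compile -->
    <xs:complexType name="LegalEntityType">
        <xs:sequence>
            <!-- Simplified content model, for illustration only -->
            <xs:element name="LegalName" type="xs:string" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
</xs:schema>
```

The default namespace declaration (xmlns="http://example.com/"), matching the targetNamespace, is what lets the unprefixed type="LegalEntityType" resolve to the type defined in the same schema.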

Phase 2: Shape XML Document Structure with Patterns

At this stage, we focus on structuring the XML document using appropriate XML Schema Design Patterns. The Venetian Blind and Garden of Eden patterns are two methods for organizing the schema.

Venetian Blind Pattern

In the Venetian Blind pattern, there is a single global element, which serves as the document root; all other elements are declared locally inside named global types. This approach suits cases where a central entity, such as LegalEntity in the CBV, serves as the entry point.
This pattern fits well with API design, where you typically request information about a central concept (such as LegalEntity), and the response includes nested elements like LegalName and RegisteredAddress, all organised under the main entity.

Here’s an example, where LegalEntity serves as the main entry point:

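<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://data.europa.eu/m8g/xsd"
    xmlns="http://data.europa.eu/m8g/xsd"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
    elementFormDefault="qualified">
    <!-- The only global element: the entry point of every document -->
    <xs:element name="LegalEntity" type="LegalEntityType"/>
    <!-- Named global type; its child elements are declared locally -->
    <xs:complexType name="LegalEntityType">
        <xs:sequence>
            <xs:element name="LegalName" type="TextType" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="RegisteredAddress" type="AddressType" minOccurs="0" maxOccurs="unbounded"/>
            <!-- Other elements -->
        </xs:sequence>
    </xs:complexType>
    <!-- TextType and AddressType are global types defined elsewhere in the schema -->
</xs:schema>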

In this example:

  • LegalEntity is the global entry point.

  • It uses LegalEntityType, which contains various properties such as LegalName and RegisteredAddress.
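For illustration, a document conforming to a Venetian Blind schema like the one above has LegalEntity as its root, with everything else nested below it, much like an API response for a single legal entity. The values are invented, and the content of RegisteredAddress is left as a comment because AddressType is not defined in this tutorial.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The single global element, LegalEntity, is the document root -->
<LegalEntity xmlns="http://data.europa.eu/m8g/xsd">
    <LegalName>Example Trading Ltd</LegalName>
    <RegisteredAddress>
        <!-- Child elements as defined by AddressType -->
    </RegisteredAddress>
</LegalEntity>
```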

Garden of Eden Pattern

In the Garden of Eden pattern, all elements and types are declared globally, so an XML document can have multiple entry points. This is more flexible and suits cases where no class is inherently the main starting point. The Core Business Vocabularies (CBVs) often use this pattern: although LegalEntity may be central, the pattern allows properties to be reused even when their classes are not used directly. For example, a document could start with a class such as Identifier instead of LegalEntity.
For instance, both LegalEntity and Organization can serve as root elements of documents valid against the following schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="LegalEntity" type="LegalEntityType"/>
    <xs:element name="Organization" type="OrganizationType"/>
</xs:schema>
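To make the contrast with the Venetian Blind explicit, here is a minimal Garden of Eden sketch in which every element and every type is global, and types pull in properties with ref. It uses the placeholder namespace http://example.com/, and typing LegalName as xs:string is a simplification.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/"
    xmlns="http://example.com/"
    elementFormDefault="qualified">
    <!-- Every element is global, so each one is a possible document root -->
    <xs:element name="LegalEntity" type="LegalEntityType"/>
    <xs:element name="Organization" type="OrganizationType"/>
    <xs:element name="LegalName" type="xs:string"/>
    <!-- Every type is global and reuses global elements by reference -->
    <xs:complexType name="LegalEntityType">
        <xs:sequence>
            <xs:element ref="LegalName" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    <xs:complexType name="OrganizationType">
        <!-- Organization properties omitted in this sketch (empty content) -->
    </xs:complexType>
</xs:schema>
```

The trade-off is that a standalone LegalName document is also valid against this schema; that openness is what lets properties be reused without their owning class.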

Define Complex Types

After importing or defining the basic elements and structuring your XML document with patterns, the next step in creating an XSD schema is to define complex types. Complex types represent business entities that have multiple properties or relationships. In the CBV, they typically model entities such as LegalEntity or Organization, whose content combines simple and complex elements. LegalEntityType and OrganizationType, described below, are two examples.

A LegalEntity might contain several child elements, such as LegalName, modelled as a simple string; RegisteredAddress, whose type is itself complex; and other related elements. Here’s how LegalEntityType is defined:

<xs:complexType name="LegalEntityType" sawsdl:modelReference="http://www.w3.org/ns/legal#LegalEntity">
    <xs:sequence>
        <xs:element ref="LegalName" minOccurs="0" maxOccurs="unbounded" sawsdl:modelReference="http://www.w3.org/ns/legal#legalName"/>
        <xs:element ref="RegisteredAddress" minOccurs="0" maxOccurs="unbounded" sawsdl:modelReference="http://data.europa.eu/m8g/registeredAddress"/>
        <!-- More elements as needed -->
    </xs:sequence>
</xs:complexType>

Note: The sawsdl:modelReference annotation is used to link the element to an external concept, providing semantic context by associating the element with a specific vocabulary or ontology.
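The same annotation can also be attached to a global element declaration, not only to the references inside a complex type. A minimal sketch follows, using the placeholder namespace http://example.com/. The sawsdl prefix must be bound to http://www.w3.org/ns/sawsdl, as in the schema header shown earlier; XSD permits such attributes from other namespaces on schema components, and they do not change how instance documents are validated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:sawsdl="http://www.w3.org/ns/sawsdl"
    targetNamespace="http://example.com/"
    xmlns="http://example.com/"
    elementFormDefault="qualified">
    <!-- The modelReference value is a URI identifying the concept in the ontology -->
    <xs:element name="LegalName" type="xs:string"
                sawsdl:modelReference="http://www.w3.org/ns/legal#legalName"/>
</xs:schema>
```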

Like LegalEntityType, OrganizationType defines a business entity with multiple properties and relationships. In addition, it captures hierarchical relationships such as HeadOf and MemberOf.

<xs:complexType name="OrganizationType">
    <xs:sequence>
        <xs:element ref="HeadOf" minOccurs="0" maxOccurs="unbounded" sawsdl:modelReference="http://www.w3.org/ns/org#headOf"/>
        <xs:element ref="MemberOf" minOccurs="0" maxOccurs="unbounded" sawsdl:modelReference="http://www.w3.org/ns/org#memberOf"/>
        <!-- Other properties -->
    </xs:sequence>
</xs:complexType>

This allows an Organization to be linked to other organizations, for example as a member of a larger body, which gives a flexible representation of organizational structures.
Note also that, in this context, LegalEntityType is defined as an extension of FormalOrganizationType, which in turn extends OrganizationType; a LegalEntity therefore inherits all the properties of an Organization.

<!-- LegalEntityType -->
<xs:element name="LegalEntity" type="LegalEntityType"/>
<xs:complexType name="LegalEntityType" sawsdl:modelReference="http://www.w3.org/ns/legal#LegalEntity">
    <xs:complexContent>
        <xs:extension base="FormalOrganizationType"/>
    </xs:complexContent>
</xs:complexType>
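The whole derivation chain can be sketched in one small schema. This is an illustration under assumptions, not the CBV definitions: the namespace http://example.com/ is a placeholder, PrefLabel is a hypothetical property of OrganizationType, FormalOrganizationType is left empty, and LegalEntityType adds LegalName on top of what it inherits.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/"
    xmlns="http://example.com/"
    elementFormDefault="qualified">
    <!-- Base type: properties shared by every organization (PrefLabel is hypothetical) -->
    <xs:complexType name="OrganizationType">
        <xs:sequence>
            <xs:element name="PrefLabel" type="xs:string" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>
    <!-- Middle layer, left empty in this sketch: inherits OrganizationType unchanged -->
    <xs:complexType name="FormalOrganizationType">
        <xs:complexContent>
            <xs:extension base="OrganizationType"/>
        </xs:complexContent>
    </xs:complexType>
    <!-- Inherits PrefLabel and appends LegalName after it -->
    <xs:complexType name="LegalEntityType">
        <xs:complexContent>
            <xs:extension base="FormalOrganizationType">
                <xs:sequence>
                    <xs:element name="LegalName" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
                </xs:sequence>
            </xs:extension>
        </xs:complexContent>
    </xs:complexType>
    <xs:element name="LegalEntity" type="LegalEntityType"/>
</xs:schema>
```

Because an extension appends its content after the content of the base type, a valid LegalEntity lists PrefLabel before LegalName.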

Finalising the XSD Schema

Adding annotations and documentation to each complex type and element helps to clarify their purpose and improve the readability of the schema. For instance:

<xs:annotation>
   <xs:documentation xml:lang="en">
     A self-employed person, company, or organization that has legal rights and obligations.
   </xs:documentation>
</xs:annotation>
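Inside a type or element declaration, xs:annotation must be the first child, before the content model. A minimal sketch placing the documentation above on LegalEntityType, with the placeholder namespace http://example.com/ and a simplified content model:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://example.com/"
    xmlns="http://example.com/"
    elementFormDefault="qualified">
    <xs:complexType name="LegalEntityType">
        <!-- xs:annotation comes first, before xs:sequence -->
        <xs:annotation>
            <xs:documentation xml:lang="en">
                A self-employed person, company, or organization that has legal rights and obligations.
            </xs:documentation>
        </xs:annotation>
        <xs:sequence>
            <xs:element name="LegalName" type="xs:string" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
</xs:schema>
```

The xml prefix used in xml:lang is predefined, so it needs no namespace declaration.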

Phase 3: Validation and Best Practices

Finally, test your new schema by validating sample XML documents with an XML validation tool (e.g., XMLValidation tool) to check that the schema is syntactically correct and behaves as expected. The Core Business Vocabularies (CBV) follow several best practices and validation rules to maintain consistency, clarity, and reusability across schemas, covering naming conventions, documentation standards, and structural rules.
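A sample document for such a test might look like the following. It assumes a schema whose target namespace is http://example.com/ and which declares a LegalEntity root with a LegalName child, saved next to the document as legal-entity.xsd (a hypothetical file name); the value is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsi:schemaLocation pairs the namespace with the (hypothetical) schema file -->
<LegalEntity
    xmlns="http://example.com/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.com/ legal-entity.xsd">
    <LegalName>Example Trading Ltd</LegalName>
</LegalEntity>
```

A useful complementary test is a deliberately invalid copy, for example one containing an undeclared child element, which the validator should reject.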

Schematron Validation Rules

The Schematron rules provide automated checks of schema compliance, covering key aspects such as type definitions, element declarations, and metadata. The detailed list of rules can be found here.
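To give an idea of what such rules look like, here is a small ISO Schematron sketch that checks two conventions on an XSD file: every named complex type has a name ending in Type and carries non-empty documentation. These two rules are illustrative assumptions, not the actual CBV rule set; queryBinding="xslt2" is used because ends-with is an XPath 2.0 function.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- Bind the xs prefix so the rules can address XSD constructs -->
    <sch:ns prefix="xs" uri="http://www.w3.org/2001/XMLSchema"/>
    <sch:pattern id="complex-type-conventions">
        <!-- Fires for every named complex type in the schema under test -->
        <sch:rule context="xs:complexType[@name]">
            <sch:assert test="ends-with(@name, 'Type')">
                Complex type '<sch:value-of select="@name"/>' should have a name ending in 'Type'.
            </sch:assert>
            <sch:assert test="xs:annotation/xs:documentation[normalize-space()]">
                Complex type '<sch:value-of select="@name"/>' should have a non-empty xs:documentation.
            </sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

Anonymous complex types have no name attribute, so the rule context skips them.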

Running the Validation

You can execute the rules using the provided build.xml file, which leverages Apache Ant. The process validates the schema against the Schematron rules and generates HTML reports for easy inspection.
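The provided build.xml is not reproduced here. As a hypothetical sketch of the same idea, the Ant file below applies a Schematron rule set that has already been compiled to XSLT (rules.xsl) to a schema, then turns the resulting SVRL report into HTML with a second stylesheet (svrl-to-html.xsl). All file names are placeholders, and Saxon-HE is assumed to be on Ant's classpath, because rules compiled with queryBinding="xslt2" need an XSLT 2.0 processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical build file: not the build.xml distributed with the CBV rules -->
<project name="schema-validation" default="validate" basedir=".">
    <!-- Placeholder paths; adjust to your project layout -->
    <property name="schema.file" value="MySchema.xsd"/>
    <property name="report.dir" value="report"/>

    <target name="validate">
        <mkdir dir="${report.dir}"/>
        <!-- Step 1: run the compiled Schematron rules against the XSD, producing SVRL -->
        <xslt in="${schema.file}" out="${report.dir}/report.svrl.xml" style="rules.xsl">
            <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
        <!-- Step 2: render the SVRL report as HTML for inspection -->
        <xslt in="${report.dir}/report.svrl.xml" out="${report.dir}/report.html" style="svrl-to-html.xsl">
            <factory name="net.sf.saxon.TransformerFactoryImpl"/>
        </xslt>
    </target>
</project>
```

Running ant in the directory containing this file executes the default validate target.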