Practical Ontology Modeling - Three C’s of ontology engineering
Applying traditional architectural design principles to ontology modeling
This is a brief guide to pragmatic ontology design. It is born out of practical experience of designing ontologies for real-world semantic publishing applications. In it I borrow good design principles from traditional software architecture and apply them to ontology design, under three headings that I will call the 3 C’s of ontology engineering.
C1. Contract binding
Traditional (yet contemporary) software and system design principles typically mandate that components and services should be bound by contract and be loosely coupled. For example, a third party component might be integrated into an organisation’s existing architecture through a bespoke web service that defines an API or interface to the underlying technology, thus creating an abstraction layer between the organisation’s own services and the third party tech. This abstraction layer allows the organisation to switch out the component for an alternative in the future without forcing the clients of the service to change, as the clients are bound to the contract of the web-service abstraction layer instead of the underlying component.
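As a rough sketch of that idea in code (Python here, purely for illustration): the `Geocoder` contract, the two vendor adapters and the `describe` client are all hypothetical names invented for this example, and the coordinates are made-up sample values. The client binds to the contract only, so the component behind it can be swapped without touching the client.

```python
from typing import Protocol, Tuple


class Geocoder(Protocol):
    """The contract that internal clients bind to (hypothetical)."""

    def locate(self, place: str) -> Tuple[float, float]:
        ...


class VendorAGeocoder:
    """Adapter around one imagined third party component."""

    def locate(self, place: str) -> Tuple[float, float]:
        # Stand-in for a call to the vendor's own API.
        return {"London": (51.5, -0.12)}.get(place, (0.0, 0.0))


class VendorBGeocoder:
    """Adapter around an imagined replacement component."""

    def locate(self, place: str) -> Tuple[float, float]:
        # A different vendor, hidden behind the same contract.
        return {"London": (51.51, -0.13)}.get(place, (0.0, 0.0))


def describe(geocoder: Geocoder, place: str) -> str:
    """Client code: depends on the Geocoder contract, not on a vendor."""
    lat, lon = geocoder.locate(place)
    return f"{place} is at {lat:.1f}, {lon:.1f}"
```

Calling `describe(VendorAGeocoder(), "London")` and `describe(VendorBGeocoder(), "London")` exercises the same client code against either component; only the adapter changes.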
So what does this have to do with ontology modelling? The utility the semantic web brings to the web application banquet is the richness of the ontology models underlying the instance data being represented. Moreover, the use of existing public domain ontologies in one’s own ontology allows us to publish RDF such that external consumers (machines or humans) will already know how to interpret and use the knowledge described therein.
Today’s ontology modeller has a collection of widely used public domain ontologies in his arsenal (FOAF, Dublin Core, Event, Time etc.). In many cases the data architect can model a large part of his chosen domain using these public domain ontologies as building blocks and little else. Often an organisation will want to construct APIs on top of their domain model (ontology) in order to provide some bespoke service, returning RDF for their systems to consume and process. However, if the Acme Company data architect models his domain using purely public domain ontologies, he is binding his internal systems (APIs and their underlying SPARQL queries) to those ontologies early and tightly. If he models people and relationships using pure FOAF then Acme’s internal systems are effectively bound to a FOAF contract. If in the future FOAF goes out of fashion and becomes less widely used, and a new public domain ontology replaces it as the gold standard, then it may be a costly exercise to remodel and rebuild Acme’s internal systems to use the new ontology, as the contract binding is FOAF based.
If instead the Acme data architect defines his own ontology classes and properties to represent people and relationships, while inheriting from FOAF (e.g. acme:Person rdfs:subClassOf foaf:Person), and then engineers internal systems using the Acme ontology, then Acme’s internal systems and APIs become less tightly bound to the public domain ontology and the interface contract is with Acme’s proprietary ontology. This offers Acme a number of advantages:
1. Acme can continue to publish RDF described using widely used public domain ontologies so third party (external) consumers *know* how to consume their data, binding to the (currently in fashion) public domain ontology.
2. Acme’s internal systems are somewhat protected from future trends in ontology choice as their own API contracts are not tightly bound to any particular public domain ontology. Re-engineering to use a new public domain ontology is relatively cheap, effectively becoming a matter of redefining which classes and properties Acme’s ontology inherits from, leaving internal API interfaces and systems unchanged as they are described using the Acme ontology model.
3. Published RDF has a modicum of provenance: statements from Acme’s own ontology are published alongside the public domain RDF statements, giving the external consumer a decent clue as to its origin.
4. Extending the third party ontology (e.g. adding additional properties to a class) can be done on Acme’s ontology, leaving the public domain ontology in its natural state.
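The mechanics behind these advantages can be sketched with nothing more than tuples. This is a toy model under stated assumptions, not a real triple-store: triples are plain Python tuples of prefixed names, `acme:alice` is a made-up resource, `newvocab:Person` is an invented stand-in for whatever might one day replace FOAF, and the helper names are hypothetical.

```python
# Triples are plain (subject, predicate, object) tuples of prefixed names.
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def superclasses(cls, ontology):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(o for s, p, o in ontology
                        if s == current and p == SUBCLASS)
    return found


def publish(data, ontology):
    """Add the inferred rdf:type statements that external consumers read."""
    published = set(data)
    for s, p, o in data:
        if p == TYPE:
            published.update((s, TYPE, c) for c in superclasses(o, ontology))
    return published


def internal_people(triples):
    """Internal API: bound only to the Acme contract (acme:Person)."""
    return sorted(s for s, p, o in triples
                  if p == TYPE and o == "acme:Person")


data = {("acme:alice", TYPE, "acme:Person")}

# Today: the Acme class inherits from FOAF.
ontology_v1 = {("acme:Person", SUBCLASS, "foaf:Person")}

# Later: re-point the inheritance at a hypothetical successor vocabulary.
ontology_v2 = {("acme:Person", SUBCLASS, "newvocab:Person")}
```

With `ontology_v1` the published set contains both the `acme:Person` and the `foaf:Person` statement for `acme:alice`; with `ontology_v2` the second becomes `newvocab:Person`. In both cases `internal_people` is untouched, because it only ever asks for `acme:Person`.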
Obviously nothing comes for free :) - there are some downsides to consider:
1. The Acme ontologist has more work to do during the modelling exercise.
2. You are increasing the number of statements in your triple-store for each resource described, and thus there is an associated hit in performance / storage cost (it might be a small hit of course).
3. It is likely you will have to publish your own ontology publicly if you are publishing your RDF, although this is probably not such a bad thing.
While I have used FOAF in my examples above (mainly because everyone knows it), I am not suggesting anyone runs off and reinvents FOAF, Time, Event etc, as many of these lower-level ontologies are almost certainly here to stay (and certainly offer the Acme data architect some great design patterns, and excellent building blocks).
A better example (but less widely known) is with meta-tagging ontologies. There are a number of tagging ontologies in the public domain as I write this, and none of them individually stands out as the current gold standard for tagging. So if the Acme data architect chooses the Holygoat ontology for associating his media assets with domain entities, it is quite possible that one of the other tagging ontologies will later emerge as ‘the one’ tagging ontology. By building an Acme tagging ontology inheriting from Holygoat, his own systems can bind to the Acme tagging contract, in the safe knowledge that when a gold standard for tagging does emerge it is simple and cheap to switch.
There is clearly a grey area in choosing at what level you should start modelling an abstraction layer in your ontology. This line should be drawn at the point at which the things you are modelling fall into your domain - i.e. the boundary (contract) at which your internal systems would need to bind with your APIs and RDF.
C2. Cohesion
We have discussed above how an internal or domain ontology can typically be constructed using a number of low-level public domain ontologies as building blocks and design patterns. Often the entire domain under construction could be engineered using these building blocks alone. Indeed 95% of the Sport domain (see the sport ontology described in my earlier blog) could be represented just with these building blocks (Event, FOAF, Participation, DCterms). However, this would require some imagination, would mean that one’s interfaces bind early to the public domain ontologies (see C1), and would probably require a more complex set of APIs for applications based on this domain. I am going to suggest that a domain modelled in this way has a low degree of cohesion.
The sport ontology, however, while inheriting and borrowing from these building blocks, defines and describes the domain of sport uniquely and in detail, provides later binding to the contracts supplied by the public domain ontologies, and requires less imagination to understand how the ontology describes the domain. This is an ontology with a high degree of cohesion with respect to its domain.
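To make the difference concrete, here is a small sketch comparing the two styles. It is an illustration only: triples are plain Python tuples, the `ex:` resources are made up, and `sport:Match` is a hypothetical class name used for the example, not a claim about the terms the sport ontology actually defines.

```python
TYPE = "rdf:type"

# Low cohesion: generic building blocks only. That "ex:final" is a football
# match is implied by a local convention layered on top of event:Event.
generic = {
    ("ex:final", TYPE, "event:Event"),
    ("ex:final", "dcterms:type", "ex:FootballMatch"),
    ("ex:press-conference", TYPE, "event:Event"),
}

# High cohesion: a domain class (hypothetical name) states the concept directly.
domain = {
    ("ex:final", TYPE, "sport:Match"),
    ("ex:press-conference", TYPE, "event:Event"),
}


def matches_generic(triples):
    """Clients must know the convention: an Event with a particular dcterms:type."""
    events = {s for s, p, o in triples if p == TYPE and o == "event:Event"}
    return sorted(s for s, p, o in triples
                  if s in events
                  and p == "dcterms:type"
                  and o == "ex:FootballMatch")


def matches_domain(triples):
    """Clients ask for the domain concept by name."""
    return sorted(s for s, p, o in triples
                  if p == TYPE and o == "sport:Match")
```

Both functions find the same match, but the generic version pushes knowledge of the modelling convention into every client, which is the "imagination" and API complexity that a cohesive domain ontology avoids.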
C3. Coupling
In traditional object-oriented software design, good practice typically dictates defining interfaces (e.g. in Java or .NET) for clients to bind to while allowing the developer to implement these interfaces using one or more techniques. The interface exposes only the required functionality to the other software components, hiding the implementation details. A software component can implement more than one interface (the nearest thing Java has to multiple inheritance), taking on the behaviour of each interface.
One can think of an ontology class definition as an interface, exposing certain properties (or behaviour) of a domain model while hiding others in deeper graphs. A domain instance (an instance of a resource from some other domain) can take on the behaviour of a class (A) in another ontology through multiple inheritance, by declaring itself to have an rdf:type of that class (A) in addition to its own domain type.
Using the concept of meta-tagging, the following diagram shows how a domain modelled Thing can inherit behaviour from a Tag class so it can take part in tagging operations on a TaggableThing. Similarly, the Asset inherits behaviour from the TaggableThing so it can be tagged. This is analogous to a software (say Java) object implementing an interface, and is in effect a restricted, clean and clearly defined coupling between ontologies.
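A minimal sketch of that coupling, again using plain Python tuples as triples. The `tags:` and `acme:` names, including the `tags:taggedWith` property, are hypothetical and chosen for illustration; they are not taken from any particular tagging ontology.

```python
TYPE = "rdf:type"

graph = {
    # A domain Thing that also declares itself to be a Tag.
    ("acme:london", TYPE, "acme:Place"),
    ("acme:london", TYPE, "tags:Tag"),
    # An Asset that also declares itself to be a TaggableThing.
    ("acme:photo42", TYPE, "acme:Asset"),
    ("acme:photo42", TYPE, "tags:TaggableThing"),
    # An Asset that has not opted in to tagging.
    ("acme:report7", TYPE, "acme:Asset"),
}


def has_type(resource, cls, triples):
    """True if the resource is explicitly declared to be of the given class."""
    return (resource, TYPE, cls) in triples


def tag(thing, tag_resource, triples):
    """Return a new graph with a tagging statement added.

    The only coupling between the ontologies is the pair of rdf:type
    declarations checked here, much like checking that an object
    implements an interface.
    """
    if not has_type(thing, "tags:TaggableThing", triples):
        raise ValueError(f"{thing} is not a TaggableThing")
    if not has_type(tag_resource, "tags:Tag", triples):
        raise ValueError(f"{tag_resource} is not a Tag")
    return triples | {(thing, "tags:taggedWith", tag_resource)}
```

Tagging `acme:photo42` with `acme:london` succeeds because each resource carries the extra type; tagging `acme:report7` raises a `ValueError`, since it never declared itself a TaggableThing.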
We have borrowed three design principles from traditional software architecture and applied them to ontology engineering. By thinking about Contract binding, Cohesion, and Coupling you can engineer ontologies for your semantic application that will play cleanly and nicely with each other, will integrate well with your existing systems, and will provide binding contracts that give you the flexibility to adapt to future ontology trends whilst protecting you from costly re-engineering exercises.