Definition

The layered structure of the Semantic Web (see http://www.w3.org/2007/03/layerCake.png) adopted by the World Wide Web Consortium (W3C) includes, among others, the ontology layer with the web ontology language OWL and the rule layer with the emerging Rule Interchange Format (RIF, http://www.w3.org/TR/rif-fld/), which allows rules to be translated between rule languages. The integration of rules and ontologies aims at developing techniques for interoperability between rules and ontologies in the Semantic Web. This is necessary for rule-based applications to access existing domain ontologies. In most proposals the integration is achieved by defining and implementing a new language that is a common extension of a given rule language and a given ontology language, enhancing the expressive power of each component. Alternatively, the integration may be achieved by designing from scratch a single language sufficiently expressive to define both rules and ontologies as well as their combinations.

Historical Background

In the initial phase of Semantic Web research a significant effort was devoted to defining a language for ontology modeling. In 2004, it resulted in OWL, a family of three ontology languages, OWL Lite, OWL DL, and OWL Full, based on Description Logics (DL). Each of them is a subset of the next one. The first two are syntactic variants of expressive description logics with semantics given by translation to formulae of first-order logic (FOL) with equality. OWL DL (hence also its subsets) is supported by several reasoners. The original intention was to define OWL as a layer on top of RDF Schema, which itself can be seen as an ontology language. OWL Full, designed to achieve this goal, has a non-standard semantics and is difficult to implement. On the other hand, OWL DL includes a substantial subset of RDF Schema, which is extensively used in ontology definitions.

The importance of rules for web applications is reflected by the rule layer in the Semantic Web architecture. The rule formalisms considered for this layer offer modeling primitives not expressible in OWL. Their integration with OWL would thus enhance the expressive power of the latter.

In contrast to the ontology layer, no standard has been proposed yet for the rule layer. The rule languages proposed for the Semantic Web originate mainly from logic programming (see e.g., [18]). In contrast to OWL DL, they usually adopt the closed world assumption. This means that if a fact cannot be derived from a knowledge base it is concluded to be false. In logic programming, the closed world assumption is implemented by the negation-as-failure rule which returns ¬p on failure to prove a fact p. This is an example of non-monotonic reasoning, not allowed in FOL.

Designing rules languages for the Semantic Web is among the objectives of the RuleML initiative (http://www.ruleml.org/). The W3C RIF Working Group is developing a core rule language as a basis for rule interchange (for more details see http://www.w3.org/2005/rules/wiki/RIF_Working_Group).

Foundations

Shortcomings of OWL

The following examples show why OWL is not sufficient for some applications and motivate its extensions. OWL ontologies include classes (e.g., Person, Woman, Man), which are unary predicates, and properties (e.g., ParentOf, SisterOf), which are binary predicates, but predicates of arity larger than two are not allowed. In OWL it is not possible to formalize the statement “an aunt of a person is a sister of a parent of that person.” Also, the semantics of OWL does not allow one to conclude that Mary is not a sister of John when the assertion SisterOf(john,mary) is not a logical consequence of the ontology. This kind of reasoning, based on the closed world assumption, is useful in some applications. Most of the rule languages proposed for the Semantic Web do not share these shortcomings.

Rule Languages for Integration

The rule languages considered in integration proposals are usually extensions of Datalog. Rules generally take the form of “if” statements, in which the antecedent, called the body of the rule, is a Boolean condition and the consequent, called the head, specifies a conclusion to be drawn if the condition is satisfied.

In Datalog the condition of a rule is a conjunction of zero or more atomic formulae of the form p(t1, …, tn) where p is an n-ary predicate symbol and t1, …, tn are constant symbols or variables. Hence they are a restricted kind of FOL terms (see First order logic: syntax).

The head of a rule is an atomic formula (atom). For example, the rule

$$auntOf(X,Y)\leftarrow parentOf(Z,Y),\ sisterOf(X,Z)$$

states that X is an aunt of Y if Z is a parent of Y and X is this parent’s sister. The semantics of Datalog associates with every set of rules (rulebase) its least Herbrand model (see e.g., [18]), where each ground (i.e., variable-free) atom is associated with a truth value true or false. The least Herbrand model is represented as the set of all atoms assigned the value true. These are all the ground atoms which follow from the rules interpreted as implications in FOL. For example, the least Herbrand model of the rulebase consisting of the rule above and of the facts parentOf(tom, john), sisterOf(mary, tom) includes the atom auntOf(mary, john). On the other hand, auntOf(mary, tom) does not follow from this rulebase. Hence the closed world assumption, used in Datalog, will result in the conclusion ¬auntOf(mary, tom). Datalog rulebases constitute a subclass of logic programs. The latter use FOL terms, not necessarily restricted to constants and variables. Proposals for the integration of rules and ontologies are mostly based on the following extensions of Datalog (which apply also to logic programs):
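The least Herbrand model of the aunt example can be computed by naive bottom-up evaluation. The following Python sketch is only an illustration under stated assumptions: atoms are encoded as tuples, terms beginning with an upper-case letter are treated as variables, and the names `match` and `least_model` are hypothetical helpers, not part of any Datalog system.

```python
# Atoms are tuples (predicate, arg1, ...). By assumption of this sketch,
# terms starting with an upper-case letter are variables.
FACTS = {("parentOf", "tom", "john"), ("sisterOf", "mary", "tom")}
RULES = [
    # auntOf(X,Y) <- parentOf(Z,Y), sisterOf(X,Z)
    (("auntOf", "X", "Y"), [("parentOf", "Z", "Y"), ("sisterOf", "X", "Z")]),
]

def is_var(term):
    return term[0].isupper()

def match(atom, fact, binding):
    """Extend binding so that atom becomes equal to fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding

def least_model(facts, rules):
    """Apply all rules repeatedly until no new ground atom is derived."""
    model = set(facts)
    while True:
        derived = set()
        for head, body in rules:
            bindings = [{}]
            for atom in body:
                bindings = [b2 for b in bindings for fact in model
                            if (b2 := match(atom, fact, b)) is not None]
            for b in bindings:
                derived.add((head[0],) + tuple(b.get(t, t) for t in head[1:]))
        if derived <= model:
            return model
        model |= derived

MODEL = least_model(FACTS, RULES)
print(("auntOf", "mary", "john") in MODEL)  # True
print(("auntOf", "mary", "tom") in MODEL)   # False: closed world assumption
```

Membership in `MODEL` plays the role of the truth value true; every ground atom outside the set is taken to be false.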

  • Datalog with negation, where the body may additionally include negation-as-failure (NAF) literals of the form not a where a is an atom. Intuitively a NAF literal not a is considered true if it does not follow from the program that a is true. For example, happy(john) can be concluded from the rulebase:

    • happy(X) ← healthy(X), not hungry(X)

    • healthy(john) ←

    Two commonly accepted formalizations of this intuition are: the well-founded semantics and the stable model semantics (see the survey [2]). The well-founded semantics [22] associates with a rulebase a unique (three-valued) Herbrand model, where each ground atom is assigned one of three logical values true, false or unknown. The stable model semantics [9] (called also the answer set semantics) associates with each rulebase some (possibly zero) two-valued Herbrand models. For a large class of programs relevant in practice (so called stratified programs, see e.g., [2]) both semantics coincide.

  • Extended Datalog. This extension (see e.g., extended logic programs in [2]) makes it possible to state explicitly negative knowledge. This is achieved by allowing negative literals of the form ¬p, where ¬ is called the strong negation connective, in the heads of rules as well as in the bodies. For example, the rule

    • ¬healthy(X) ← hasFever(X)

    allows an explicit negative conclusion to be drawn. In addition, NAF literals are also allowed in the bodies.

  • Rulebases with priorities. Datalog rulebases employing strong negation may be inconsistent, i.e., they may allow contradictory conclusions to be drawn. For example, the rules

    • fly(X) ← bird(X)

    • bird(Y) ← penguin(Y)

    • ¬fly(X) ← penguin(X)

    • penguin(tweety) ←

    allow one to conclude both fly(tweety) and ¬fly(tweety). In Defeasible Logic [19] and in Courteous Logic Programs [10] a priority relation on rules can be specified for a rulebase. The contradictions in the derived conclusions are then resolved by means of the defined priorities.

  • Disjunctive Datalog [5] (see also disjunctive logic programs in [2]) admits disjunction of atoms in the rule heads, and conjunction of atoms and NAF literals in the bodies, e.g.,

    • male(X) ∨ female(X) ← person(X)

    A commonly used semantics of Disjunctive Datalog rulebases is an extension of the answer set semantics.
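To illustrate how a priority relation can resolve the contradiction in the penguin example, the following Python sketch uses a deliberately simplified two-phase scheme. It is not the proof theory of Defeasible Logic or of Courteous Logic Programs. The rule names r1 to r4, the priority "r3 overrides r1", the "-" prefix for strong negation, and the helper names are all assumptions made for this illustration.

```python
# Ground rules: name -> (head, body). A leading "-" marks strong negation.
RULES = {
    "r1": ("fly(tweety)", ["bird(tweety)"]),
    "r2": ("bird(tweety)", ["penguin(tweety)"]),
    "r3": ("-fly(tweety)", ["penguin(tweety)"]),
    "r4": ("penguin(tweety)", []),
}
# Assumed priority relation: the penguin rule r3 overrides the bird rule r1.
OVERRIDES = {("r3", "r1")}

def complement(literal):
    return literal[1:] if literal.startswith("-") else "-" + literal

def candidates(rules):
    """Phase 1: ignore conflicts and record which rules derive each literal."""
    support = {}
    changed = True
    while changed:
        changed = False
        for name, (head, body) in rules.items():
            if all(b in support for b in body):
                names = support.setdefault(head, set())
                if name not in names:
                    names.add(name)
                    changed = True
    return support

def resolve(rules, overrides):
    """Phase 2: keep a literal only if every rule deriving its complement
    is overridden by some rule deriving the literal itself."""
    support = candidates(rules)
    accepted = set()
    for literal, names in support.items():
        attackers = support.get(complement(literal), set())
        if all(any((n, a) in overrides for n in names) for a in attackers):
            accepted.add(literal)
    return accepted

print(sorted(resolve(RULES, OVERRIDES)))
print(sorted(resolve(RULES, set())))
```

With the assumed priority the sketch keeps -fly(tweety) and drops fly(tweety). With an empty priority relation it accepts neither of the two conflicting literals. The sketch does not retract conclusions that depend on a rejected literal, which a full formalism would have to handle.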

The rule languages are supported by implementations which make it possible to query and/or to construct the models of rulebases.
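As a small illustration of model construction, the following Python sketch enumerates the stable models (answer sets) of a ground rulebase by brute force: a candidate set of atoms is stable if it is a minimal model of its Gelfond-Lifschitz reduct. The encoding of rules as (heads, positive body, NAF body) triples, the fact person(joe), and the third program p ← not p are assumptions introduced here. Practical answer set solvers use far more efficient techniques.

```python
from itertools import chain, combinations

def subsets(atoms):
    atoms = sorted(atoms)
    return (set(c) for c in chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)))

def atoms_of(program):
    return {a for heads, pos, neg in program for a in chain(heads, pos, neg)}

def reduct(program, candidate):
    """Drop rules whose NAF literals are contradicted by candidate,
    then drop the NAF literals from the remaining rules."""
    return [(heads, pos) for heads, pos, neg in program
            if not (set(neg) & candidate)]

def is_model(positive_program, interpretation):
    """Every rule with a satisfied body must have a true head atom."""
    return all(not set(pos) <= interpretation or set(heads) & interpretation
               for heads, pos in positive_program)

def stable_models(program):
    result = []
    for candidate in subsets(atoms_of(program)):
        red = reduct(program, candidate)
        if not is_model(red, candidate):
            continue
        # The candidate must be a minimal model of the reduct.
        if any(is_model(red, smaller)
               for smaller in subsets(candidate) if smaller != candidate):
            continue
        result.append(candidate)
    return result

# happy(X) <- healthy(X), not hungry(X) grounded for john, plus healthy(john).
NAF_PROGRAM = [
    (["happy(john)"], ["healthy(john)"], ["hungry(john)"]),
    (["healthy(john)"], [], []),
]
# male(X) v female(X) <- person(X) grounded for joe, plus the assumed fact person(joe).
DISJUNCTIVE_PROGRAM = [
    (["male(joe)", "female(joe)"], ["person(joe)"], []),
    (["person(joe)"], [], []),
]
# p <- not p has no stable model.
NO_MODEL_PROGRAM = [(["p"], [], ["p"])]

for program in (NAF_PROGRAM, DISJUNCTIVE_PROGRAM, NO_MODEL_PROGRAM):
    print([sorted(m) for m in stable_models(program)])
```

The three programs have one, two and zero stable models respectively, matching the remark that the stable model semantics associates some (possibly zero) models with a rulebase.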

Approaches to Integration

The integration of a given rule language with a given ontology language is usually achieved by defining a common extension of both, to be called the integrated language. Alternatively, one can adopt an existing knowledge representation language expressive enough to represent rules and ontologies. As OWL is a standard ontology language the ontology languages considered in integration proposals are usually its subsets. The approaches can be classified by the degree of the integration of rules and ontologies achieved in the integrated language.

  • Homogeneous Integration. The integrated language makes no distinction between the rule predicates and the ontology predicates. It includes the original rule language and the original ontology language as sublanguages. The integration is to be faithful in the sense that the sublanguages should have the same semantics as the respective original languages. Homogeneous integration is difficult to achieve since ontology languages are usually based on FOL, whereas rule languages have a different kind of semantics. Examples of homogeneous integration include:

    • DLP (Description Logic Programs) [12], which is a language obtained by intersection of a Description Logic with Datalog rules interpreted as FOL implications. DLP has a limited expressive power, but a DLP ontology can be compiled into rules and easily integrated into a rulebase of a more expressive rule language. For example, Sweet Rules (http://sweetrules.projects.semwebcentral.org/) combine DLP and Datalog with strong negation and priorities. The technique of compiling ontologies to rules is also used in DR-Prolog [1], which is based on Defeasible Logic.

    • F-logic, extending classical predicate calculus with the concepts of objects, classes, and types. It is expressive enough to represent ontologies, rules and their combinations [13].

    • SWRL (Semantic Web Rule Language, http://www.w3.org/Submission/SWRL/), extending OWL DL with rules interpreted as FOL implications. Thus, SWRL is based on FOL and does not offer non-monotonic features, such as negation-as-failure. A so-called DL-safe subset [17] of SWRL is supported by the KAON2 system (http://kaon2.semanticweb.org), which also offers support for a restricted subset of F-logic.

    • Hybrid MKNF Knowledge Bases [16], taking a modal logic as a basis of faithful integration of Description Logic with Disjunctive Datalog under the answer set semantics. A variant of this approach considering nondisjunctive hybrid MKNF knowledge bases under well-founded semantics is presented in [14].

    • Quantified Equilibrium Logic, considered in [3] as a unified framework which embraces classical logic as well as disjunctive logic programs, thus providing a foundation for the integration of rules and ontologies.

  • Heterogeneous Integration. In this approach, the distinction between the rule predicates and the ontology predicates is preserved in the integrated language. The integration of rules and ontologies is achieved by allowing the ontology predicates in the rules of the integrated language. Assume, for example, that an ontology classifies courses as project courses and lecture courses.

$$ Project \sqcup Lecture = Course $$

It also includes assertions like Lecture(cs05), Project(cs21) or Course(cs32) (e.g., for courses including lectures and projects). The assertions indicate offered courses. A person is considered a student if he/she is enrolled in an offered lecture or project. This can be expressed by the following rules, which use the ontology predicates:

  • student(X) ← enrolled(X,Y), Lecture(Y)

  • student(X) ← enrolled(X,Y), Project(Y)

In addition, the rulebase includes enrollment facts, e.g., enrolled(joe, cs32). The extended language thus makes it possible to define ontologies using the constructs of the ontology language, and rulebases whose rules refer to these ontologies. An extended rulebase together with an ontology is called a hybrid knowledge base. In the heterogeneous approaches, implementations are often based on the hybrid reasoning principle, where a reasoner of the ontology language is interfaced with a reasoner of the rule language to reason in the integrated language.

Two kinds of heterogeneous approaches can be distinguished:

  1. Loose coupling. In this approach the semantics of hybrid knowledge bases is based on a transformation which eliminates ontology queries from ground extended rules by querying the underlying ontology. A ground set of extended rules is thus reduced to a set of rules without ontology predicates in the following way. If the answer to a ground ontology query in a rule body is positive, the query is removed from the rule; otherwise the rule is removed from the set. The loose coupling approach applied to the example above does not allow one to conclude that Joe is a student. This is because neither Lecture(cs32) nor Project(cs32) can be derived from the ontology. Examples of loose coupling include:

    • dl-programs [6], combining (disjunctive) Datalog with negation under the answer set semantics with OWL DL. So called DL-queries, querying the ontology, are allowed in rule bodies. They may also refer to a variant of the ontology, where the set of its assertions is modified by the DL-query. This enables bi-directional flow of information between rules and ontologies. A variant of the language based on the well-founded semantics is presented in [7].

    • TRIPLE [21], a rule language with the syntax inspired by F-logic. It admits queries to the ontology in rule bodies.

    • SWI Prolog (http://www.swi-prolog.org/), a logic programming system with a Semantic Web library which makes it possible to invoke RDF Schema and OWL reasoners from Prolog programs.

  2. Tight integration. In this approach the semantics of hybrid knowledge bases is defined by combining the model-theoretic semantics of the original rule language with the FOL semantics of the ontology language. For example, tight integration of Datalog (without negation) with a Description Logic can be achieved within FOL by interpreting Datalog rules as FOL implications. In this semantics student(joe) is a logical consequence of the example hybrid knowledge base. As Course(cs32) is an assertion of the ontology, it follows by the axiom Project ⊔ Lecture = Course that in any FOL model of the ontology Project(cs32) or Lecture(cs32) is true. Since enrolled(joe, cs32) is true in every model, the premises of at least one of the implications

  • student(joe) ← enrolled(joe, cs32),Lecture(cs32)

  • student(joe) ← enrolled(joe, cs32),Project(cs32)

    must be true in any model. Hence student(joe) is concluded. Examples of tight integration include:

    • CARIN [15], a classical work on integrating Datalog with a family of Description Logics under the FOL semantics.

    • DL + log [20], integrating Disjunctive Datalog under the answer set semantics with OWL DL. For each FOL model of the ontology the rules of the knowledge base are reduced to rules of Disjunctive Datalog, with stable models defined by the answer set semantics.

    • Hybrid Rules [4], integrating logic programs under well-founded semantics with OWL. For each FOL model of the ontology the rules of the knowledge base are reduced to a logic program with the model defined by the well-founded semantics.
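The difference between loose coupling and tight integration on the course example can be reproduced with a small Python sketch. It rests on a strong simplifying assumption: the ontology is treated propositionally over the three named courses only, and its models are enumerated by brute force, which is not how a Description Logic reasoner works. All function names are hypothetical.

```python
from itertools import product

COURSES = ["cs05", "cs21", "cs32"]
ASSERTIONS = {("Lecture", "cs05"), ("Project", "cs21"), ("Course", "cs32")}
ENROLLED = {("joe", "cs32")}

def ontology_models():
    """Yield every interpretation of Project/Lecture/Course over COURSES that
    satisfies the axiom Project ⊔ Lecture = Course and all assertions."""
    atoms = [(cls, c) for c in COURSES for cls in ("Project", "Lecture", "Course")]
    for values in product([False, True], repeat=len(atoms)):
        true_atoms = {a for a, v in zip(atoms, values) if v}
        axiom_ok = all(
            (("Course", c) in true_atoms)
            == ((("Project", c) in true_atoms) or (("Lecture", c) in true_atoms))
            for c in COURSES)
        if axiom_ok and ASSERTIONS <= true_atoms:
            yield true_atoms

def entailed(atom):
    """A ground ontology query succeeds if the atom holds in every model."""
    return all(atom in model for model in ontology_models())

def students(course_holds):
    """Apply student(X) <- enrolled(X,Y), Lecture(Y) and its Project variant,
    with course_holds(cls, course) deciding the ontology atom."""
    return {person for person, course in ENROLLED
            if course_holds("Lecture", course) or course_holds("Project", course)}

def loose_students():
    # Each ontology atom in a rule body is answered separately by the ontology.
    return students(lambda cls, c: entailed((cls, c)))

def tight_students():
    # The rules are evaluated in each ontology model; a conclusion must hold in all.
    per_model = [students(lambda cls, c, m=m: (cls, c) in m)
                 for m in ontology_models()]
    return set.intersection(*per_model)

print(loose_students())  # set(): neither Lecture(cs32) nor Project(cs32) is entailed
print(tight_students())  # {'joe'}: one of the two holds in every model
```

Under loose coupling both ground rules for joe are removed because neither ontology query succeeds on its own, whereas under tight integration the case analysis over models yields student(joe).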

The theoretical foundations developed by studying integration of ontologies with variants of Datalog provide a basis for further extensions. This includes dealing with uncertain and inconsistent knowledge, and using integrated Datalog-based languages as condition languages for ECA-Rules.

Key Applications

The integration of rules and ontologies is a relatively new research topic, focused so far on developing tools and prototypes. Key applications include semantic data integration, ontology-based web search and semantic recommendation systems. Industrial applications of this kind are discussed in the video lecture [8]. The Ontobroker system referred therein is based on F-logic. Another field of potential key applications is e-business as discussed in the tutorial video [11], with focus on Sweet Rules.

URL to Code

The following systems integrating rules and ontologies can be downloaded:

A prototype implementation of dl-programs (NLP-DL) can be accessed at http://con.fusion.at/nlpdl/.

Cross-References