Using ontology databases for scalable query answering, inconsistency detection, and data integration

Journal of Intelligent Information Systems

Abstract

An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time. In this paper, we use existing benchmarks to evaluate our trigger-based method, and we demonstrate that forward-computing inferences not only improves query time but also appears to cost only additional space, not time. However, we go on to show that the true penalties were simply opaque to the benchmark, i.e., the benchmark inadequately captures load-time costs. We have applied our methods to two case studies in biomedicine, using ontologies and data from genetics and neuroscience to illustrate two important applications: first, ontology databases answer ontology-based queries effectively; second, using triggers, ontology databases detect instance-based inconsistencies, something not possible using views. Finally, we demonstrate how to extend our methods to perform data integration across multiple, distributed ontology databases.
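The contrast between the two strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes SQLite (through Python's standard `sqlite3` module) as the relational engine, and the class hierarchy GraduateStudent ⊑ Student ⊑ Person, the `_asserted` table suffix, and the helper names `load`, `instances`, and `stored_rows` are all hypothetical choices made for illustration.

```python
import sqlite3

# View strategy: store only asserted facts; subsumption is unfolded at query time.
VIEW_SCHEMA = """
CREATE TABLE GraduateStudent_asserted(id TEXT PRIMARY KEY);
CREATE TABLE Student_asserted(id TEXT PRIMARY KEY);
CREATE TABLE Person_asserted(id TEXT PRIMARY KEY);
CREATE VIEW GraduateStudent AS SELECT id FROM GraduateStudent_asserted;
CREATE VIEW Student AS
    SELECT id FROM Student_asserted UNION SELECT id FROM GraduateStudent;
CREATE VIEW Person AS
    SELECT id FROM Person_asserted UNION SELECT id FROM Student;
"""

# Trigger strategy: each insert is propagated up the hierarchy at load time.
TRIGGER_SCHEMA = """
CREATE TABLE GraduateStudent(id TEXT PRIMARY KEY);
CREATE TABLE Student(id TEXT PRIMARY KEY);
CREATE TABLE Person(id TEXT PRIMARY KEY);
CREATE TRIGGER grad_is_student AFTER INSERT ON GraduateStudent
BEGIN
    INSERT OR IGNORE INTO Student(id) VALUES (NEW.id);
END;
CREATE TRIGGER student_is_person AFTER INSERT ON Student
BEGIN
    INSERT OR IGNORE INTO Person(id) VALUES (NEW.id);
END;
"""

# Hypothetical instance data: (class, individual).
FACTS = [("GraduateStudent", "alice"), ("Student", "bob")]


def load(schema, table_suffix=""):
    """Create an in-memory database with the given schema and load FACTS."""
    db = sqlite3.connect(":memory:")
    db.executescript(schema)
    for cls, individual in FACTS:
        db.execute(f"INSERT INTO {cls}{table_suffix}(id) VALUES (?)", (individual,))
    return db


def instances(db, cls):
    """Return all individuals of a class, including inferred ones, sorted."""
    return sorted(row[0] for row in db.execute(f"SELECT id FROM {cls}"))


def stored_rows(db, tables):
    """Count the rows physically stored in the given base tables."""
    return sum(db.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables)


if __name__ == "__main__":
    view_db = load(VIEW_SCHEMA, "_asserted")
    trigger_db = load(TRIGGER_SCHEMA)
    classes = ["GraduateStudent", "Student", "Person"]
    print(instances(view_db, "Person"), instances(trigger_db, "Person"))
    print(stored_rows(view_db, [c + "_asserted" for c in classes]),
          stored_rows(trigger_db, classes))
```

Both databases answer the query for `Person` identically. The trigger-based one stores more rows because every inferred membership is materialized, which is the space-for-query-time trade the abstract describes. The sketch does not measure load time, the cost the abstract says the benchmark failed to capture.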


Notes

  1. http://www.w3.org/TR/owl-features/

  2. http://esw.w3.org/LargeTripleStores

  3. The notation {x/M,y/J} denotes that the variable x gets substituted with M, y with J, and so on, as part of the unification process.

  4. We denote a negative table in the database such as \(\neg\mathrm{Female}\) using an underscore prefix, e.g., _Female.

  5. Experiment performed on a 1.8 GHz Centrino laptop with 1 GB RAM in October 2007.

  6. We confirmed by looking at system logs that the short divergence at about 1.2 million facts in Fig. 5a was due to virus scanning, a background interference. The constant cost per assertion clearly resumes after virus scanning terminates.

  7. http://bioportal.bioontology.org/ontologies/virtual/1321

  8. http://www.berkeleybop.org/goose

  9. http://zfin.org/cgi-bin/webdriver?MIval=aa-markergoview.apg&OID=ZDB-GENE-030319-2

  10. http://zfin.org/cgi-bin/webdriver?MIval=aa-markergoview.apg&OID=ZDB-GENE-000616-1

  11. http://www.informatics.jax.org/javawi2/servlet/WIFetch?page=markerGO&key=33374

  12. http://zfin.org/cgi-bin/webdriver?MIval=aa-markergoview.apg&OID=ZDB-GENE-040718-342

  13. http://sourceforge.net/tracker/?func=detail&aid=2686444&group_id=36855&atid=469833

  14. Please read the symbol ⊢ as "infers" and the symbol \(\vDash\) as "entails".
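The negative-table convention in note 4 suggests how a trigger could flag an instance-based inconsistency. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes SQLite via Python's standard `sqlite3` module, and the `Male` table, the axiom "Male implies not Female", the trigger names, and the helper `assert_fact` are all hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Female(id TEXT PRIMARY KEY);
CREATE TABLE _Female(id TEXT PRIMARY KEY);  -- negative table: known NOT Female
CREATE TABLE Male(id TEXT PRIMARY KEY);

-- Hypothetical axiom: every Male is not Female (propagated at load time).
CREATE TRIGGER male_not_female AFTER INSERT ON Male
BEGIN
    INSERT INTO _Female(id)
    SELECT NEW.id WHERE NOT EXISTS (SELECT 1 FROM _Female WHERE id = NEW.id);
END;

-- Reject an individual asserted as Female when it is already known not Female.
CREATE TRIGGER female_conflict BEFORE INSERT ON Female
BEGIN
    SELECT RAISE(ABORT, 'inconsistent: Female and not Female')
    WHERE EXISTS (SELECT 1 FROM _Female WHERE id = NEW.id);
END;

-- Reject the symmetric case.
CREATE TRIGGER not_female_conflict BEFORE INSERT ON _Female
BEGIN
    SELECT RAISE(ABORT, 'inconsistent: Female and not Female')
    WHERE EXISTS (SELECT 1 FROM Female WHERE id = NEW.id);
END;
"""


def new_database():
    """Create an in-memory database with the positive and negative tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def assert_fact(db, table, individual):
    """Insert a fact; return False if a trigger reports an inconsistency."""
    try:
        db.execute(f"INSERT INTO {table}(id) VALUES (?)", (individual,))
        return True
    except sqlite3.DatabaseError:
        return False


if __name__ == "__main__":
    db = new_database()
    print(assert_fact(db, "Male", "m1"))    # accepted; also fills _Female
    print(assert_fact(db, "Female", "m1"))  # rejected by female_conflict
```

The check runs when a fact is loaded, so the conflicting assertion is caught at the moment it would enter the database. A view only derives rows when queried and has no comparable hook for rejecting an insert, which is one way to read the abstract's remark that this detection is not possible with views.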

Acknowledgements

This work was supported in part by grant R01 EB007684 from the National Institutes of Health. We thank Doug Howe and Jiawei Rong for their contributions to our GO and NEMO case studies. We also thank the ZFIN group, Zena M. Ariola, and Gwen A. Frishkoff for their feedback and contributions.

Author information

Corresponding author

Correspondence to Paea LePendu.

About this article

Cite this article

LePendu, P., Dou, D. Using ontology databases for scalable query answering, inconsistency detection, and data integration. J Intell Inf Syst 37, 217–244 (2011). https://doi.org/10.1007/s10844-010-0133-4

  • DOI: https://doi.org/10.1007/s10844-010-0133-4
