Information Technology and Management, Volume 8, Issue 3, pp 205–221

Extracting knowledge from XML document repository: a semantic Web-based approach

  • Henry M. Kim
  • Arijit Sengupta


XML plays an important role as the standard language for representing structured data on the traditional Web, and hence many Web-based knowledge management repositories store data and documents in XML. If the semantics of the data are formally represented in an ontology, then it is possible to extract knowledge: ontology definitions and axioms are applied to the XML data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central role in realizing the burgeoning vision of the semantic Web, wherein data will be more sharable because their semantics will be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge from an exemplar XML repository of Shakespeare’s plays. We then implement an architecture for this ontology using de facto languages of the semantic Web, including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted that the early adopters of the semantic Web will develop ontologies that leverage XML, provide intra-organizational value (such as knowledge extraction capabilities) independently of the semantic Web, and have the potential for inter-organizational data sharing over the semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint for other ontology developers who believe that the growth of the semantic Web will unfold in this manner.
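The idea of applying ontology axioms to XML data to infer facts that are not stored explicitly can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from KROX: the element names (`PLAY`, `ACT`, `SCENE`, `SPEECH`, `SPEAKER`, `LINE`) follow the style of the Shakespeare markup, the two rules (`infer_appears_in`, `infer_shares_scene`) are hypothetical, and plain Python functions stand in for the OWL and RuleML definitions the paper uses.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written fragment in the style of the Shakespeare XML markup.
# The element names are an assumption made for this sketch.
SAMPLE = """
<PLAY>
  <TITLE>The Tragedy of Hamlet, Prince of Denmark</TITLE>
  <ACT>
    <TITLE>ACT I</TITLE>
    <SCENE>
      <TITLE>SCENE I</TITLE>
      <SPEECH><SPEAKER>BERNARDO</SPEAKER><LINE>Who's there?</LINE></SPEECH>
      <SPEECH><SPEAKER>FRANCISCO</SPEAKER><LINE>Nay, answer me.</LINE></SPEECH>
    </SCENE>
  </ACT>
</PLAY>
"""


def infer_appears_in(play_xml):
    """Hypothetical axiom 1: a character appears in a scene if the
    character is the speaker of at least one speech in that scene."""
    root = ET.fromstring(play_xml)
    facts = set()
    for act in root.iter("ACT"):
        act_title = act.findtext("TITLE")
        for scene in act.iter("SCENE"):
            scene_title = scene.findtext("TITLE")
            for speaker in scene.iter("SPEAKER"):
                name = (speaker.text or "").strip()
                if name:
                    facts.add((name, act_title, scene_title))
    return facts


def infer_shares_scene(appears_in):
    """Hypothetical axiom 2: two distinct characters share a scene if
    both appear in the same act and scene. Built on axiom 1's output."""
    pairs = set()
    for a, act_a, scene_a in appears_in:
        for b, act_b, scene_b in appears_in:
            if a < b and (act_a, scene_a) == (act_b, scene_b):
                pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    appears = infer_appears_in(SAMPLE)
    print(sorted(appears))
    print(sorted(infer_shares_scene(appears)))
```

Neither "appears in" nor "shares a scene with" is stored anywhere in the markup; both are derived by chaining rules over the element structure, which is the kind of inference the abstract describes in general terms.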


Keywords: XML, Ontologies, Knowledge extraction, Query processing, Semantic Web



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Schulich School of Business, York University, Toronto, Canada
  2. Raj Soin College of Business, Wright State University, Dayton, USA
