Querying Documents using Content, Structure and Properties

Journal of Intelligent Information Systems

Abstract

Much information is now stored electronically in document bases, and users retrieve it by browsing and querying. Although many tools are available, little work has been done on tools that support queries involving all the characteristics of documents together with the use of domain knowledge during the search. In this paper we propose a query language that allows documents to be queried using content information, information about their logical structure, and information about their properties. Domain knowledge is also taken into account during the search. We also present an architecture for a system supporting such a language, and we describe a prototype implementation together with test results.
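To make the idea of combining content, logical structure, properties and domain knowledge in one query more concrete, the following is a minimal Python sketch. It is not the query language proposed in the paper, whose syntax is not given in this abstract. The names `Document`, `Section`, `NARROWER`, `expand` and `query`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """One node of a (flat, simplified) logical structure."""
    title: str
    text: str


@dataclass
class Document:
    properties: dict   # e.g. {"author": "A", "year": 1999}
    sections: list     # ordered list of Section objects


# Hypothetical domain knowledge: a term and its more specific terms.
NARROWER = {"network": {"ethernet", "token ring"}}


def expand(term):
    """Return the term together with its narrower terms, if any."""
    return {term} | NARROWER.get(term, set())


def query(docs, term, section_title, **props):
    """Select documents whose section named `section_title` mentions
    `term` (or a narrower term) and whose properties match `props`."""
    terms = expand(term)
    hits = []
    for doc in docs:
        # Property condition: every requested property must match exactly.
        if any(doc.properties.get(k) != v for k, v in props.items()):
            continue
        # Structure + content condition: the term must occur in that section.
        for sec in doc.sections:
            if sec.title == section_title and any(
                t in sec.text.lower() for t in terms
            ):
                hits.append(doc)
                break
    return hits


docs = [
    Document({"author": "A", "year": 1999},
             [Section("Abstract", "We measure Ethernet throughput.")]),
    Document({"author": "B", "year": 1999},
             [Section("Abstract", "We study parsing."),
              Section("Introduction", "A network is assumed.")]),
    Document({"author": "C", "year": 1998},
             [Section("Abstract", "A network protocol.")]),
]

result = query(docs, "network", "Abstract", year=1999)
```

In this toy setting, only the first document is selected: the second mentions the term outside the requested section, and the third fails the property condition. The first document matches only because the assumed domain knowledge relates "network" to "ethernet".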




Cite this article

Lambrix, P., Shahmehri, N. Querying Documents using Content, Structure and Properties. Journal of Intelligent Information Systems 15, 287–307 (2000). https://doi.org/10.1023/A:1008784514647
