Retrieval Performance Experiment with the Webspace Method
Finding relevant information with search engines that index large portions of the World-Wide Web is often a frustrating task. Because the available information is so diverse, those search engines have to rely on techniques developed in the field of information retrieval (IR).
More limited domains of the Internet contain large collections of documents with a highly structured, multimedia character. Furthermore, the content within such a collection can be assumed to be more closely related. This allows the more precise and advanced query formulation techniques commonly used in a database environment to be applied to the Web. The Webspace Method focuses on such document collections and offers an approach for modelling and searching large collections of documents, based on a conceptual schema.
This article focuses on the evaluation of a retrieval performance experiment, carried out to examine the improvement offered by the webspace search engine over a standard search engine that uses a widely accepted IR model. When searching document collections with the Webspace Method, retrieval performance, measured in terms of recall and precision, can improve by up to a factor of two.
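For readers unfamiliar with the two measures named above, the following is a minimal sketch of how precision and recall are conventionally computed for a single query. It is not the evaluation code used in the experiment; the function name `precision_recall` and the document identifiers are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against division by zero for empty result or judgment sets.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Toy identifiers, assumed for illustration; not data from the experiment.
relevant = {"d1", "d2", "d3", "d4"}
run_a = ["d1", "d5", "d6", "d7"]  # one relevant document among four results
run_b = ["d1", "d2", "d5", "d6"]  # two relevant documents among four results

print(precision_recall(run_a, relevant))
print(precision_recall(run_b, relevant))
```

With these toy inputs, the second run retrieves twice as many relevant documents in the same number of results, so both of its measures are double those of the first run. That is the sense in which one engine can outperform another "by a factor of two".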
Keywords: Search Engine, Information Retrieval, Relevant Document, Retrieval Performance, Query Term