Abstract
Many valuable Web documents are not indexed by general search engines and are accessible only through specific search interfaces. Metasearching groups of specialty search engines is one way to gain access to a large number of such hidden Web resources. A key issue in returning quality metasearch results is how to select the most relevant specialty search engines for a given query. We introduce a method for automatically categorizing specialty search engines into a hierarchical directory for metasearching. Using the directory, a metasearch engine can easily select the specialty search engines that are most likely to hold relevant information and resources. We evaluate our algorithm by comparing the directory it builds with one built by human judgment. In addition, we present a metasearch engine prototype, which demonstrates that such a specialty search engine directory can be beneficial in locating essential but hidden Web resources.
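To make the idea of directory-based engine selection concrete, the following is a minimal sketch and not the algorithm proposed in the paper, which the abstract does not specify. It assumes a hypothetical directory in which each category carries a hand-written keyword set and a list of engine names, and it picks the category whose keywords overlap the query most. The `Category` class, the `select_engines` function, the keyword sets, and the `.example` engine names are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node of a hypothetical hierarchical directory of search engines."""
    name: str
    keywords: set[str]
    engines: list[str] = field(default_factory=list)
    children: list[Category] = field(default_factory=list)


def iter_categories(node: Category):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_categories(child)


def select_engines(root: Category, query: str) -> list[str]:
    """Return the engines of the category whose keywords best match the query.

    Assumption: relevance is plain keyword overlap. If no category shares
    any term with the query, fall back to the engines listed at the root.
    """
    terms = set(query.lower().split())
    best = max(iter_categories(root), key=lambda c: len(terms & c.keywords))
    if not terms & best.keywords:
        return list(root.engines)
    return list(best.engines)


# Hypothetical directory used only for illustration.
DIRECTORY = Category(
    "Top", set(), engines=["general-search.example"],
    children=[
        Category(
            "Science", {"science", "physics", "biology"},
            engines=["science-search.example"],
            children=[
                Category("Biology", {"biology", "genome", "protein"},
                         engines=["bio-search.example"]),
            ],
        ),
        Category("Arts", {"music", "painting"},
                 engines=["arts-search.example"]),
    ],
)

print(select_engines(DIRECTORY, "protein genome database"))
```

A real metasearch engine would likely derive the category descriptions from the engines themselves and might query several top-ranked categories; the single best match here only keeps the sketch short.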
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shiu, J.K.H., Chan, S.C.F., Chung, K.F.L. (2003). Accessing Hidden Web Documents by Metasearching a Directory of Specialty Search Engines. In: Bianchi-Berthouze, N. (eds) Databases in Networked Information Systems. DNIS 2003. Lecture Notes in Computer Science, vol 2822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39845-5_4
DOI: https://doi.org/10.1007/978-3-540-39845-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20111-3
Online ISBN: 978-3-540-39845-5
eBook Packages: Springer Book Archive