
Accessing Hidden Web Documents by Metasearching a Directory of Specialty Search Engines

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2822)

Abstract

Many valuable Web documents have not been indexed by general-purpose search engines and can be reached only through specific search interfaces. Metasearching groups of specialty search engines is one way to gain access to large amounts of such hidden Web resources. A key issue in returning high-quality metasearch results is selecting the specialty search engines most relevant to a given query. We introduce a method for automatically categorizing specialty search engines into a hierarchical directory for metasearching. Using this directory, a metasearch engine can easily select the specialty search engines most likely to hold relevant information and resources. We evaluate our algorithm by comparing the directory it builds with one built from human judgments. In addition, we present a prototype metasearch engine demonstrating that such a specialty search engine directory can help locate essential but hidden Web resources.
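The abstract does not spell out how the directory is built or queried. As a rough illustration only, the sketch below shows one way a metasearch engine might use a hierarchical directory to pick specialty search engines for a query: descend to the best-matching category by simple keyword overlap, then prefer engines filed under the most specific category before falling back to broader ones. The `Category` structure, the keyword-overlap scoring, and all category and engine names are hypothetical assumptions, not the authors' algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node in a HYPOTHETICAL hierarchical directory of specialty search engines."""
    name: str
    keywords: set[str]  # assumed descriptive terms for the category
    engines: list[str] = field(default_factory=list)  # engines filed under this node
    children: list["Category"] = field(default_factory=list)


def score(category: Category, terms: set[str]) -> int:
    """Assumed relevance measure: count of query terms shared with the category's keywords."""
    return len(category.keywords & terms)


def select_engines(root: Category, query: str, limit: int = 3) -> list[str]:
    """Follow the best-matching child at each level, then return engines from the
    deepest category reached first, falling back to its ancestors."""
    terms = set(query.lower().split())
    path = []
    node = root
    while True:
        path.append(node)
        best = max(node.children, key=lambda c: score(c, terms), default=None)
        if best is None or score(best, terms) == 0:
            break  # no child matches any query term, so stop descending
        node = best
    selected: list[str] = []
    for cat in reversed(path):  # most specific category first
        for engine in cat.engines:
            if engine not in selected:
                selected.append(engine)
    return selected[:limit]


# Hypothetical directory used only for illustration.
sports = Category("Sports", {"sports", "football", "soccer"}, ["sports-engine-a"])
football = Category("Football", {"football", "soccer", "league"}, ["football-engine-b"])
sports.children.append(football)
health = Category("Health", {"health", "medicine", "disease"}, ["medical-engine-c"])
root = Category("Root", set(), ["general-engine"], [sports, health])

print(select_engines(root, "soccer league results"))
# ['football-engine-b', 'sports-engine-a', 'general-engine']
```

Falling back to ancestor categories keeps broader engines available when the most specific category holds only a few; a practical system would likely need a richer relevance measure than raw keyword overlap.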

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shiu, J.K.H., Chan, S.C.F., Chung, K.F.L. (2003). Accessing Hidden Web Documents by Metasearching a Directory of Specialty Search Engines. In: Bianchi-Berthouze, N. (ed.) Databases in Networked Information Systems. DNIS 2003. Lecture Notes in Computer Science, vol 2822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39845-5_4

  • DOI: https://doi.org/10.1007/978-3-540-39845-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20111-3

  • Online ISBN: 978-3-540-39845-5

  • eBook Packages: Springer Book Archive