The Happy Searcher: Challenges in Web Information Retrieval

  • Conference paper
PRICAI 2004: Trends in Artificial Intelligence (PRICAI 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3157)

Abstract

Search has arguably become the dominant paradigm for finding information on the World Wide Web. Building a successful search engine raises a number of challenges where techniques from artificial intelligence can have a significant impact. In this paper, we explore a number of problems related to finding information on the web and discuss approaches that have been employed in various research programs, including some of those at Google. Specifically, we examine issues such as web graph analysis, statistical methods for inferring meaning in text, and the retrieval and analysis of newsgroup postings, images, and sounds. We show that by leveraging the vast amounts of data on the web, it is possible to successfully address problems in innovative ways that vastly improve on standard, but often data-impoverished, methods. We also present a number of open research problems to help spur further research in these areas.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sahami, M., Mittal, V., Baluja, S., Rowley, H. (2004). The Happy Searcher: Challenges in Web Information Retrieval. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_3

  • DOI: https://doi.org/10.1007/978-3-540-28633-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22817-2

  • Online ISBN: 978-3-540-28633-2

  • eBook Packages: Springer Book Archive
