Information Retrieval and Web Search

  • Chapter in: Web Data Mining

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Web search needs no introduction. Given its convenience and the wealth of information on the Web, searching the Web is increasingly becoming the dominant way of seeking information. People make ever fewer trips to libraries and ever more searches on the Web. In fact, without effective search engines and rich Web content, writing this book would have been much harder.


Author information

Correspondence to Bing Liu.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Liu, B. (2011). Information Retrieval and Web Search. In: Web Data Mining. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19460-3_6

  • DOI: https://doi.org/10.1007/978-3-642-19460-3_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19459-7

  • Online ISBN: 978-3-642-19460-3

  • eBook Packages: Computer Science, Computer Science (R0)
