Abstract
Web search needs no introduction. Due to its convenience and the richness of information on the Web, searching the Web is increasingly becoming the dominant information seeking method. People make fewer and fewer trips to libraries, but more and more searches on the Web. In fact, without effective search engines and rich Web contents, writing this book would have been much harder.
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Liu, B. (2011). Information Retrieval and Web Search. In: Web Data Mining. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19460-3_6
DOI: https://doi.org/10.1007/978-3-642-19460-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19459-7
Online ISBN: 978-3-642-19460-3
eBook Packages: Computer Science (R0)