Synonyms
Adversarial information retrieval; Google bombing; Spamdexing
Definition
Web spam refers to a host of techniques that subvert the ranking algorithms of web search engines and cause them to rank particular pages higher than they otherwise would. Examples of such techniques include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloaking (serving different versions of a page to search engine crawlers than to human users). Web spam is annoying to search engine users and disruptive to search engines; therefore, most commercial search engines try to combat it. Combating web spam consists of identifying spam content with high probability and – depending on policy – downgrading it during ranking, eliminating it from the index, no longer crawling it, and tainting affiliated content. The first step – identifying likely spam pages – is a classification...
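As a rough illustration of how two of the techniques named above might surface as signals for a classifier, the following Python sketch computes a content-spam feature and a cloaking check. It is not the method described in this entry: the function names, the list of popular terms, and the 0.5 threshold are all assumptions chosen for the example, and a real system would combine many more features.

```python
import re


def tokens(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def popular_term_fraction(text, popular_terms):
    """Content-spam signal: share of tokens found in a set of popular
    query terms (the set itself is a hypothetical input)."""
    toks = tokens(text)
    if not toks:
        return 0.0
    return sum(t in popular_terms for t in toks) / len(toks)


def jaccard(text_a, text_b):
    """Jaccard similarity between the token sets of two texts."""
    set_a, set_b = set(tokens(text_a)), set(tokens(text_b))
    if not set_a and not set_b:
        return 1.0  # two empty pages are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def looks_cloaked(crawler_text, browser_text, threshold=0.5):
    """Cloaking signal: flag a page when the version served to a crawler
    and the version served to a browser share few tokens.
    The threshold is an arbitrary illustrative value."""
    return jaccard(crawler_text, browser_text) < threshold
```

Signals like these would typically serve as inputs to a trained classifier rather than as verdicts on their own, since legitimate pages can also be dense in popular terms or vary between requests.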
Recommended Reading
Becchetti L, Castillo C, Donato D, Leonardi S, Baeza-Yates R. Using rank propagation and probabilistic counting for link-based spam detection. In: Proceedings of the KDD Workshop on Web Mining and Web Usage Analysis; 2006.
Castillo C, Donato D, Becchetti L, Boldi P, Leonardi S, Santini M, Vigna S. A reference collection for Web spam. ACM SIGIR Forum. 2006;40(2):11–24.
Daswani N, Stoppelman M, and the Google Click Quality and Security Teams. The anatomy of clickbot.A. In: Proceedings of the 1st Workshop on Hot Topics in Understanding Botnets; 2007.
Davison BD. Recognizing nepotistic links on the web. In: Proceedings of the AAAI Workshop on Artificial Intelligence for Web Search; 2000.
Fetterly D, Manasse M, Najork M. Spam, damn spam and statistics. In: Proceedings of the 7th International Workshop on the Web and Databases; 2004. p. 1–6.
Gyöngyi Z, Garcia-Molina H. Spam: it's not just for inboxes anymore. IEEE Comput. 2005;38(10):28–34.
Gyöngyi Z, Garcia-Molina H. Web spam taxonomy. In: Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web; 2005. p. 39–47.
Gyöngyi Z, Garcia-Molina H, Pedersen J. Combating Web spam with TrustRank. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004. p. 576–87.
Henzinger M, Motwani R, Silverstein C. Challenges in web search engines. ACM SIGIR Forum. 2002;36(2):11–22.
Mishne G, Carmel D, Lempel R. Blocking blog spam with language model disagreement. In: Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web; 2005. p. 1–6.
Ntoulas A, Najork M, Manasse M, Fetterly D. Detecting spam web pages through content analysis. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 83–92.
Wang YM, Ma M, Niu Y, Chen H. Spam double-funnel: connecting Web spammers with advertisers. In: Proceedings of the 16th International World Wide Web Conference; 2007. p. 291–300.
Wu B, Davison B. Detecting semantic cloaking on the web. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 819–28.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Najork, M. (2018). Web Spam Detection. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_465
DOI: https://doi.org/10.1007/978-1-4614-8265-9_465
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering