Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Web Spam Detection

  • Marc Najork
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_465

Synonyms

Adversarial information retrieval; Google bombing; Spamdexing

Definition

Web spam refers to a host of techniques that subvert the ranking algorithms of web search engines and cause them to rank certain pages higher than they otherwise would. Examples of such techniques include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloaking (serving different versions of a page to search engine crawlers than to human users). Web spam is annoying to search engine users and disruptive to search engines; therefore, most commercial search engines try to combat it. Combating web spam consists of identifying spam content with high probability and, depending on policy, downgrading it during ranking, eliminating it from the index, no longer crawling it, and tainting affiliated content. The first step, identifying likely spam pages, is a classification...
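
The two stages described above (scoring a page for spam signals, then acting on the score according to policy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the entry: the function names, the use of a popular-term fraction as a content-spam signal, the use of word-set Jaccard distance as a cloaking signal, and the two thresholds in `choose_action`. A production system would feed many such features into a trained classifier rather than rely on fixed cutoffs.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical policy outcomes for a scored page."""
    KEEP = "keep"
    DOWNGRADE = "downgrade"
    REMOVE_FROM_INDEX = "remove_from_index"


def popular_term_fraction(page_text, popular_terms):
    """Toy content-spam signal: share of the page's words that are
    popular (often monetizable) query terms."""
    words = page_text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in popular_terms)
    return hits / len(words)


def cloaking_score(crawler_text, user_text):
    """Toy cloaking signal: 1 minus the Jaccard similarity of the word
    sets served to a crawler and to a human user (0.0 = identical sets)."""
    crawler_words = set(crawler_text.lower().split())
    user_words = set(user_text.lower().split())
    union = crawler_words | user_words
    if not union:
        return 0.0
    return 1.0 - len(crawler_words & user_words) / len(union)


def choose_action(spam_probability, downgrade_at=0.5, remove_at=0.9):
    """Map a classifier's spam probability to a policy action.
    The default thresholds are illustrative assumptions only."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must lie in [0, 1]")
    if spam_probability >= remove_at:
        return Action.REMOVE_FROM_INDEX
    if spam_probability >= downgrade_at:
        return Action.DOWNGRADE
    return Action.KEEP
```

For instance, `popular_term_fraction("cheap loans cheap loans today", {"cheap", "loans"})` counts four popular terms among five words, and `choose_action(0.6)` falls between the two assumed thresholds and yields `Action.DOWNGRADE`.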



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Google, Inc., Mountain View, USA

Section editors and affiliations

  • Cong Yu (1)
  1. Google Research, New York, USA