Web Spam Detection

Synonyms

Adversarial information retrieval; Google bombing; Spamdexing

Definition

Web spam refers to a host of techniques that subvert the ranking algorithms of web search engines, causing them to rank particular pages higher than they otherwise would. Examples include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloaking (serving different versions of a page to search engine crawlers than to human users). Web spam annoys search engine users and disrupts search engines; therefore, most commercial search engines try to combat it. Combating web spam consists of identifying spam content with high probability and, depending on policy, downgrading it during ranking, eliminating it from the index, no longer crawling it, and tainting affiliated content. The first step, identifying likely spam pages, is a classification...
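As a rough illustration of the identification step, the following Python sketch computes two simple content-spam signals and one cloaking signal. It is a minimal sketch under stated assumptions, not the method of any particular search engine: the term list, the function names, and the thresholds are all hypothetical, and the hand-written decision rule merely stands in for a trained classifier. Link spam is not covered.

```python
import zlib

# Hypothetical list of popular, monetizable query terms (an assumption for illustration).
POPULAR_TERMS = {"cheap", "free", "loans", "casino", "insurance"}


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w]


def popular_term_fraction(text, popular_terms=POPULAR_TERMS):
    """Fraction of the page's words that are popular query terms (content-spam signal)."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(w in popular_terms for w in words) / len(words)


def compression_ratio(text):
    """Uncompressed size divided by zlib-compressed size; repetitive pages score high."""
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(raw) / len(zlib.compress(raw))


def cloaking_score(crawler_text, browser_text):
    """One minus the Jaccard similarity of the word sets of the two served versions."""
    a, b = set(tokenize(crawler_text)), set(tokenize(browser_text))
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def looks_like_spam(text, term_threshold=0.3, ratio_threshold=4.0):
    """Toy decision rule with made-up thresholds, standing in for a trained classifier."""
    return (popular_term_fraction(text) > term_threshold
            or compression_ratio(text) > ratio_threshold)
```

In practice, signals like these would be a few of many features fed to a classifier, and the policy decision (downgrade, remove from the index, stop crawling) would be applied separately to pages the classifier flags.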

Recommended Reading

  1. Becchetti L, Castillo C, Donato D, Leonardi S, Baeza-Yates R. Using rank propagation and probabilistic counting for link-based spam detection. In: Proceedings of the KDD Workshop on Web Mining and Web Usage Analysis; 2006.

  2. Castillo C, Donato D, Becchetti L, Boldi P, Leonardi S, Santini M, Vigna S. A reference collection for Web spam. ACM SIGIR Forum. 2006;40(2):11–24.

  3. Daswani N, Stoppelman M, and the Google Click Quality and Security Teams. The anatomy of Clickbot.A. In: Proceedings of the 1st Workshop on Hot Topics in Understanding Botnets; 2007.

  4. Davison BD. Recognizing nepotistic links on the web. In: Proceedings of the AAAI Workshop on Artificial Intelligence for Web Search; 2000.

  5. Fetterly D, Manasse M, Najork M. Spam, damn spam and statistics. In: Proceedings of the 7th International Workshop on the Web and Databases; 2004. p. 1–6.

  6. Gyöngyi Z, Garcia-Molina H. Spam: it's not just for inboxes anymore. IEEE Comput. 2005;38(10):28–34.

  7. Gyöngyi Z, Garcia-Molina H. Web spam taxonomy. In: Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web; 2005. p. 39–47.

  8. Gyöngyi Z, Garcia-Molina H, Pedersen J. Combating Web spam with TrustRank. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004. p. 576–87.

  9. Henzinger M, Motwani R, Silverstein C. Challenges in web search engines. ACM SIGIR Forum. 2002;36(2):11–22.

  10. Mishne G, Carmel D, Lempel R. Blocking blog spam with language model disagreement. In: Proceedings of the 1st International Workshop on Adversarial Information Retrieval on the Web; 2005. p. 1–6.

  11. Ntoulas A, Najork M, Manasse M, Fetterly D. Detecting spam web pages through content analysis. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 83–92.

  12. Wang YM, Ma M, Niu Y, Chen H. Spam double-funnel: connecting Web spammers with advertisers. In: Proceedings of the 16th International World Wide Web Conference; 2007. p. 291–300.

  13. Wu B, Davison B. Detecting semantic cloaking on the web. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 819–28.

Author information

Corresponding author

Correspondence to Marc Najork.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Najork, M. (2018). Web Spam Detection. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_465
