
Web Spam Detection

Encyclopedia of Database Systems

Synonyms

Adversarial information retrieval; Google bombing; Spamdexing

Definition

Web spam refers to a host of techniques to subvert the ranking algorithms of web search engines and cause them to rank some search results higher than they would otherwise rank. Examples of such techniques include content spam (populating web pages with popular and often highly monetizable search terms), link spam (creating links to a page in order to increase its link-based score), and cloaking (serving different versions of a page to search engine crawlers than to human users). Web spam is annoying to search engine users and disruptive to search engines; therefore, most commercial search engines try to combat web spam. Combating web spam consists of identifying spam content with high probability and, depending on policy, downgrading it during ranking, eliminating it from the index, no longer crawling it, and tainting affiliated content. The first step, identifying likely spam pages, is a classification...
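The effect of link spam on a link-based score can be seen in a toy computation. The sketch below assumes PageRank, computed by power iteration, as the link-based score (the entry does not name a specific algorithm), a damping factor of 0.85, and a small hand-made graph. The page names, the ten-page "link farm", and the helper `pagerank` are hypothetical and serve only as an illustration.

```python
# Toy illustration (hypothetical graph): a link farm inflating a page's PageRank.
# Assumptions: PageRank as the link-based score, damping factor 0.85,
# dangling pages spreading their score uniformly over all pages.

def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of out-links."""
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the teleportation share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = graph.get(p, [])
            if out:
                # Split this page's damped score evenly among its out-links.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # Dangling page: distribute its damped score over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# A small web of five pages; "target" has no incoming links.
honest_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
    "target": ["a"],
}

# The same web plus a hypothetical link farm: ten throwaway pages that exist
# only to link to "target", which links back to them.
farm = [f"farm{i}" for i in range(10)]
spammed_web = dict(honest_web)
spammed_web["target"] = ["a"] + farm
for page in farm:
    spammed_web[page] = ["target"]

before = pagerank(honest_web)["target"]
after = pagerank(spammed_web)["target"]
print(f"PageRank of 'target' without link farm: {before:.4f}")
print(f"PageRank of 'target' with link farm:    {after:.4f}")
```

Without the farm, `target` only receives the teleportation share; with it, the farm pages channel their scores to `target`, which in this toy graph becomes the highest-ranked page even though no independent page endorses it.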



Author information


Correspondence to Marc Najork.


Copyright information

© 2016 Springer Science+Business Media New York

About this entry

Cite this entry

Najork, M. (2016). Web Spam Detection. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_465-2


  • DOI: https://doi.org/10.1007/978-1-4899-7993-3_465-2


  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-1-4899-7993-3

  • eBook Packages: Springer Reference Computer Sciences, Reference Module Computer Science and Engineering


Chapter history

  1. Latest: Web Spam Detection. Published: 11 February 2017. DOI: https://doi.org/10.1007/978-1-4899-7993-3_465-3

  2. Original: Web Spam Detection. Published: 18 November 2016. DOI: https://doi.org/10.1007/978-1-4899-7993-3_465-2