Adaptive Learning Ant Colony Optimization for Web Spam Detection

  • Conference paper
Computational Science and Its Applications – ICCSA 2014 (ICCSA 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8584)

Abstract

Web spamming is a serious problem for search engines. It not only degrades the quality of search results by intentionally promoting undesirable web pages to users, but also causes the search engine to waste significant computational and storage resources on processing useless information. In this paper, we present a machine learning approach to spam detection based on the ant colony optimization algorithm. We first construct a directed graph of web hosts and their aggregated hyperlinks. We then train a classifier by having ants walk along paths in the graph. Each ant starts from an individual non-spam host and then decides to follow a link to the next host with a probability based on both a heuristic function and the pheromone trail. Relying on the approximate isolation principle of a good set, we reward an ant that discovers a good path, i.e., a sequence of non-spam hosts, by granting it additional energy so that it can walk farther. In contrast, if the ant discovers any spam, it is penalized by reducing its remaining walking steps. Finally, the classification rules are constructed by choosing the common overlapping characteristic features of all non-spam hosts along the discovered paths. Experiments on the WEBSPAM-UK2007 dataset show that our approach classifies spam and non-spam hosts more accurately than several rule-based classification baselines.
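The walk described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the transition rule below is the standard Ant System rule (probability proportional to pheromone^alpha times heuristic^beta), and the function names, the per-host heuristic, and the parameters `alpha`, `beta`, `reward`, `penalty`, and `max_hops` are all assumptions chosen for illustration.

```python
import random


def transition_probabilities(current, graph, pheromone, heuristic,
                             alpha=1.0, beta=1.0):
    """Standard Ant System rule (assumed here, not taken from the paper):
    p(i -> j) is proportional to pheromone[(i, j)]**alpha * heuristic[j]**beta."""
    neighbors = graph.get(current, [])
    weights = [
        (pheromone[(current, nxt)] ** alpha) * (heuristic[nxt] ** beta)
        for nxt in neighbors
    ]
    total = sum(weights)
    if total == 0:
        return {}  # dead end: no out-links or all weights are zero
    return {nxt: w / total for nxt, w in zip(neighbors, weights)}


def ant_walk(start, graph, labels, pheromone, heuristic,
             steps=3, reward=1, penalty=1, max_hops=10, rng=None):
    """Walk from a non-spam host. Every move costs one step; reaching a
    non-spam host gives back `reward` steps, reaching a spam host removes
    `penalty` extra steps. `max_hops` is a hypothetical safety cap so that
    a cycle of good hosts cannot keep the ant walking forever."""
    rng = rng or random.Random(0)
    path = [start]
    current = start
    while steps > 0 and len(path) - 1 < max_hops:
        probs = transition_probabilities(current, graph, pheromone, heuristic)
        if not probs:
            break
        hosts = list(probs)
        current = rng.choices(hosts, weights=[probs[h] for h in hosts])[0]
        path.append(current)
        steps -= 1
        if labels[current] == "spam":
            steps -= penalty  # penalized: fewer remaining steps
        else:
            steps += reward   # rewarded: energy for a longer walk
    return path
```

With `reward=1` an ant on a purely non-spam path loses no net energy per hop, which is why the sketch caps the walk length; a real system would also update the pheromone trail after each walk, which is omitted here.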

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Manaskasemsak, B., Jiarpakdee, J., Rungsawang, A. (2014). Adaptive Learning Ant Colony Optimization for Web Spam Detection. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8584. Springer, Cham. https://doi.org/10.1007/978-3-319-09153-2_48

  • DOI: https://doi.org/10.1007/978-3-319-09153-2_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09152-5

  • Online ISBN: 978-3-319-09153-2

  • eBook Packages: Computer Science (R0)
