Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 124))

Abstract

In this paper, we propose an online incremental learning framework for identifying web spam. The framework incrementally updates its learning model from newly arrived samples without recourse to the original data. A prototype of the framework has been evaluated on a real, large-scale web spam dataset. The results demonstrate that the proposed online detector learns quickly and predicts web spam accurately.
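
The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of online incremental learning: a linear classifier that is updated one sample at a time and never revisits earlier data. It assumes a passive-aggressive (PA-I) style update as a stand-in, which may differ from the method in the chapter. The class name `OnlineSpamDetector`, the feature names, and the step-size cap `C` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class OnlineSpamDetector:
    """Linear classifier trained one sample at a time (PA-I update, assumed)."""

    def __init__(self, C=1.0):
        self.C = C  # hypothetical cap on the per-sample step size
        self.weights = defaultdict(float)  # sparse weight vector

    def score(self, features):
        # Dot product between the sparse weights and the sparse features.
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())

    def predict(self, features):
        # +1 means spam, -1 means non-spam.
        return 1 if self.score(features) > 0 else -1

    def partial_fit(self, features, label):
        # Update from a single new sample; no earlier data is stored or reread.
        loss = max(0.0, 1.0 - label * self.score(features))
        norm_sq = sum(v * v for v in features.values())
        if loss > 0.0 and norm_sq > 0.0:
            step = min(self.C, loss / norm_sq)
            for name, value in features.items():
                self.weights[name] += step * label * value


# Hypothetical stream of (features, label) pairs arriving over time.
detector = OnlineSpamDetector()
stream = [
    ({"hidden_text": 1.0, "link_density": 0.9}, 1),
    ({"has_contact_page": 1.0, "link_density": 0.1}, -1),
]
for features, label in stream:
    detector.partial_fit(features, label)
```

Because each call to `partial_fit` touches only the current sample and the weight vector, memory use in this sketch depends on the number of distinct features seen, not on the number of samples processed.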

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Han, L., Levenberg, A. (2012). Scalable Online Incremental Learning for Web Spam Detection. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25781-0_35

  • DOI: https://doi.org/10.1007/978-3-642-25781-0_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25780-3

  • Online ISBN: 978-3-642-25781-0

  • eBook Packages: Engineering, Engineering (R0)