Scalable Online Incremental Learning for Web Spam Detection

Han, Liangxiu; Levenberg, Abby

doi:10.1007/978-3-642-25781-0_35

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 124)

Abstract

In this paper, we propose an online incremental learning framework for identifying web spam. The framework incrementally updates its learning model as new samples arrive, without recourse to the original data. A prototype of the framework has been evaluated on a real, large-scale web spam dataset. The results demonstrate that the proposed online detector learns quickly and predicts web spam accurately.
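
The abstract does not say which update rule the framework uses, so the following is only a minimal sketch of the general idea: a linear classifier that is adjusted from each newly arrived sample alone and never revisits earlier data. It assumes a passive-aggressive (PA-I) style update; the class name `OnlinePassiveAggressive`, the helper `run_stream`, the `aggressiveness` parameter, and the feature names are all hypothetical and chosen for illustration, not taken from the chapter.

```python
from typing import Dict, Iterable, Tuple

# Sparse feature vector: feature name -> value (names are illustrative only).
Features = Dict[str, float]


class OnlinePassiveAggressive:
    """Binary online linear learner; labels are +1 (spam) or -1 (non-spam).

    Assumption: a PA-I style update, used here as a stand-in for whatever
    rule the chapter actually applies.
    """

    def __init__(self, aggressiveness: float = 1.0) -> None:
        self.c = aggressiveness          # upper bound on the step size
        self.weights: Dict[str, float] = {}

    def score(self, x: Features) -> float:
        # Dot product between the weight vector and the sparse sample.
        return sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x: Features) -> int:
        return 1 if self.score(x) >= 0.0 else -1

    def update(self, x: Features, y: int) -> None:
        """Adjust the weights using only the single new sample (x, y)."""
        loss = max(0.0, 1.0 - y * self.score(x))   # hinge loss
        norm_sq = sum(v * v for v in x.values())
        if loss == 0.0 or norm_sq == 0.0:
            return                                  # margin satisfied: stay passive
        tau = min(self.c, loss / norm_sq)           # capped step size
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) + tau * y * v


def run_stream(model: OnlinePassiveAggressive,
               stream: Iterable[Tuple[Features, int]]) -> int:
    """Predict each sample, then learn from it; return the mistake count."""
    mistakes = 0
    for x, y in stream:
        if model.predict(x) != y:
            mistakes += 1
        model.update(x, y)   # the sample can be discarded after this call
    return mistakes


if __name__ == "__main__":
    model = OnlinePassiveAggressive()
    toy_stream = [
        ({"many_outlinks": 1.0}, 1),
        ({"long_text": 2.0}, -1),
        ({"many_outlinks": 1.0}, 1),
        ({"long_text": 2.0}, -1),
    ]
    print("mistakes:", run_stream(model, toy_stream))
```

Because each call to `update` touches only the weights for features present in the current sample, memory depends on the number of distinct features seen rather than on the number of samples processed, which is the property the abstract describes as updating "without recourse to the original data".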

Author information

Authors and Affiliations

School of Computing, Mathematics and Digital Technology, Manchester Metropolitan University, Manchester, UK
Liangxiu Han
School of Informatics, University of Edinburgh, Edinburgh, UK
Abby Levenberg

Editor information

Editors and Affiliations

College of Communication Engineering, Jilin University, Room 313, Building No.1, Changchun, Nanhu Avenue 5372, 130012, Jilin, People’s Republic of China
Zhihong Qian
Department of Electrical Engineering, The University of Mississippi, Anderson Hall 314, 38677, Mississippi, Mississippi, USA
Lei Cao
Department of Electrical and Computer Eng., Naval Postgraduate School, Rm. 452 Spanagel Bldg. 232, Dyer Road 833, 93943-5121, Monterey, California, USA
Weilian Su
Faculty of Computing, London Metropolitan University, Holloway Road 166-220, N7 8DB, London, United Kingdom
Tingkai Wang
College of Software, Changchun University of Science and Tech., Changchun, 130022, Jilin, People’s Republic of China
Huamin Yang

About this chapter

Cite this chapter

Han, L., Levenberg, A. (2012). Scalable Online Incremental Learning for Web Spam Detection. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25781-0_35

DOI: https://doi.org/10.1007/978-3-642-25781-0_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25780-3
Online ISBN: 978-3-642-25781-0
eBook Packages: Engineering, Engineering (R0)
