EviRank: An Evidence Based Content Trust Model for Web Spam Detection

Wang, Wei; Zeng, Guosun; Sun, Mingjun; Gu, Huanan; Zhang, Quan

doi:10.1007/978-3-540-72909-9_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4537)

Abstract

Creating an effective spam detection method is a challenging task. Traditional approaches usually treat spam detection as a binary classification problem. In this paper, however, we argue that it is more appropriate to approach it through the notion of content trust and to treat it as a ranking or ordinal regression problem. Evidence is used to define the features of spam web pages, and machine learning techniques are employed to combine this evidence into a highly efficient and reasonably accurate detection algorithm. Experiments on real web data show that the proposed method performs very well in practice.
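To illustrate what treating spam detection as ranking rather than binary classification can look like, here is a minimal Python sketch. It is not the authors' EviRank algorithm: the abstract does not specify the evidence features or the learner. The sketch assumes each page is described by a hypothetical vector of numeric evidence values and a trust grade (2 = trusted, 1 = suspicious, 0 = spam). It learns a linear trust score by minimizing a Ranking SVM-style pairwise hinge loss with plain subgradient descent. Pages are then ordered by score, and ranking quality is measured as the fraction of correctly ordered pairs, which equals ROC AUC when there are only two grades. All function names, parameters, and data below are illustrative assumptions.

```python
import itertools
import random

# Minimal sketch (hypothetical names throughout): each page is a pair
# (evidence_vector, trust_grade); a higher grade means more trustworthy.


def score(w, x):
    """Linear trust score: weighted sum of the evidence values."""
    return sum(wk * xk for wk, xk in zip(w, x))


def train_pairwise(pages, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on an L2-regularized pairwise hinge loss.

    For every pair (i, j) with grade_i > grade_j the model is pushed
    towards score(w, x_i) - score(w, x_j) >= 1 (Ranking SVM-style margin).
    """
    w = [0.0] * len(pages[0][0])
    pairs = [
        (xi, xj)
        for (xi, gi), (xj, gj) in itertools.permutations(pages, 2)
        if gi > gj
    ]
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xi, xj in pairs:
            margin = score(w, xi) - score(w, xj)
            # Weight decay from the L2 regularizer, applied at every step.
            w = [wk * (1 - lr * reg) for wk in w]
            if margin < 1:
                # Hinge is active: move w along (x_i - x_j).
                w = [wk + lr * (a - b) for wk, a, b in zip(w, xi, xj)]
    return w


def pairwise_accuracy(w, pages):
    """Fraction of correctly ordered pairs; ties count as half."""
    correct = total = 0.0
    for (xi, gi), (xj, gj) in itertools.permutations(pages, 2):
        if gi > gj:
            total += 1
            si, sj = score(w, xi), score(w, xj)
            correct += 1.0 if si > sj else (0.5 if si == sj else 0.0)
    return correct / total


# Toy data: made-up evidence values where larger numbers look more spammy.
pages = [
    ([0.1, 0.2], 2),  # trusted
    ([0.2, 0.1], 2),  # trusted
    ([0.5, 0.5], 1),  # suspicious
    ([0.9, 0.8], 0),  # spam
    ([0.8, 0.9], 0),  # spam
]
w = train_pairwise(pages)
ranked = sorted(pages, key=lambda p: score(w, p[0]), reverse=True)
print([grade for _, grade in ranked])  # [2, 2, 1, 0, 0]
print(pairwise_accuracy(w, pages))  # 1.0
```

Because the model outputs an ordering by trust rather than a hard spam/not-spam label, a cut-off can be chosen afterwards to trade false positives against missed spam, and graded labels such as "suspicious" fit naturally between the two extremes.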

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
Tongji Branch, National Engineering & Technology Center of High Performance Computer, Shanghai 201804, China
The Key Laboratory of Embedded System and Service Computing, Ministry of Education
Email: willtongji@gmail.com
Wei Wang, Guosun Zeng, Mingjun Sun, Huanan Gu & Quan Zhang

Editor information

Kevin Chen-Chuan Chang, Wei Wang, Lei Chen, Clarence A. Ellis, Ching-Hsien Hsu, Ah Chung Tsoi, Haixun Wang

About this paper

Cite this paper

Wang, W., Zeng, G., Sun, M., Gu, H., Zhang, Q. (2007). EviRank: An Evidence Based Content Trust Model for Web Spam Detection. In: Chang, K.CC., et al. Advances in Web and Network Technologies, and Information Management. APWeb/WAIM 2007. Lecture Notes in Computer Science, vol 4537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72909-9_34

DOI: https://doi.org/10.1007/978-3-540-72909-9_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72908-2
Online ISBN: 978-3-540-72909-9
eBook Packages: Computer Science, Computer Science (R0)