Content Trust Model for Detecting Web Spam

Wang, Wei; Zeng, Guosun

doi:10.1007/978-0-387-73655-6_10

Part of the book series: IFIP International Federation for Information Processing (IFIPAICT, volume 238)

Included in the following conference series:

IFIP International Conference on Trust Management

Abstract

As it becomes easier to add information to the web via HTML pages, wikis, blogs, and other documents, it becomes harder to distinguish accurate or trustworthy information from inaccurate or untrustworthy information. Beyond such unreliable information, we also need to anticipate web spam, where spammers publish false facts and scams to deliberately mislead users. Creating an effective spam detection method is a challenge. In this paper, we use the notion of content trust for spam detection and treat it as a ranking problem. Evidence is used to define the features of spam web pages, and machine learning techniques are employed to combine the evidence into a highly efficient and reasonably accurate spam detection algorithm. Experiments on real web data show that the proposed method performs very well in practice.
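As a rough illustration of treating spam detection as a ranking problem, the sketch below combines several pieces of content evidence into one score and orders pages by it. The feature names, weights, and bias are hypothetical placeholders, not taken from the paper; in a learned model they would be fitted from labelled pages rather than set by hand.

```python
import math

# Hypothetical evidence features and weights (assumptions for illustration only;
# a learning algorithm would normally estimate these from labelled data).
WEIGHTS = {
    "keyword_density": 2.0,
    "hidden_text_ratio": 3.0,
    "title_stuffing": 1.0,
}
BIAS = -2.5


def spam_score(evidence):
    """Combine evidence values (each assumed to lie in [0, 1]) into a score in (0, 1)."""
    z = BIAS + sum(w * evidence.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing of the weighted sum


def rank_by_trust(pages):
    """Return page ids ordered from most trusted (lowest spam score) to least."""
    return sorted(pages, key=lambda page_id: spam_score(pages[page_id]))


# Toy input: page id -> evidence values (made-up numbers).
pages = {
    "a": {},
    "b": {"keyword_density": 0.9, "hidden_text_ratio": 0.8},
    "c": {"keyword_density": 0.5},
}
ranking = rank_by_trust(pages)
```

Ranking rather than hard classification leaves the choice of cut-off to whoever consumes the ordered list, which is one plausible reason to frame the task this way.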

Please use the following format when citing this chapter: Wang, W. and Zeng, G., 2007, in IFIP International Federation for Information Processing, Volume 238, Trust Management, eds. Etalle, S., Marsh, S., (Boston: Springer), pp. 139–152

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China
Wei Wang & Guosun Zeng
Tongji Branch, National Engineering & Technology Center of High Performance Computer, Shanghai, 201804, China
Wei Wang & Guosun Zeng

Editor information

Editors and Affiliations

Distributed and Embedded System Group, University of Twente, The Netherlands
Sandro Etalle
Information Security Group, Institute for Information Technology, National Research Council, Canada
Stephen Marsh

Cite this paper

Wang, W., Zeng, G. (2007). Content Trust Model for Detecting Web Spam. In: Etalle, S., Marsh, S. (eds) Trust Management. IFIPTM 2007. IFIP International Federation for Information Processing, vol 238. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-73655-6_10

DOI: https://doi.org/10.1007/978-0-387-73655-6_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-73654-9
Online ISBN: 978-0-387-73655-6
eBook Packages: Computer Science, Computer Science (R0)
