Abstract
Current mainstream crawler systems lack the capability for web spam detection, and this is the primary limitation on further improving their performance. To fill this gap, a topic crawler algorithm based on anti-spoofing is proposed. The design goal of a topic crawler is to gather as many subject-relevant pages as possible with limited resources while minimizing the likelihood of downloading irrelevant pages. The proposed algorithm gives topic crawlers an anti-spam function, improves the relevance of the pages they download, and enhances their adaptability. Its effectiveness has been verified by experiments.
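The abstract describes the goal of a topic (focused) crawler only at a high level. As a rough illustration, the following Python sketch shows a generic best-first topic crawler with a spam filter bolted on. It is not the algorithm proposed in this paper. The toy link graph `WEB`, the keyword-overlap score in `relevance`, and the keyword-stuffing test in `looks_like_spam` (with its hypothetical `max_ratio` of 0.5) are all stand-ins chosen for illustration.

```python
import heapq
from collections import Counter

# Hypothetical in-memory "web": url -> (page text, outgoing links).
WEB = {
    "a": ("web crawler topic relevance search", ["b", "c", "d"]),
    "b": ("topic crawler focused crawling relevance", ["e"]),
    "c": ("cooking recipes pasta sauce", ["f"]),
    "d": ("crawler crawler crawler crawler crawler crawler topic", ["g"]),
    "e": ("search engine crawler topic ranking", []),
    "f": ("travel hotels beaches", []),
    "g": ("crawler topic relevance spam farm", []),
}


def tokens(text):
    return text.lower().split()


def relevance(text, topic):
    """Fraction of topic keywords present in the page, in [0, 1] (assumed scoring)."""
    return len(set(tokens(text)) & topic) / len(topic)


def looks_like_spam(text, max_ratio=0.5):
    """Assumed keyword-stuffing heuristic: one term makes up more than max_ratio of all tokens."""
    counts = Counter(tokens(text))
    total = sum(counts.values())
    if total == 0:
        return False
    return counts.most_common(1)[0][1] / total > max_ratio


def topic_crawl(seed, topic, budget, threshold=0.3):
    """Best-first crawl: fetch at most `budget` pages, expanding only relevant non-spam pages."""
    topic = set(topic)
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    fetched, kept = [], []
    while frontier and len(fetched) < budget:
        _, url = heapq.heappop(frontier)
        text, links = WEB.get(url, ("", []))
        fetched.append(url)
        if looks_like_spam(text):
            continue  # discard the page and do not follow its links
        score = relevance(text, topic)
        if score < threshold:
            continue  # off-topic: prune this branch
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # children inherit the parent's relevance as their priority
                heapq.heappush(frontier, (-score, link))
    return kept, fetched
```

With the toy graph, `topic_crawl("a", ["topic", "crawler", "relevance"], budget=10)` fetches `a`, `b`, `c`, `d` and `e` and keeps `a`, `b` and `e`. Page `f` is never fetched because its only parent is off-topic, and `g` is never fetched because its only parent is flagged as spam. Pruning at the parent is what conserves the limited download budget.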
Copyright information
© 2013 Springer-Verlag London
About this paper
Cite this paper
Jia, X. (2013). Antispam Topic Crawler Algorithm Based on Anti Spoofing. In: Du, W. (eds) Informatics and Management Science I. Lecture Notes in Electrical Engineering, vol 204. Springer, London. https://doi.org/10.1007/978-1-4471-4802-9_20
DOI: https://doi.org/10.1007/978-1-4471-4802-9_20
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4801-2
Online ISBN: 978-1-4471-4802-9
eBook Packages: Engineering; Engineering (R0)