Abstract
When collecting information from the Internet, and especially in resource-download applications, we need to determine whether a webpage has been updated so that we can decide whether its resources need to be downloaded again. This paper proposes a webpage update detection algorithm based on the webpage's body text. The algorithm extracts features from the Chinese text and analyzes them to judge whether the webpage has been updated. The results show that the method achieves a high detection rate and fast processing speed.
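The abstract does not give the details of the feature extraction, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the body text can be approximated by the visible text of the page, that character bigrams over CJK ideographs stand in for the Chinese text features, and that a Jaccard similarity below a threshold means "updated". The names `body_text`, `bigram_features`, `is_updated` and the default `threshold` of 0.9 are all hypothetical.

```python
import re
from html.parser import HTMLParser


class _BodyTextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def body_text(html):
    """Return the visible text of an HTML page (a stand-in for body-text extraction)."""
    parser = _BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def bigram_features(text):
    """Return the set of adjacent-character pairs over CJK ideographs only.

    Assumption: bigrams replace real word segmentation. All other characters
    (markup leftovers, digits, punctuation, whitespace) are dropped first.
    """
    chars = re.findall(r"[\u4e00-\u9fff]", text)
    return {a + b for a, b in zip(chars, chars[1:])}


def is_updated(old_html, new_html, threshold=0.9):
    """Report an update when the Jaccard similarity of the features is below threshold."""
    old = bigram_features(body_text(old_html))
    new = bigram_features(body_text(new_html))
    if not old and not new:
        return False  # no Chinese text in either version: nothing to compare
    jaccard = len(old & new) / len(old | new)
    return jaccard < threshold


old_page = "<html><body><p>今天天气很好</p><script>var a='你好';</script></body></html>"
restyled_page = "<html><body><div class='x'>今天天气很好</div></body></html>"
rewritten_page = "<html><body><p>明天会下大雨</p></body></html>"

print(is_updated(old_page, restyled_page))   # markup changed, text unchanged
print(is_updated(old_page, rewritten_page))  # body text replaced
```

Comparing features of the body text rather than the raw HTML means that changes limited to markup, scripts, or styling are not reported as updates. The threshold would have to be tuned on real pages.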
Acknowledgments
The authors thank the sponsors of 2009BAH40B04, CNGI-09-03-15 and NCET-09-0708.
Copyright information
© 2013 Springer-Verlag London
About this paper
Cite this paper
Chen, G., Zhang, P. (2013). Algorithm of Webpage Update Detection Based on Body Text. In: Du, W. (eds) Informatics and Management Science III. Lecture Notes in Electrical Engineering, vol 206. Springer, London. https://doi.org/10.1007/978-1-4471-4790-9_44
DOI: https://doi.org/10.1007/978-1-4471-4790-9_44
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4789-3
Online ISBN: 978-1-4471-4790-9
eBook Packages: Engineering, Engineering (R0)