Algorithm of Webpage Update Detection Based on Body Text

  • Conference paper
Informatics and Management Science III

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 206)


Abstract

In Internet information recycling, especially in resource-download applications, we need to judge whether a webpage has been updated so that we can decide whether its resources need to be downloaded again. In this paper we put forward a webpage update detection algorithm based on the webpage's body text. The algorithm extracts Chinese text features from the body text and analyzes them to judge whether the webpage needs to be updated. The results show that the method has a high detection rate and a fast processing speed.
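The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea (compare text features of two versions of a page), not the authors' method. It assumes character bigrams as a stand-in for the paper's Chinese text features, Jaccard similarity as the comparison, and a hypothetical threshold of 0.9; the names `body_text`, `bigram_features`, and `is_updated` are illustrative.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    _SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def body_text(html):
    """Crude stand-in for body-text extraction: all non-script text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Drop all whitespace so layout-only changes do not alter the features.
    # (Reasonable for Chinese text; it would merge words in spaced scripts.)
    return "".join("".join(parser.chunks).split())


def bigram_features(text):
    """Assumed feature set: overlapping character bigrams (no segmentation)."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def is_updated(old_html, new_html, threshold=0.9):
    """Report an update when the Jaccard similarity falls below threshold."""
    old = bigram_features(body_text(old_html))
    new = bigram_features(body_text(new_html))
    if not old and not new:
        return False  # nothing to compare; treat as unchanged
    similarity = len(old & new) / len(old | new)
    return similarity < threshold
```

Under these assumptions, a change that touches only markup or scripts leaves the features untouched, so the page is reported as unchanged, while a rewritten body text lowers the similarity and is reported as an update. The threshold would need tuning against real pages.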

Acknowledgments

The authors thank the sponsors of projects 2009BAH40B04, CNGI-09-03-15 and NCET-09-0708.

Author information

Corresponding author

Correspondence to Guowei Chen.

Copyright information

© 2013 Springer-Verlag London

About this paper

Cite this paper

Chen, G., Zhang, P. (2013). Algorithm of Webpage Update Detection Based on Body Text. In: Du, W. (eds) Informatics and Management Science III. Lecture Notes in Electrical Engineering, vol 206. Springer, London. https://doi.org/10.1007/978-1-4471-4790-9_44

  • DOI: https://doi.org/10.1007/978-1-4471-4790-9_44

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4789-3

  • Online ISBN: 978-1-4471-4790-9

  • eBook Packages: Engineering, Engineering (R0)
