Abstract
The content of web documents changes constantly, and the rate of change differs from page to page. These changes must be reflected in the search engine database; otherwise, users see an outdated image of the web documents. Many change detection methods have been developed that use tree-based comparisons to decide whether two versions of a web document are the same. However, these methods are prone to high complexity and ambiguity. In addition, frequent crawler revisits increase the pressure on Internet traffic and bandwidth usage. This paper proposes a network-efficient web page change detection technique for migrating crawlers that detects structural and content changes by comparing the proposed tag code and text code for each of the HTML tags contained in the web page. The proposed method performs well and is able to detect changes even at a minute level while keeping the network load low.
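The abstract does not define how the tag code and text code are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, hypothetically, that a tag code is the tag's path from the document root and that a text code is a short hash of the text directly inside the tag. The names `TagTextCoder`, `page_codes`, and `detect_change` are invented for this example.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not kept open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagTextCoder(HTMLParser):
    """Collects one (tag_code, text_code) pair per opening HTML tag.

    Assumption: tag_code is the path of tag names from the root, and
    text_code is a truncated MD5 of the text directly inside the tag.
    """

    def __init__(self):
        super().__init__()
        self.open_names = []  # names of currently open tags
        self.open_index = []  # matching indices into self.records
        self.records = []     # [tag_path, [text fragments]] in document order

    def handle_starttag(self, tag, attrs):
        path = "/" + "/".join(self.open_names + [tag])
        self.records.append([path, []])
        if tag not in VOID_TAGS:
            self.open_names.append(tag)
            self.open_index.append(len(self.records) - 1)

    def handle_endtag(self, tag):
        if tag in self.open_names:
            # Close the nearest matching tag and anything left open inside it.
            while self.open_names.pop() != tag:
                self.open_index.pop()
            self.open_index.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_index:
            self.records[self.open_index[-1]][1].append(text)

    def codes(self):
        return [
            (path, hashlib.md5(" ".join(parts).encode("utf-8")).hexdigest()[:8])
            for path, parts in self.records
        ]


def page_codes(html):
    """Return the list of (tag_code, text_code) pairs for one page version."""
    coder = TagTextCoder()
    coder.feed(html)
    coder.close()
    return coder.codes()


def detect_change(old_html, new_html):
    """Classify the difference between two versions of a page."""
    old, new = page_codes(old_html), page_codes(new_html)
    if [tag for tag, _ in old] != [tag for tag, _ in new]:
        return "structural"  # the sequence of tag codes differs
    if old != new:
        return "content"     # same tags, but some text code differs
    return "none"
```

Under these assumptions, a migrating crawler would only need to compare the compact code lists at the remote site rather than transfer the full page for comparison; whether the paper works this way is not stated in the abstract.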
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gupta, A., Dixit, A., Sharma, A.K. (2018). A Novel Web Page Change Detection Technique for Migrating Crawlers. In: Urooj, S., Virmani, J. (eds) Sensors and Image Processing. Advances in Intelligent Systems and Computing, vol 651. Springer, Singapore. https://doi.org/10.1007/978-981-10-6614-6_5
DOI: https://doi.org/10.1007/978-981-10-6614-6_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6613-9
Online ISBN: 978-981-10-6614-6
eBook Packages: Engineering, Engineering (R0)