Abstract
The Hidden Web offers broad, relevant coverage of dynamic, high-quality content, but the high change frequency of its pages makes it challenging to fetch and maintain up-to-date information. Keeping a repository fresh requires verifying whether a web page has changed, which is itself a challenge. A mechanism is therefore needed to adjust the time period between two successive revisits based on the probability that the web page has been updated. This paper proposes an architecture that introduces a technique to continuously update/refresh the Hidden Web repository.
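The idea of adjusting the revisit period from the probability of update can be illustrated with a minimal sketch. It is not the architecture proposed in the paper, whose details are not given in the abstract. The function names, the SHA-256 fingerprint for change detection, the smoothed change-probability estimate, the target probability of 0.5, and the interval bounds are all assumptions made for illustration.

```python
import hashlib


def content_fingerprint(page_body: str) -> str:
    """Return a hash of the page body (assumed change-detection method).

    Two visits with different fingerprints are treated as a change.
    """
    return hashlib.sha256(page_body.encode("utf-8")).hexdigest()


def change_probability(changes_seen: int, visits: int) -> float:
    """Estimate the probability that a revisit finds the page changed.

    Uses the fraction of past revisits that detected a change, with
    add-one smoothing so the estimate is never exactly 0 or 1
    (an assumption, not taken from the paper).
    """
    return (changes_seen + 1) / (visits + 2)


def next_revisit_interval(
    current_hours: float,
    p_change: float,
    target: float = 0.5,        # hypothetical desired change probability per visit
    min_hours: float = 1.0,     # hypothetical lower bound on the interval
    max_hours: float = 720.0,   # hypothetical upper bound on the interval
) -> float:
    """Shorten the interval for pages that change often, lengthen it otherwise."""
    scaled = current_hours * (target / p_change)
    return max(min_hours, min(max_hours, scaled))


if __name__ == "__main__":
    # A page that changed on 3 of its last 4 revisits is revisited sooner.
    p = change_probability(changes_seen=3, visits=4)
    print(next_revisit_interval(current_hours=24.0, p_change=p))
```

Under these assumptions, a page that changes on most revisits has its interval shortened, while a page that rarely changes is revisited less often, up to the chosen bounds.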
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Madaan, R., Dixit, A., Sharma, A.K., Bhatia, K.K. (2010). A Framework for Incremental Domain-Specific Hidden Web Crawler. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_39
DOI: https://doi.org/10.1007/978-3-642-14834-7_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14833-0
Online ISBN: 978-3-642-14834-7
eBook Packages: Computer Science (R0)