Abstract
As Web applications grow in quantity and quality, vertical solutions in many domains could use them as an important source of information. Obtaining information from web sources is nevertheless challenging, because access is complicated by the hypertext browsing paradigm and by HTML's semistructured format. Web Automation middleware navigates through web links and fills in web forms automatically in order to extract information from the Hidden Web. The main optimization parameter is the time required to navigate through the intermediate pages that lead to the desired results. This work proposes a technique that reduces browsing time by storing information from previous queries and using it to preload an adequate subset of the navigational sequence in a specific browser before the next sequence is launched. It also takes into account which sequences are most commonly used, so that these are preloaded more often.
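The idea of ranking past navigation sequences and preloading the popular ones more often can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SequenceRepository`, the method names, the string representation of a navigation step, and the allocation rule (each free browser goes to the sequence with the highest ratio of past uses to browsers already assigned) are all assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation: a navigation step is just a label such as "login".
Step = str
NavSequence = Tuple[Step, ...]


class SequenceRepository:
    """Illustrative repository that counts how often each sequence was requested."""

    def __init__(self) -> None:
        self._hits: Counter = Counter()

    def record(self, sequence: Sequence[Step]) -> None:
        # Store information from a previous query: one more use of this sequence.
        self._hits[tuple(sequence)] += 1

    def ranking(self) -> List[Tuple[NavSequence, int]]:
        # Most frequently used sequences first.
        return self._hits.most_common()

    def preload_plan(self, pool_size: int, prefix_len: int) -> List[NavSequence]:
        """Pick, for each free browser, the sequence prefix it should preload.

        Assumed rule: each browser goes to the sequence with the largest
        hits / (browsers_already_assigned + 1), so frequent sequences are
        preloaded more often without starving the less frequent ones.
        """
        assigned: Counter = Counter()
        plan: List[NavSequence] = []
        for _ in range(pool_size):
            if not self._hits:
                break  # nothing recorded yet, so nothing to preload
            best = max(
                self._hits,
                key=lambda seq: self._hits[seq] / (assigned[seq] + 1),
            )
            assigned[best] += 1
            # Preload only the first prefix_len steps of the sequence.
            plan.append(best[:prefix_len])
        return plan
```

In this sketch, `prefix_len` stands in for the "adequate subset" of the navigational sequence; how that subset is actually chosen is not stated in the abstract, so a fixed-length prefix is used here purely as a placeholder.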
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hidalgo, J., Pan, A., Losada, J., Álvarez, M. (2006). Preloading Browsers for Optimizing Automatic Access to Hidden Web: A Ranking-Based Repository Solution. In: Manolopoulos, Y., Pokorný, J., Sellis, T.K. (eds) Advances in Databases and Information Systems. ADBIS 2006. Lecture Notes in Computer Science, vol 4152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827252_15
DOI: https://doi.org/10.1007/11827252_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37899-0
Online ISBN: 978-3-540-37900-3
eBook Packages: Computer Science, Computer Science (R0)