Abstract
In Deep Web data integration, the metaquerier provides a unified interface for each domain and dispatches the user query to the most relevant Web databases. Traditional database selection algorithms are often based on content summaries. However, many web-accessible databases are uncooperative: the only way to access their contents is by querying them. In this paper, we propose an approximate content summary approach for database selection. Furthermore, real-life databases are not always static, so the statistical content summary needs to be updated periodically to reflect changes in database content. We therefore also propose a survival function approach that determines an appropriate schedule for regenerating the approximate content summary. We conduct extensive experiments to illustrate the accuracy and efficiency of our techniques.
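As a rough illustration of how a survival function can drive a refresh schedule, the sketch below assumes an exponential survival function S(t) = exp(-change_rate * t), read as the probability that a summary is still valid t time units after it was built. The abstract does not state the form of the survival function used in the paper, so the exponential model, the function name `refresh_interval`, and the parameters `change_rate` and `min_survival` are all assumptions made for this example.

```python
import math


def refresh_interval(change_rate: float, min_survival: float) -> float:
    """Time until an assumed exponential survival function drops to min_survival.

    Assumption (not from the paper): S(t) = exp(-change_rate * t).
    Solving S(t) = min_survival for t gives t = -ln(min_survival) / change_rate.
    """
    if change_rate <= 0:
        raise ValueError("change_rate must be positive")
    if not 0 < min_survival < 1:
        raise ValueError("min_survival must lie strictly between 0 and 1")
    return -math.log(min_survival) / change_rate


# Hypothetical usage: a database whose summary goes stale at a rate of 0.1 per day,
# regenerated once the probability of it still being valid falls to 50%.
days_until_refresh = refresh_interval(change_rate=0.1, min_survival=0.5)
```

Under this assumed model, a database that changes faster (larger `change_rate`) gets a shorter interval, so its summary is regenerated more often than that of a nearly static database.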
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, F., Li, Y., Zhao, J., Yang, N. (2010). Approximate Content Summary for Database Selection in Deep Web Data Integration. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_22
DOI: https://doi.org/10.1007/978-3-642-16720-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16719-5
Online ISBN: 978-3-642-16720-1
eBook Packages: Computer Science, Computer Science (R0)