Approximate Content Summary for Database Selection in Deep Web Data Integration

Jiang, Fangjiao; Li, Yukun; Zhao, Jiping; Yang, Nan

doi:10.1007/978-3-642-16720-1_22

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6185)

Abstract

In Deep Web data integration, the metaquerier provides a unified interface for each domain and dispatches the user query to the most relevant Web databases. Traditional database selection algorithms are often based on content summaries. However, many Web-accessible databases are uncooperative, and the only way to access their contents is by querying them. In this paper, we propose an approximate content summary approach for database selection. Furthermore, real-life databases are not always static, so the statistical content summary needs to be updated periodically to reflect changes in database content. We therefore also propose a survival function approach that gives an appropriate schedule for regenerating the approximate content summary. We conduct extensive experiments to illustrate the accuracy and efficiency of our techniques.
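
The abstract gives no algorithmic detail, so the following Python sketch only illustrates the general ideas it names and should not be read as the authors' method. It assumes that a content summary is the fraction of sampled documents containing each term, that databases are ranked by the product of those fractions over the query terms, and that the survival function is exponential, S(t) = exp(-rate * t). All function names, parameters, and sample data are hypothetical.

```python
import math
from collections import Counter

def build_summary(sample_docs):
    """Approximate content summary: for each term, the fraction of
    sampled documents that contain it (assumed definition)."""
    doc_freq = Counter()
    for doc in sample_docs:
        doc_freq.update(set(doc.lower().split()))
    n = len(sample_docs)
    return {term: count / n for term, count in doc_freq.items()} if n else {}

def score(summary, query):
    """Product of per-term fractions, assuming query terms are independent."""
    result = 1.0
    for term in query.lower().split():
        result *= summary.get(term, 0.0)
    return result

def rank_databases(summaries, query):
    """Database names ordered from most to least relevant to the query."""
    return sorted(summaries, key=lambda name: score(summaries[name], query),
                  reverse=True)

def refresh_interval(change_rate, min_survival):
    """Time at which an assumed exponential survival function
    S(t) = exp(-change_rate * t) drops to min_survival."""
    return -math.log(min_survival) / change_rate

# Hypothetical samples obtained by querying two uncooperative databases.
summaries = {
    "books": build_summary(["deep web data", "web databases"]),
    "cars": build_summary(["used cars", "car prices web"]),
}
ranking = rank_databases(summaries, "web data")
interval = refresh_interval(change_rate=0.1, min_survival=0.5)
```

Under these assumptions, a summary would be regenerated once the estimated probability that it is still valid falls to `min_survival`; a different survival model would change only `refresh_interval`.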

Author information

Authors and Affiliations

Institute of Intelligent Information Processing, Xuzhou Normal University, Jiangsu, China
Fangjiao Jiang & Jiping Zhao
School of Information, Renmin University of China, China
Yukun Li & Nan Yang

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia
Heng Tao Shen
School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Jian Pei
David R. Cheriton School of Computer Science, University of Waterloo, Canada
M. Tamer Özsu
Peking University, China
Lei Zou
Renmin University of China, China
Jiaheng Lu
National University of Singapore, Singapore
Tok-Wang Ling
Northeastern University, 110004, Shenyang, China
Ge Yu
College of Computer Science, Zhejiang University, 310027, Hangzhou, P.R. China
Yi Zhuang
University of Melbourne, Australia
Jie Shao

About this paper

Cite this paper

Jiang, F., Li, Y., Zhao, J., Yang, N. (2010). Approximate Content Summary for Database Selection in Deep Web Data Integration. In: Shen, H.T., et al. Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16720-1_22

DOI: https://doi.org/10.1007/978-3-642-16720-1_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16719-5
Online ISBN: 978-3-642-16720-1
eBook Packages: Computer Science, Computer Science (R0)
