Abstract
A popular trend in data dissemination involves online data sources that are hidden behind query forms and thus form part of the deep web. Extracting information across multiple deep web sources in a domain is challenging, but increasingly crucial in many areas. Keyword search, a popular information discovery method, has been studied extensively on the surface web and in relational databases. Keyword-based queries can provide a powerful yet intuitive means of accessing data from the deep web as well. However, this involves several challenges: deep web data is hidden behind query interfaces, deep web data sources often contain redundant and/or incomplete data, and the data sources are often inter-dependent. As a result, it is very hard to execute cross-source queries automatically.
This paper focuses on answering cross-source queries over deep web data sources. In our approach, we model a list of deep web data sources using a graph to capture the dependencies among them, and we consider the problem of answering cross-source queries over these deep web data sources as a graph search problem. We have developed a bidirectional query planning algorithm to generate query plans for two types of cross-source queries, which are entity-attributes queries and entity-entity relationship queries.
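The graph formulation above can be illustrated with a small sketch. It is not the paper's planning algorithm, only a minimal bidirectional breadth-first search under simplifying assumptions: every source takes exactly one input attribute and returns a set of output attributes, and an edge A -> B exists when an output of A can serve as the input of B. The source names (`GeneDB`, `SNPDB`, `ProteinDB`, `FreqDB`), the attribute names, and the `plan` function are all hypothetical.

```python
from collections import deque

# Hypothetical sources: name -> (required input attribute, output attributes).
SOURCES = {
    "GeneDB": ("gene_name", {"gene_id", "chromosome"}),
    "SNPDB": ("gene_id", {"snp_id"}),
    "ProteinDB": ("gene_id", {"protein_id"}),
    "FreqDB": ("snp_id", {"allele_frequency"}),
}


def build_graph(sources):
    """Add an edge A -> B when some output of A is the input of B."""
    graph = {name: set() for name in sources}
    for a, (_, outputs) in sources.items():
        for b, (required, _) in sources.items():
            if a != b and required in outputs:
                graph[a].add(b)
    return graph


def expand(queue, parents, neighbors, other_side):
    """Expand one BFS level; return a node where the two searches meet."""
    for _ in range(len(queue)):
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):
            if nxt not in parents:
                parents[nxt] = node
                if nxt in other_side:
                    return nxt
                queue.append(nxt)
    return None


def plan(sources, given_attr, wanted_attr):
    """Return a chain of sources from given_attr to wanted_attr, or None."""
    graph = build_graph(sources)
    reverse = {name: set() for name in graph}
    for a, successors in graph.items():
        for b in successors:
            reverse[b].add(a)

    # Forward search starts at sources queryable with the given attribute;
    # backward search starts at sources that return the wanted attribute.
    starts = [n for n, (req, _) in sources.items() if req == given_attr]
    goals = [n for n, (_, outs) in sources.items() if wanted_attr in outs]
    fwd = {n: None for n in starts}
    bwd = {n: None for n in goals}
    fwd_q, bwd_q = deque(starts), deque(goals)

    meet = next((n for n in starts if n in bwd), None)
    while meet is None and fwd_q and bwd_q:
        meet = expand(fwd_q, fwd, graph, bwd)
        if meet is None:
            meet = expand(bwd_q, bwd, reverse, fwd)
    if meet is None:
        return None

    # Stitch the forward half and the backward half together at the meeting node.
    chain, node = [], meet
    while node is not None:
        chain.append(node)
        node = fwd[node]
    chain.reverse()
    node = bwd[meet]
    while node is not None:
        chain.append(node)
        node = bwd[node]
    return chain


if __name__ == "__main__":
    print(plan(SOURCES, "gene_name", "allele_frequency"))
```

The sketch returns one valid chain of sources and ignores everything else a real planner would need to weigh, such as redundant or incomplete sources, multi-attribute inputs, and plan cost.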
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, F., Agrawal, G. (2011). Answering Cross-Source Keyword Queries over Deep Web Data Sources. In: Aluru, S., et al. Contemporary Computing. IC3 2011. Communications in Computer and Information Science, vol 168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22606-9_47
DOI: https://doi.org/10.1007/978-3-642-22606-9_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22605-2
Online ISBN: 978-3-642-22606-9
eBook Packages: Computer Science, Computer Science (R0)