Answering Cross-Source Keyword Queries over Deep Web Data Sources

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 168)

Abstract

A popular trend in data dissemination involves online data sources that are hidden behind query forms and therefore form part of the deep web. Extracting information across multiple deep web sources in a domain is challenging, but increasingly crucial in many areas. Keyword search, a popular information discovery method, has been studied extensively on the surface web and over relational databases. Keyword-based queries can also provide a powerful yet intuitive means of accessing deep web data. However, doing so involves several challenges: deep web data is hidden behind query interfaces, deep web data sources often contain redundant and/or incomplete data, and data sources are often inter-dependent (for example, the output of one source may be required as input to another). As a result, cross-source queries are very hard to execute automatically.

This paper focuses on answering cross-source queries over deep web data sources. In our approach, we model a set of deep web data sources as a graph that captures the dependencies among them, and we treat answering cross-source queries over these sources as a graph search problem. We have developed a bidirectional query planning algorithm that generates query plans for two types of cross-source queries: entity-attributes queries and entity-entity relationship queries.
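The abstract does not describe the planning algorithm in detail, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that each source is a hypothetical query form with a single input attribute and a tuple of output attributes. Attributes become graph nodes, and each source contributes edges from its input to its outputs. A query plan is then a chain of source invocations, found by a bidirectional breadth-first search that grows forward from the attribute the user already knows and backward from the attribute being requested until the two frontiers meet. The source names (`GeneDB`, `SNPDB`, `ProteinDB`, `DiseaseDB`), the attribute names, and all functions are illustrative inventions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Hypothetical deep web source: one query-form input, several output attributes."""
    name: str
    input_attr: str
    output_attrs: tuple[str, ...]


def build_graph(sources):
    """Attribute-level dependency graph. forward[a] lists (b, source) pairs where the
    source accepts a and returns b; backward[b] lists the same edges reversed."""
    forward, backward = {}, {}
    for s in sources:
        for out in s.output_attrs:
            forward.setdefault(s.input_attr, []).append((out, s.name))
            backward.setdefault(out, []).append((s.input_attr, s.name))
    return forward, backward


def _expand(queue, parents, other_parents, adjacency):
    """Expand one BFS level; return the first attribute also reached by the other side."""
    for _ in range(len(queue)):
        attr = queue.popleft()
        for nxt, src in adjacency.get(attr, []):
            if nxt not in parents:
                parents[nxt] = (attr, src)
                if nxt in other_parents:
                    return nxt
                queue.append(nxt)
    return None


def _stitch(fwd_parent, bwd_parent, meet):
    """Join the forward half (start -> meet) with the backward half (meet -> goal)."""
    steps, node = [], meet
    while fwd_parent[node] is not None:
        prev, src = fwd_parent[node]
        steps.append((src, prev, node))
        node = prev
    steps.reverse()
    node = meet
    while bwd_parent[node] is not None:
        nxt, src = bwd_parent[node]
        steps.append((src, node, nxt))
        node = nxt
    return steps


def bidirectional_plan(sources, start_attr, goal_attr):
    """Return (source, input_attr, output_attr) steps linking start_attr to goal_attr,
    or None when no chain of sources connects them."""
    if start_attr == goal_attr:
        return []
    forward, backward = build_graph(sources)
    fwd_parent, bwd_parent = {start_attr: None}, {goal_attr: None}
    fwd_queue, bwd_queue = deque([start_attr]), deque([goal_attr])
    while fwd_queue and bwd_queue:
        # Grow the smaller frontier first, as is usual in bidirectional BFS.
        if len(fwd_queue) <= len(bwd_queue):
            meet = _expand(fwd_queue, fwd_parent, bwd_parent, forward)
        else:
            meet = _expand(bwd_queue, bwd_parent, fwd_parent, backward)
        if meet is not None:
            return _stitch(fwd_parent, bwd_parent, meet)
    return None


def plan_entity_attributes(sources, known_attr, wanted_attrs):
    """Entity-attributes query: merge one plan per requested attribute, skipping repeated steps."""
    merged = []
    for attr in wanted_attrs:
        steps = bidirectional_plan(sources, known_attr, attr)
        if steps is None:
            return None
        for step in steps:
            if step not in merged:
                merged.append(step)
    return merged


# Hypothetical sources, loosely styled after a biological domain.
SOURCES = [
    Source("GeneDB", "gene_name", ("gene_id", "chromosome")),
    Source("SNPDB", "gene_id", ("snp_id", "position")),
    Source("ProteinDB", "gene_id", ("protein_id",)),
    Source("DiseaseDB", "snp_id", ("disease",)),
]

# Entity-entity relationship query: how is a gene (given by name) linked to a disease?
print(bidirectional_plan(SOURCES, "gene_name", "disease"))
# Entity-attributes query: SNPs and proteins for a gene given by name.
print(plan_entity_attributes(SOURCES, "gene_name", ["snp_id", "protein_id"]))
```

In this toy setting, the relationship query resolves to GeneDB, then SNPDB, then DiseaseDB. The entity-attributes query shares the GeneDB step between its two sub-plans. Real query forms often require several inputs at once, which would turn each source into a hyperedge. The sketch sidesteps that by assuming a single input per source, and it also ignores redundancy, incomplete data, and access cost, which the abstract lists as challenges.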


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, F., Agrawal, G. (2011). Answering Cross-Source Keyword Queries over Deep Web Data Sources. In: Aluru, S., et al. Contemporary Computing. IC3 2011. Communications in Computer and Information Science, vol 168. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22606-9_47

  • DOI: https://doi.org/10.1007/978-3-642-22606-9_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22605-2

  • Online ISBN: 978-3-642-22606-9

  • eBook Packages: Computer Science, Computer Science (R0)