Querying Capability Modeling and Construction of Deep Web Sources

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4831)

Abstract

Information in a deep Web source can be accessed through queries submitted on its query interface. Many Web applications, such as deep Web crawling and comparison shopping, need to interact with the query interfaces of deep Web sources. Analyzing the querying capability of a query interface is critical to supporting such interactions automatically and effectively. In this paper, we propose a querying capability model based on the concept of an atomic query, which is a valid query with a minimal attribute set. We also provide an approach that constructs the querying capability model automatically by identifying the atomic queries of any given query interface. Our experimental results show that the accuracy of our algorithm is good.
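
To make the notion of an atomic query concrete, the following is a minimal sketch, not the construction algorithm of the paper. It assumes a hypothetical oracle `is_valid` that reports whether a query filling in exactly a given set of attributes is accepted by the interface, and it assumes that the empty query is never valid. The function name `atomic_queries`, the attribute names, and both example interfaces are illustrative assumptions. The sketch enumerates attribute sets exhaustively from smallest to largest and keeps each valid set that contains no previously found atomic query.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def atomic_queries(
    attributes: Iterable[str],
    is_valid: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Return every valid attribute set that has no valid proper subset.

    `is_valid` is a hypothetical oracle standing in for a check against
    the real query interface; it is an assumption of this sketch.
    """
    attrs = sorted(attributes)
    found: List[FrozenSet[str]] = []
    # Visit candidates by increasing size so that smaller valid sets
    # are always discovered before any of their supersets.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = frozenset(combo)
            # A candidate containing an atomic query is not minimal.
            if any(atom <= candidate for atom in found):
                continue
            if is_valid(candidate):
                found.append(candidate)
    return found


# Hypothetical book-search interface: a query is accepted as soon as it
# fills in title, author, or ISBN; price alone is rejected.
book_atoms = atomic_queries(
    ["title", "author", "isbn", "price"],
    lambda s: bool(s & {"title", "author", "isbn"}),
)

# Hypothetical flight-search interface: origin and destination must
# both be filled in; the date is optional.
flight_atoms = atomic_queries(
    ["from", "to", "date"],
    lambda s: {"from", "to"} <= s,
)
```

Under these assumptions the book interface has three single-attribute atomic queries, while the flight interface has one atomic query over two attributes. Exhaustive enumeration is exponential in the number of attributes, so it serves only to illustrate the definition.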

Editor information

Boualem Benatallah, Fabio Casati, Dimitrios Georgakopoulos, Claudio Bartolini, Wasim Sadiq, Claude Godart

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shu, L., Meng, W., He, H., Yu, C. (2007). Querying Capability Modeling and Construction of Deep Web Sources. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_2

  • DOI: https://doi.org/10.1007/978-3-540-76993-4_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76992-7

  • Online ISBN: 978-3-540-76993-4

  • eBook Packages: Computer Science, Computer Science (R0)
