
K-Graphs: Selecting Top-k Data Sources for XML Keyword Queries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6860)

Abstract

Most existing approaches to XML keyword search focus on querying a single data source. However, searching hundreds or even thousands of (distributed) data sources by querying each one in turn is extremely costly and can therefore be impractical. In this paper, we propose an approach for selecting the top-k data sources for a given query, avoiding the high cost of searching numerous, potentially irrelevant data sources. The proposed approach efficiently selects the top-k most relevant data sources without querying the data sources themselves. We propose a ranking function that measures the strength of correlation between keywords in a data source, and we summarize each data source as a keyword correlation graph (K-Graph). The top-k relevant data sources are then selected by estimating the relevance of their corresponding K-Graphs to the query. Experimental results show that the approach performs well across a variety of experimental parameters.
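The selection pipeline described in the abstract (summarize each source offline as a keyword correlation graph, then rank sources by how well their graphs match the query) can be illustrated with a small sketch. The abstract does not give the paper's ranking function or define what counts as a co-occurrence, so everything below is an assumption: each data source is pre-split into hypothetical "units" (for example, XML elements or subtrees), each represented as a set of keywords; the edge weight is a Jaccard-style co-occurrence ratio used as a stand-in for the authors' correlation measure; and a source is scored by summing edge weights over all pairs of query keywords. The names `build_kgraph`, `score`, `top_k_sources` and the toy data are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import combinations


def build_kgraph(units):
    """Summarize one data source as a keyword correlation graph (sketch).

    `units` is an iterable of keyword sets (hypothetical: one set per XML
    element or subtree). Edge weight for a keyword pair is the number of units
    containing both, divided by the number of units containing either
    (Jaccard-style). This is an illustrative stand-in, not the paper's function.
    """
    freq = defaultdict(int)  # keyword -> number of units containing it
    co = defaultdict(int)    # (kw_a, kw_b) with kw_a < kw_b -> co-occurrence count
    for unit in units:
        kws = sorted(set(unit))
        for kw in kws:
            freq[kw] += 1
        for a, b in combinations(kws, 2):
            co[(a, b)] += 1
    edges = {(a, b): n_ab / (freq[a] + freq[b] - n_ab) for (a, b), n_ab in co.items()}
    return edges, set(freq)


def score(kgraph, query):
    """Estimate a source's relevance from its K-Graph alone (no data access)."""
    edges, vocab = kgraph
    q = sorted(set(query))
    if not all(kw in vocab for kw in q):
        return 0.0  # a missing query keyword makes the source irrelevant here
    if len(q) == 1:
        return 1.0
    # Sum the correlation weights of every query keyword pair (absent edge -> 0).
    return sum(edges.get(pair, 0.0) for pair in combinations(q, 2))


def top_k_sources(kgraphs, query, k):
    """Return the names of up to k sources with the highest nonzero scores."""
    scored = [(score(g, query), name) for name, g in kgraphs.items()]
    scored = [item for item in scored if item[0] > 0]
    return [name for _, name in heapq.nlargest(k, scored)]


# Toy data sources (hypothetical), each a list of keyword sets.
sources = {
    "dblp_part": [{"xml", "keyword", "search"}, {"xml", "query"}, {"keyword", "search"}],
    "movies": [{"film", "actor"}, {"film", "search"}],
    "web_ir": [{"keyword", "ranking"}, {"search", "ranking"}, {"keyword", "search"}],
}
kgraphs = {name: build_kgraph(units) for name, units in sources.items()}
print(top_k_sources(kgraphs, ["keyword", "search"], 2))  # ['dblp_part', 'web_ir']
```

In this toy run, "keyword" and "search" co-occur in every unit that mentions either in `dblp_part` (weight 1.0) but only in one of three such units in `web_ir` (weight 1/3), while `movies` lacks "keyword" entirely, so it is never selected. Summing pairwise weights is only one plausible aggregation; a real implementation might instead use a minimum, a product, or a learned combination.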




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, K., Cao, J. (2011). K-Graphs: Selecting Top-k Data Sources for XML Keyword Queries. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_31


  • DOI: https://doi.org/10.1007/978-3-642-23088-2_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23087-5

  • Online ISBN: 978-3-642-23088-2

  • eBook Packages: Computer Science, Computer Science (R0)
