K-Graphs: Selecting Top-k Data Sources for XML Keyword Queries

Nguyen, Khanh; Cao, Jinli

doi:10.1007/978-3-642-23088-2_31


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6860)

Abstract

Most existing approaches to XML keyword search focus on querying a single data source. However, searching hundreds or even thousands of (distributed) data sources by querying each one sequentially is extremely costly and can therefore be impractical. In this paper, we propose an approach for selecting the top-k data sources for a given query in order to avoid the high cost of searching numerous, potentially irrelevant data sources. The proposed approach can efficiently select the top-k most relevant data sources without querying the data sources themselves. We propose a ranking function for measuring the strength of correlation between keywords in a data source and summarize the data sources as keyword correlation graphs (K-Graphs). The top-k relevant data sources are selected by estimating the relevance of the corresponding K-Graphs to the query. Experimental results show that the approach achieves good performance under a variety of experimental parameters.
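The selection idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the ranking function, so a plain co-occurrence count stands in for the correlation strength, the relevance estimate takes the weakest correlation among the query's keyword pairs, and the names `build_kgraph`, `relevance`, and `select_top_k` as well as the input format are hypothetical. It is not the authors' algorithm.

```python
import heapq
from itertools import combinations


def build_kgraph(elements):
    """Summarize one data source as a keyword correlation graph.

    `elements` is a list of keyword sets, one per XML element
    (hypothetical input format). The edge weight is a plain
    co-occurrence count, an assumed stand-in for the paper's
    correlation ranking function.
    """
    graph = {}
    for keywords in elements:
        # Sort so that each unordered keyword pair maps to one edge key.
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[(a, b)] = graph.get((a, b), 0) + 1
    return graph


def relevance(graph, query):
    """Estimate relevance as the weakest correlation among query keyword pairs.

    Assumed scoring rule; queries with fewer than two distinct
    keywords have no pairs and score 0 in this sketch.
    """
    pairs = list(combinations(sorted(set(query)), 2))
    if not pairs:
        return 0
    return min(graph.get(pair, 0) for pair in pairs)


def select_top_k(kgraphs, query, k):
    """Return the names of the k sources whose K-Graphs score highest.

    Only the precomputed summaries are consulted; the underlying
    data sources are never queried.
    """
    scored = [(relevance(graph, query), name) for name, graph in kgraphs.items()]
    return [name for _, name in heapq.nlargest(k, scored)]
```

In this sketch the summaries would be built once per source, for example `kgraphs = {name: build_kgraph(elems) for name, elems in sources.items()}`, and then reused for every incoming query via `select_top_k(kgraphs, ["xml", "search"], 2)`. Taking the minimum over keyword pairs is one possible design choice: it penalizes sources in which any two query keywords are unrelated.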



Author information

Authors and Affiliations

Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Australia
Khanh Nguyen & Jinli Cao


Editor information

Editors and Affiliations

Institut de Recherche en Informatique de Toulouse (IRIT), Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg and Johannes Kepler University Linz, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

About this paper

Cite this paper

Nguyen, K., Cao, J. (2011). K-Graphs: Selecting Top-k Data Sources for XML Keyword Queries. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23088-2_31

DOI: https://doi.org/10.1007/978-3-642-23088-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23087-5
Online ISBN: 978-3-642-23088-2
eBook Packages: Computer Science, Computer Science (R0)
