Efficient Querying of Correlated Uncertain Data with Cached Results

Chen, Jinchuan; Zhang, Min; Xie, Xike; Du, Xiaoyong

doi:10.1007/978-3-642-37487-6_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7825)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Although there have been many efforts on managing uncertain data, evaluating probabilistic inference queries, a known NP-hard problem, is still a big challenge, especially when the data are highly correlated. The state-of-the-art exact algorithms for accelerating the evaluation of inference queries are based on special indices. In addition, observing that many queries are issued frequently, some researchers try to improve efficiency by reusing previously computed query results. Indexing depends on static properties such as the data distribution, whereas caching exploits dynamic features such as the query workload. In this paper we propose a new approach for speeding up the evaluation of inference queries by caching frequent results in a junction tree-based hierarchical index. To the best of our knowledge, this is the first effort to utilize both the static (data) and dynamic (query workload) properties to efficiently evaluate probabilistic inference queries. Moreover, in our experience, different caching strategies may significantly affect query performance: a good caching strategy needs to achieve a high cache hit ratio within a limited space budget. Based on these considerations, we propose a novel caching approach, called FVEC, and present corresponding algorithms for efficiently querying correlated uncertain data. We further conduct a series of extensive experiments on large uncertain datasets to illustrate the effectiveness and efficiency of our proposed approaches. The results show that, compared with previous solutions, our method can greatly improve query performance.
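The caching trade-off stated in the abstract (a high cache hit ratio under a limited space budget) can be illustrated with a small sketch. It assumes that each cacheable inference result has a known query frequency and a storage size, and it picks results greedily by frequency per unit of space, a standard knapsack heuristic. The `Candidate` type, `select_cache`, and `hit_ratio` are hypothetical names introduced for illustration; this is not the FVEC strategy, whose details are not given here, and it does not model the junction tree-based index at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical cacheable inference result, e.g. a marginal P(A, B)."""
    key: str        # identifier of the cached result
    frequency: int  # how often the workload requested this result
    size: int       # storage cost in arbitrary units (must be positive)


def select_cache(candidates: list[Candidate], budget: int) -> list[Candidate]:
    """Greedy knapsack heuristic (not FVEC): rank by frequency per unit of
    space and keep every candidate that still fits in the remaining budget."""
    if any(c.size <= 0 for c in candidates):
        raise ValueError("every candidate needs a positive size")
    ranked = sorted(candidates, key=lambda c: c.frequency / c.size, reverse=True)
    chosen: list[Candidate] = []
    used = 0
    for c in ranked:
        if used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen


def hit_ratio(chosen: list[Candidate], workload: list[Candidate]) -> float:
    """Fraction of all requests that the cached results answer directly."""
    total = sum(c.frequency for c in workload)
    hits = sum(c.frequency for c in chosen)
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the paper.
    workload = [
        Candidate("P(A,B)", frequency=50, size=4),
        Candidate("P(C)", frequency=30, size=1),
        Candidate("P(D,E,F)", frequency=15, size=8),
        Candidate("P(G)", frequency=5, size=2),
    ]
    cache = select_cache(workload, budget=6)
    print([c.key for c in cache])      # ['P(C)', 'P(A,B)']
    print(hit_ratio(cache, workload))  # 0.8
```

With a budget of 6 units, the two small but popular results fit and answer 80 of the 100 requests, while the large, rarely requested result is skipped. Ranking by frequency per unit of space is only one possible policy; it ignores, for instance, that some results could be derived from other cached entries.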

Author information

Authors and Affiliations

Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, China
Jinchuan Chen & Xiaoyong Du
School of Information, Renmin University of China, China
Min Zhang & Xiaoyong Du
Department of Computer Science, Aalborg University, Denmark
Xike Xie

Editor information

Editors and Affiliations

Department of Computer Science, Binghamton University, 13902, Binghamton, NY, USA
Weiyi Meng
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Ling Feng
Department of Computer Science, National University of Singapore, 117417, Singapore
Stéphane Bressan
Research Group Data Analytics and Computing, University of Vienna, 1090, Vienna, Austria
Werner Winiwarter
School of Computer, Wuhan University, 430072, Wuhan, China
Wei Song

About this paper

Cite this paper

Chen, J., Zhang, M., Xie, X., Du, X. (2013). Efficient Querying of Correlated Uncertain Data with Cached Results. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7825. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37487-6_35

DOI: https://doi.org/10.1007/978-3-642-37487-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37486-9
Online ISBN: 978-3-642-37487-6
eBook Packages: Computer Science, Computer Science (R0)