A Truly Dynamic Data Structure for Top-k Queries on Uncertain Data

Patil, Manish; Shah, Rahul; Thankachan, Sharma V.

doi:10.1007/978-3-642-22351-8_6

Manish Patil¹⁹,
Rahul Shah¹⁹ &
Sharma V. Thankachan¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6809))

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

1495 Accesses

Abstract

Top-k queries allow end-users to focus on the most important (top-k) answers amongst those which satisfy the query. In traditional databases, a user defined score function assigns a score value to each tuple and a top-k query returns k tuples with the highest score. In uncertain database, top-k answer depends not only on the scores but also on the membership probabilities of tuples. Several top-k definitions covering different aspects of score-probability interplay have been proposed in recent past [20, 13, 6, 18]. Most of the existing work in this research field is focused on developing efficient algorithms for answering top-k queries on static uncertain data. Any change (insertion, deletion of a tuple or change in membership probability, score of a tuple) in underlying data forces re-computation of query answers. Such re-computations are not practical considering the dynamic nature of data in many applications. In this paper, we propose a truly dynamic data structure that uses ranking function PRF ^e(α) proposed by Li et al. [18] under the generally adopted model of x-relations [21]. PRF ^e can effectively approximate various other top-k definitions on uncertain data based on the value of parameter α. An x-relation consists of a number of x-tuples, where x-tuple is a set of mutually exclusive tuples (up to a constant number) called alternatives. Each x-tuple in a relation randomly instantiates into one tuple from its alternatives. For an uncertain relation with N tuples, our structure can answer top-k queries in O(klogN) time, handles an update in O(logN) time and takes O(N) space. Finally, we evaluate practical efficiency of our structure on both synthetic and real data.

This work is supported in part by US NSF Grant CCF–1017623 (R. Shah).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Chakrabarti, A., Cormode, G., McGregor, A.: Robust lower bounds for communication and stream computation. In: STOC, pp. 641–650 (2008)
Google Scholar
Chakrabarti, A., Jayram, T.S., Patrascu, M.: Tight lower bounds for selection in randomly ordered streams. In: SODA, pp. 720–729 (2008)
Google Scholar
Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: SIGMOD Conference, pp. 313–324 (2003)
Google Scholar
Chen, J., Yi, K.: Dynamic structures for top-k queries on uncertain data. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 427–438. Springer, Heidelberg (2007)
Chapter Google Scholar
Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD Conference, pp. 551–562 (2003)
Google Scholar
Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)
Google Scholar
Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB, pp. 864–875 (2004)
Google Scholar
Deshpande, A., Guestrin, C., Madden, S., Hellerstein, J.M., Hong, W.: Model-driven data acquisition in sensor networks. In: VLDB, pp. 588–599 (2004)
Google Scholar
Fagin, R., Kumar, R., Sivakumar, D.: Comparing top k lists. In: SODA, pp. 28–36 (2003)
Google Scholar
Galhardas, H., Florescu, D., Shasha, D., Simon, E., Saita, C.A.: Declarative data cleaning: Language, model, and algorithms. In: VLDB, pp. 371–380 (2001)
Google Scholar
Guha, S., McGregor, A.: Approximate quantiles and the order of the stream. In: PODS, pp. 273–279 (2006)
Google Scholar
Halevy, A.Y., Rajaraman, A., Ordille, J.J.: Data integration: The teenage years. In: VLDB, pp. 9–16 (2006)
Google Scholar
Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD Conference, pp. 673–686 (2008)
Google Scholar
Huang, J., Antova, L., Koch, C., Olteanu, D.: MayBMS: a probabilistic database management system. In: SIGMOD Conference, pp. 1071–1074 (2009)
Google Scholar
Jestes, J., Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data. IEEE Transactions on Knowledge and Data Engineering PP(99), 1 (2010)
Article Google Scholar
Jin, C., Yi, K., Chen, L., Yu, J.X., Lin, X.: Sliding-window top-k queries on uncertain streams. PVLDB 1(1), 301–312 (2008)
Google Scholar
Koch, C., Olteanu, D.: Conditioning probabilistic databases. In: Proc. VLDB Endow., vol. 1, pp. 313–325 (2008)
Google Scholar
Li, J., Saha, B., Deshpande, A.: A unified approach to ranking in probabilistic databases. PVLDB 2(1) (2009)
Google Scholar
Sen, P., Deshpande, A., Getoor, L.: Prdb: managing and exploiting rich correlations in probabilistic databases. VLDB J. 18(5), 1065–1090 (2009)
Article Google Scholar
Soliman, M.A., Ilyas, I.F., Chang, K.C.C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)
Google Scholar
Widom, J.: Trio: A system for integrated management of data, accuracy, and lineage. In: CIDR, pp. 262–276 (2005)
Google Scholar
Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases. In: ICDE, pp. 1406–1408 (2008)
Google Scholar

Download references

Author information

Authors and Affiliations

Computer Science Department, Louisiana State University, Baton Rouge, LA, USA
Manish Patil, Rahul Shah & Sharma V. Thankachan

Authors

Manish Patil
View author publications
You can also search for this author in PubMed Google Scholar
Rahul Shah
View author publications
You can also search for this author in PubMed Google Scholar
Sharma V. Thankachan
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

The Evergreen State College, 98505, Olympia, WA, USA
Judith Bayard Cushing
CNRI and University of Virginia, 22908, Charlottesville, VA, USA
James French
Gonzaga University, 99258, Spokane, WA, USA
Shawn Bowers

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Patil, M., Shah, R., Thankachan, S.V. (2011). A Truly Dynamic Data Structure for Top-k Queries on Uncertain Data. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_6

Download citation

DOI: https://doi.org/10.1007/978-3-642-22351-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics