Skip to main content

A Truly Dynamic Data Structure for Top-k Queries on Uncertain Data

  • Conference paper
Scientific and Statistical Database Management (SSDBM 2011)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 6809))

  • 1495 Accesses

Abstract

Top-k queries allow end-users to focus on the most important (top-k) answers amongst those which satisfy the query. In traditional databases, a user defined score function assigns a score value to each tuple and a top-k query returns k tuples with the highest score. In uncertain database, top-k answer depends not only on the scores but also on the membership probabilities of tuples. Several top-k definitions covering different aspects of score-probability interplay have been proposed in recent past [20, 13, 6, 18]. Most of the existing work in this research field is focused on developing efficient algorithms for answering top-k queries on static uncertain data. Any change (insertion, deletion of a tuple or change in membership probability, score of a tuple) in underlying data forces re-computation of query answers. Such re-computations are not practical considering the dynamic nature of data in many applications. In this paper, we propose a truly dynamic data structure that uses ranking function PRF e(α) proposed by Li et al. [18] under the generally adopted model of x-relations [21]. PRF e can effectively approximate various other top-k definitions on uncertain data based on the value of parameter α. An x-relation consists of a number of x-tuples, where x-tuple is a set of mutually exclusive tuples (up to a constant number) called alternatives. Each x-tuple in a relation randomly instantiates into one tuple from its alternatives. For an uncertain relation with N tuples, our structure can answer top-k queries in O(klogN) time, handles an update in O(logN) time and takes O(N) space. Finally, we evaluate practical efficiency of our structure on both synthetic and real data.

This work is supported in part by US NSF Grant CCF–1017623 (R. Shah).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Chakrabarti, A., Cormode, G., McGregor, A.: Robust lower bounds for communication and stream computation. In: STOC, pp. 641–650 (2008)

    Google Scholar 

  2. Chakrabarti, A., Jayram, T.S., Patrascu, M.: Tight lower bounds for selection in randomly ordered streams. In: SODA, pp. 720–729 (2008)

    Google Scholar 

  3. Chaudhuri, S., Ganjam, K., Ganti, V., Motwani, R.: Robust and efficient fuzzy match for online data cleaning. In: SIGMOD Conference, pp. 313–324 (2003)

    Google Scholar 

  4. Chen, J., Yi, K.: Dynamic structures for top-k queries on uncertain data. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 427–438. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  5. Cheng, R., Kalashnikov, D.V., Prabhakar, S.: Evaluating probabilistic queries over imprecise data. In: SIGMOD Conference, pp. 551–562 (2003)

    Google Scholar 

  6. Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)

    Google Scholar 

  7. Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: VLDB, pp. 864–875 (2004)

    Google Scholar 

  8. Deshpande, A., Guestrin, C., Madden, S., Hellerstein, J.M., Hong, W.: Model-driven data acquisition in sensor networks. In: VLDB, pp. 588–599 (2004)

    Google Scholar 

  9. Fagin, R., Kumar, R., Sivakumar, D.: Comparing top k lists. In: SODA, pp. 28–36 (2003)

    Google Scholar 

  10. Galhardas, H., Florescu, D., Shasha, D., Simon, E., Saita, C.A.: Declarative data cleaning: Language, model, and algorithms. In: VLDB, pp. 371–380 (2001)

    Google Scholar 

  11. Guha, S., McGregor, A.: Approximate quantiles and the order of the stream. In: PODS, pp. 273–279 (2006)

    Google Scholar 

  12. Halevy, A.Y., Rajaraman, A., Ordille, J.J.: Data integration: The teenage years. In: VLDB, pp. 9–16 (2006)

    Google Scholar 

  13. Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD Conference, pp. 673–686 (2008)

    Google Scholar 

  14. Huang, J., Antova, L., Koch, C., Olteanu, D.: MayBMS: a probabilistic database management system. In: SIGMOD Conference, pp. 1071–1074 (2009)

    Google Scholar 

  15. Jestes, J., Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data. IEEE Transactions on Knowledge and Data Engineering PP(99), 1 (2010)

    Article  Google Scholar 

  16. Jin, C., Yi, K., Chen, L., Yu, J.X., Lin, X.: Sliding-window top-k queries on uncertain streams. PVLDB 1(1), 301–312 (2008)

    Google Scholar 

  17. Koch, C., Olteanu, D.: Conditioning probabilistic databases. In: Proc. VLDB Endow., vol. 1, pp. 313–325 (2008)

    Google Scholar 

  18. Li, J., Saha, B., Deshpande, A.: A unified approach to ranking in probabilistic databases. PVLDB 2(1) (2009)

    Google Scholar 

  19. Sen, P., Deshpande, A., Getoor, L.: Prdb: managing and exploiting rich correlations in probabilistic databases. VLDB J. 18(5), 1065–1090 (2009)

    Article  Google Scholar 

  20. Soliman, M.A., Ilyas, I.F., Chang, K.C.C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)

    Google Scholar 

  21. Widom, J.: Trio: A system for integrated management of data, accuracy, and lineage. In: CIDR, pp. 262–276 (2005)

    Google Scholar 

  22. Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases. In: ICDE, pp. 1406–1408 (2008)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Patil, M., Shah, R., Thankachan, S.V. (2011). A Truly Dynamic Data Structure for Top-k Queries on Uncertain Data. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-22351-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22350-1

  • Online ISBN: 978-3-642-22351-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics