Top-k Queries Over Uncertain Scores

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2016 Conferences (OTM 2016)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10033)

Abstract

Modern recommendation systems rely on some form of collaborative or crowdsourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. Because this information is collected independently, its multiplicity, diversity and varying quality result in uncertainty. Objects, such as the services offered by hotels, vacation rentals and restaurants, have uncertain scores for their various features.

In this context, ranking uncertain data becomes a crucial issue. Several data models for uncertain data and several semantics for probabilistic top-k queries have been proposed in the literature. We consider here a model of objects with uncertain scores given as probability distributions and the semantics proposed in the state-of-the-art reference work of Soliman, Ilyas and Ben-David.
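To make the data model concrete, the following is a minimal sketch and not the paper's method. It assumes, purely for illustration, that each object's score is uniformly distributed over an interval, that scores are independent, and that the query asks for the most probable ordered top-k prefix (one of several semantics discussed in the literature). The hotel data and the function names are hypothetical.

```python
import random
from collections import Counter

def sample_topk_prefix(intervals, k, rng):
    """Draw one score per object and return the ids of the k highest, in rank order."""
    # Assumption: independent scores, each uniform over its [low, high] interval.
    scores = {obj: rng.uniform(low, high) for obj, (low, high) in intervals.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return tuple(ranked[:k])

def estimate_topk_prefix(intervals, k, n_samples=10_000, seed=0):
    """Estimate the most probable ordered top-k prefix by plain Monte Carlo sampling."""
    rng = random.Random(seed)
    counts = Counter(sample_topk_prefix(intervals, k, rng) for _ in range(n_samples))
    prefix, hits = counts.most_common(1)[0]
    return prefix, hits / n_samples

# Hypothetical data: three hotels with interval-valued scores.
hotels = {"A": (7.0, 9.0), "B": (6.0, 8.0), "C": (2.0, 4.0)}
prefix, prob = estimate_topk_prefix(hotels, k=2)
print(prefix, round(prob, 3))
```

With these intervals, hotel C can never outrank A or B, and A outranks B with probability 0.875, so the estimate should concentrate on the prefix (A, B).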

In this paper, we explore the design space of Metropolis-Hastings Markov chain Monte Carlo algorithms for answering probabilistic top-k queries over a database of objects with uncertain scores. We are able to devise several algorithms that yield better performance than the reference algorithm. We empirically and comparatively demonstrate the effectiveness and efficiency of these new algorithms.
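As a rough illustration of the general technique, and not a reconstruction of any algorithm from the paper, the sketch below runs a random-walk Metropolis-Hastings chain over score vectors and tallies which ordered top-k prefix each visited state induces. It assumes independent Gaussian scores. The proposal, the step size, the burn-in length and all names are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter

def log_density(scores, means, stds):
    """Unnormalised log joint density of independent Gaussian scores (an assumption)."""
    return sum(-0.5 * ((s - m) / sd) ** 2 for s, m, sd in zip(scores, means, stds))

def mh_topk_prefix(means, stds, k, n_steps=20_000, burn_in=2_000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings over score vectors, counting top-k prefixes."""
    rng = random.Random(seed)
    state = list(means)                      # start the chain at the mean scores
    state_lp = log_density(state, means, stds)
    visits = Counter()
    for t in range(n_steps):
        # Symmetric proposal: perturb the score of one randomly chosen object.
        i = rng.randrange(len(state))
        proposal = list(state)
        proposal[i] += rng.gauss(0.0, step)
        proposal_lp = log_density(proposal, means, stds)
        # Accept with probability min(1, pi(proposal) / pi(state)).
        if rng.random() < math.exp(min(0.0, proposal_lp - state_lp)):
            state, state_lp = proposal, proposal_lp
        if t >= burn_in:
            order = sorted(range(len(state)), key=state.__getitem__, reverse=True)
            visits[tuple(order[:k])] += 1
    prefix, hits = visits.most_common(1)[0]
    return prefix, hits / (n_steps - burn_in)

# Hypothetical data: objects 0 and 1 have overlapping scores, object 2 is far below.
best_prefix, freq = mh_topk_prefix([8.0, 7.5, 3.0], [0.5, 0.5, 0.5], k=2)
print(best_prefix, round(freq, 3))
```

For independent scores, direct sampling would be simpler. The chain is shown only because the same accept/reject step still applies when the target distribution can be evaluated but not sampled directly.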

Notes

  1. http://www.comp.nus.edu.sg/~liuqing/tech-reports/TRA9-16-Topk.pdf.

References

  1. Amarilli, A., Amsterdamer, Y., Milo, T.: Uncertainty in crowd data sourcing under structural constraints. In: Han, W.-S., Lee, M.L., Muliantara, A., Sanjaya, N.A., Thalheim, B., Zhou, S. (eds.) DASFAA 2014. LNCS, vol. 8505, pp. 351–359. Springer, Heidelberg (2014). doi:10.1007/978-3-662-43984-5_27

  2. Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data and expected ranks. In: ICDE, pp. 305–316 (2009)

  3. Davidson, S.B., Khanna, S., Milo, T., Roy, S.: Using the crowd for top-k and group-by queries. In: ICDT, pp. 225–236 (2013)

  4. Ge, T., Zdonik, S., Madden, S.: Top-k queries on uncertain data: on score distribution and typical answers. In: SIGMOD, pp. 375–388. ACM (2009)

  5. Hua, M., Pei, J., Zhang, W., Lin, X.: Ranking queries on uncertain data: a probabilistic threshold approach. In: SIGMOD, pp. 673–686. ACM (2008)

  6. Jestes, J., Cormode, G., Li, F., Yi, K.: Semantics of ranking queries for probabilistic data. TKDE 23(12), 1903–1917 (2011)

  7. Li, J., Deshpande, A.: Ranking continuous probabilistic datasets. VLDB 3(1–2), 638–649 (2010)

  8. Li, J., Saha, B., Deshpande, A.: A unified approach to ranking in probabilistic databases. VLDB 2(1), 502–513 (2009)

  9. Newman, M.E., Barkema, G.T., Newman, M.: Monte Carlo Methods in Statistical Physics, vol. 13. Clarendon Press, Oxford (1999)

  10. O’Leary, D.P.: Multidimensional integration: partition and conquer. Comput. Sci. Eng. 6(6), 58–66 (2004)

  11. Re, C., Dalvi, N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: ICDE, pp. 886–895 (2007)

  12. Soliman, M.A., Ilyas, I.F.: Ranking with uncertain scores. In: ICDE, pp. 317–328 (2009)

  13. Soliman, M.A., Ilyas, I.F., Ben-David, S.: Supporting ranking queries on uncertain and incomplete data. VLDB J. 19(4), 477–501 (2010)

  14. Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)

  15. Wang, C., Yuan, L.Y., You, J.-H., Zaiane, O.R., Pei, J.: On pruning for top-k ranking in uncertain databases. VLDB 4(10), 598–609 (2011)

  16. Yi, K., Li, F., Kollios, G., Srivastava, D.: Efficient processing of top-k queries in uncertain databases with x-relations. TKDE 20(12), 1669–1682 (2008)

  17. Zhang, X., Li, G., Feng, J.: Crowdsourced top-k algorithms: an experimental evaluation. VLDB 9(8), 612–623 (2016)

Acknowledgement

This research is funded by research grant R-252-000-622-114 from the Singapore Ministry of Education Academic Research Fund (project 251RES1607, “Janus: Effective, Efficient and Fair Algorithms for Spatio-temporal Crowdsourcing”) and is a collaboration between the National University of Singapore and Télécom ParisTech.

Author information

Corresponding author

Correspondence to Qing Liu.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Liu, Q., Basu, D., Abdessalem, T., Bressan, S. (2016). Top-k Queries Over Uncertain Scores. In: Debruyne, C., et al. On the Move to Meaningful Internet Systems: OTM 2016 Conferences. OTM 2016. Lecture Notes in Computer Science, vol 10033. Springer, Cham. https://doi.org/10.1007/978-3-319-48472-3_14

  • DOI: https://doi.org/10.1007/978-3-319-48472-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-48471-6

  • Online ISBN: 978-3-319-48472-3

  • eBook Packages: Computer Science, Computer Science (R0)
