Sensitivity Analysis of Answer Ordering from Probabilistic Databases

  • Jianwen Chen
  • Yiping Li
  • Ling Feng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8056)

Abstract

Queries over probabilistic databases yield probabilistic answers, which are often ranked according to certain ranking criteria. Because the probabilities of the basic tuples may be imprecise or erroneous, and perturbations of them may greatly change the answer ordering, sensitivity-analysis questions naturally arise, such as “which basic input probability change can substantially alter the ranked result?”, “which basic probability change will make a certain element top-ranked?”, and “which basic probability change will swap the positions of the first- and second-ranked elements?”
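To make the third question concrete, here is a minimal sketch under a deliberately simple, hypothetical model that is not taken from the paper: base tuples are independent, each answer exists if any tuple in its lineage exists (so its probability is 1 minus the product of the complements), answers are ranked by probability, and a brute-force grid search looks for the smallest change to one base probability that swaps the top two answers. The names `answer_prob`, `ranking`, `min_swap_change` and the sample data are illustrative assumptions only.

```python
from math import prod

# Hypothetical toy model (not the paper's): base tuples are independent, and an
# answer exists if ANY tuple in its lineage exists, so P(answer) = 1 - prod(1 - p_t).
def answer_prob(lineage, base):
    return 1.0 - prod(1.0 - base[t] for t in lineage)

def ranking(answers, base):
    """Answer ids sorted by descending probability; ties broken by id."""
    return sorted(answers, key=lambda a: (-answer_prob(answers[a], base), a))

def min_swap_change(answers, base, tuple_id, steps=100):
    """Smallest signed change to base[tuple_id], searched on a grid of 1/steps,
    that swaps the first- and second-ranked answers; None if no grid value does."""
    first, second = ranking(answers, base)[:2]
    best = None
    for i in range(steps + 1):
        trial = {**base, tuple_id: i / steps}  # perturb one base probability
        if ranking(answers, trial)[:2] == [second, first]:
            delta = i / steps - base[tuple_id]
            if best is None or abs(delta) < abs(best):
                best = delta
    return best

base = {"t1": 0.6, "t2": 0.3, "t3": 0.25, "t4": 0.2}
answers = {"a": ["t1"], "b": ["t2", "t3"], "c": ["t4"]}
print(ranking(answers, base))                          # ['a', 'b', 'c']
print(round(min_swap_change(answers, base, "t1"), 2))  # -0.13
print(round(min_swap_change(answers, base, "t3"), 2))  # 0.18
print(min_swap_change(answers, base, "t4"))            # None
```

In this toy setting, lowering t1 by about 0.13 or raising t3 by about 0.18 swaps the first two answers, while no change to t4 can. The grid search is chosen for transparency, not efficiency; it is not the modular approach the paper proposes.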

The sensitivity analysis of probabilistic top-K queries has been addressed in the literature, but mainly with respect to changes in the answer list as a set. However, the ordering of the elements in the answer list matters greatly for certain applications. In this paper, we categorize a variety of such ordering-sensitivity questions as list-oriented or element-oriented, and formulate the sensitivity analysis problem for the answer ordering returned by probabilistic top-K queries and probabilistic top-K aggregation queries. We develop a modular approach to quantitatively compute the sensitivity of answer ordering, in which four basic processing modules are identified. Optimization strategies are also presented to improve performance. Experimental results on both synthetic and real data demonstrate the effectiveness and efficiency of the proposed solutions.
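The distinction between list-oriented and element-oriented questions can be illustrated with two simple measures. The sketch below assumes, purely for illustration, that a list-oriented change is quantified by the Kendall tau distance (the number of element pairs whose relative order flips) and an element-oriented change by how many positions one element moves. The abstract does not say which measures the paper actually uses, and both functions assume the two rankings contain the same elements.

```python
from itertools import combinations

def kendall_tau_distance(before, after):
    """List-oriented: number of element pairs whose relative order differs.
    Assumes both rankings contain exactly the same elements."""
    pos = {x: i for i, x in enumerate(after)}
    # combinations() yields (x, y) with x ranked above y in `before`
    return sum(1 for x, y in combinations(before, 2) if pos[x] > pos[y])

def rank_shift(before, after, element):
    """Element-oriented: positions `element` moved (positive = moved up)."""
    return before.index(element) - after.index(element)

before = ["a", "b", "c", "d"]
after = ["b", "a", "d", "c"]
print(kendall_tau_distance(before, after))  # 2
print(rank_shift(before, after, "b"))       # 1
print(rank_shift(before, after, "c"))       # -1
```

Two adjacent swaps give a list-level distance of 2, while each affected element moves by only one position, which is why the two kinds of question can call for different analyses.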

Keywords

Probabilistic database · top-K query · aggregation query · answer ordering · sensitivity analysis

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jianwen Chen¹
  • Yiping Li¹
  • Ling Feng¹
  1. Dept. of Computer Science & Technology, Tsinghua University, Beijing, China

Personalised recommendations