Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods

Journal of Intelligent Information Systems

Abstract

Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on them, how to conduct advanced analysis on uncertain data remains largely an open problem. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic skyline model in which an uncertain object has a probability of being in the skyline, and a p-skyline contains all objects whose skyline probabilities are at least p (0 < p ≤ 1). Computing probabilistic skylines on large uncertain data sets is challenging. We systematically develop a bounding-pruning-refining framework and three algorithms. The bottom-up algorithm computes the skyline probabilities of some selected instances of uncertain objects and uses those instances to prune other instances and uncertain objects effectively. The top-down algorithm recursively partitions the instances of uncertain objects into subsets and prunes subsets and objects aggressively. Combining the advantages of the bottom-up and top-down algorithms, we develop a hybrid algorithm to further improve performance. Our experimental results on both a real NBA player data set and benchmark synthetic data sets show that probabilistic skylines are interesting and useful, and that our algorithms are efficient on large data sets.
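
To make the p-skyline definition concrete, the following is a minimal brute-force sketch, not the bottom-up, top-down, or hybrid algorithm of the paper. It rests on assumptions the abstract does not spell out: each uncertain object is a finite set of equally likely instances, objects are independent of one another, and smaller values are preferred in every dimension. The function names `dominates`, `skyline_probabilities`, and `p_skyline` are illustrative only.

```python
from typing import Dict, List, Sequence

Instance = Sequence[float]


def dominates(u: Instance, v: Instance) -> bool:
    """True if u is no worse than v everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def skyline_probabilities(objects: Dict[str, List[Instance]]) -> Dict[str, float]:
    """Skyline probability of each uncertain object, by exhaustive comparison.

    Assumptions: instances of an object are equally likely, and objects are
    independent, so the probability that an instance is not dominated is a
    product over the other objects.
    """
    probs: Dict[str, float] = {}
    for name, instances in objects.items():
        total = 0.0
        for u in instances:
            p_u = 1.0  # probability that no other object dominates instance u
            for other, other_instances in objects.items():
                if other == name:
                    continue
                dominating = sum(1 for v in other_instances if dominates(v, u))
                p_u *= 1.0 - dominating / len(other_instances)
            total += p_u
        # The object's probability is the average over its instances.
        probs[name] = total / len(instances)
    return probs


def p_skyline(objects: Dict[str, List[Instance]], p: float) -> List[str]:
    """Names of all objects whose skyline probability is at least p."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    probs = skyline_probabilities(objects)
    return sorted(name for name, pr in probs.items() if pr >= p)


# Two hypothetical objects with two instances each.
example = {"A": [(1, 1), (3, 3)], "B": [(2, 2), (4, 4)]}
print(skyline_probabilities(example))
print(p_skyline(example, 0.5))
```

This exhaustive version compares every instance against every instance of every other object, so its cost grows quadratically with the total number of instances. That cost is what a bounding-pruning-refining approach is meant to avoid on large data sets.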

Author information

Corresponding author

Correspondence to Bin Jiang.

Additional information

This research is supported in part by an NSERC Discovery Grant, an NSERC Discovery Accelerator Supplement Grant, the ARC Discovery Grants (DP110102937, DP0987557, DP0881035), and a Google Research Award. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

About this article

Cite this article

Jiang, B., Pei, J., Lin, X. et al. Probabilistic skylines on uncertain data: model and bounding-pruning-refining methods. J Intell Inf Syst 38, 1–39 (2012). https://doi.org/10.1007/s10844-010-0141-4

  • DOI: https://doi.org/10.1007/s10844-010-0141-4
