Formulation of Composite Discrete Measures for Estimating Uncertainties in Probabilistic Databases

  • Susmit Bagchi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 928)


Probabilistic databases contain large datasets embedded with noise and with uncertainties in data association rules and queries. Data identification and interpretation in probabilistic databases require probabilistic models for data clustering and query processing. Thus, the associated probability measures need to be heterogeneous as well as computable. This paper proposes a formal model of composite discrete measures in metric spaces, intended for probabilistic databases. The proposed composite measures are computable and cover real as well as complex spaces. The spaces of discrete measures are constructed on continuous smooth functions. This paper presents the construction of the formal model and computational evaluations of the discrete measures for different functions of varying linearity and smoothness. Furthermore, a special monotone class of the composite discrete measure is presented using an analytical formulation. The condensation measure of a uniform contraction map is constructed. The proposed model can be employed to computationally estimate uncertainties in probabilistic databases.
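
As a rough illustration of what a computable discrete measure built from a smooth function can look like, the sketch below assigns each point of a finite sample a weight proportional to |f(x)| and normalizes the weights to sum to one. This is not the paper's construction: the helper name `discrete_measure`, the choice of weights, and the sample function are all assumptions made for illustration. Using the modulus lets the same code accept complex-valued functions or complex sample points.

```python
import math


def discrete_measure(points, f):
    """Hypothetical helper: build mu(A) on subsets of a finite point set.

    Each point x gets weight |f(x)|; weights are normalized so that
    mu(points) == 1. This is an illustrative assumption, not the
    composite measure defined in the paper.
    """
    weights = {x: abs(f(x)) for x in points}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("f vanishes on every sample point")

    def mu(subset):
        # Sum the normalized weights of the sample points in the subset.
        return sum(weights[x] for x in subset if x in weights) / total

    return mu


# Sample points on [0, 1] and a smooth (Gaussian-shaped) weight function.
points = [i / 10 for i in range(11)]
mu = discrete_measure(points, lambda x: math.exp(-x * x))

A = set(points[:3])   # smaller subset
B = set(points[:6])   # superset of A
```

Under these assumptions, `mu(A) <= mu(B)` holds whenever A is a subset of B, which is the ordinary monotonicity property of any measure, and `mu(A) + mu(B - A)` agrees with `mu(B)` up to floating-point error.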


Probabilistic databases, Metric spaces, Probability, Measure space, Monotone



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Aerospace and Software Engineering (Informatics), Gyeongsang National University, Jinju, South Korea
