Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Uncertainty Management in Scientific Database Systems

  • Nilesh Dalvi
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1302

Definition

Scientific databases often deal with data that comes from multiple sources of varying quality, that is heterogeneous, incomplete, and inconsistent, and that is riddled with measurement errors. Uncertainty management comprises a set of techniques for modeling and representing the various uncertainties that arise in scientific data and for enabling users to query that data. This entry describes the UII system [10], which addresses the problem of managing uncertainty when integrating scientific databases.
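To make the idea of modeling uncertainty and querying over it concrete, the following is a minimal sketch in Python. It is not the UII system's method, which this entry does not spell out here. It assumes a simple tuple-independent model in which each record carries a confidence value and an answer supported by several records is scored as the probability that at least one of them is correct. The record layout, the names RECORDS, answer_probabilities, and ranked_answers, and all confidence values are hypothetical.

```python
from math import prod

# Hypothetical records: (gene, function, source, confidence in [0, 1]).
# The confidences are made-up illustrative values, not real measurements.
RECORDS = [
    ("geneA", "kinase", "source1", 0.9),
    ("geneA", "kinase", "source2", 0.6),
    ("geneA", "transporter", "source3", 0.3),
    ("geneB", "kinase", "source1", 0.5),
]


def answer_probabilities(records, gene):
    """Return {function: probability} for one gene.

    Assumption: records are independent events, so an answer holds if at
    least one supporting record is correct: 1 - prod(1 - p_i).
    """
    support = {}
    for g, function, _source, confidence in records:
        if g == gene:
            support.setdefault(function, []).append(confidence)
    return {f: 1.0 - prod(1.0 - p for p in ps) for f, ps in support.items()}


def ranked_answers(records, gene):
    """Return (function, probability) pairs, most probable first."""
    probs = answer_probabilities(records, gene)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
```

Under these assumptions, calling ranked_answers(RECORDS, "geneA") places "kinase" ahead of "transporter", because two moderately reliable sources agreeing outweigh a single weak one. A real system would also need to account for correlated sources, which the independence assumption ignores.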

Historical Background

Distributed data integration is becoming increasingly popular in biomedical research and in scientific research in general. Its popularity rests on the realization that combining sources frequently leads to novel scientific discoveries that cannot be reached from any single source in isolation. However, as more and more scientific data is shared and as tools are built to provide a common query interface for it, scientists face the major problem of dealing with...


Recommended Reading

  1. Barbará D, Garcia-Molina H, Porter D. The management of probabilistic data. IEEE Trans Knowl Data Eng. 1992;4(5):487–502.
  2. Boulos J, Dalvi N, Mandhani B, Mathur S, Re C, Suciu D. Mystiq: a system for finding more answers by using probabilities. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2005. p. 891–3.
  3. Cavallo R, Pittarelli M. The theory of probabilistic databases. In: Proceedings of the 13th International Conference on Very Large Data Bases; 1987. p. 71–81.
  4. Dalvi N, Suciu D. Efficient query evaluation on probabilistic databases. In: Proceedings of the 26th International Conference on Very Large Data Bases; 2004. p. 864–75.
  5. Deshpande A, Sarawagi S. Probabilistic graphical models and their role in databases. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007. p. 1435–6.
  6. Dey D, Sarkar S. A probabilistic relational model and algebra. ACM Trans Database Syst. 1996;21(3):339–69.
  7. Garofalakis MN, Brown KP, Franklin MJ, Hellerstein JM, Wang DZ, Michelakis E, Tancau L, Wu E, Jeffery SR, Aipperspach R. Probabilistic data management for pervasive computing: the data furnace project. IEEE Data Eng Bull. 2006;29(1):57–63.
  8. Karger DR. A randomized fully polynomial time approximation scheme for the all terminal network reliability problem. In: Proceedings of the 27th Annual ACM Symposium on Theory of Computing; 1995. p. 11–7.
  9. Lakshmanan LVS, Leone N, Ross R, Subrahmanian VS. Probview: a flexible probabilistic database system. ACM Trans Database Syst. 1997;22(3):419–69.
  10. Louie B, Detwiler L, Dalvi N, Shaker R, Tarczy-Hornoch P, Suciu D. Incorporating uncertainty metrics into a general-purpose data integration system. In: Proceedings of the 19th International Conference on Scientific and Statistical Database Management; 2007. p. 19–28.
  11. Re C, Dalvi N, Suciu D. Efficient top-k query evaluation on probabilistic data. In: Proceedings of the 23rd International Conference on Data Engineering; 2007. p. 886–95.
  12. Sen P, Deshpande A. Representing and querying correlated tuples in probabilistic databases. In: Proceedings of the 23rd International Conference on Data Engineering; 2007. p. 596–605.
  13. Shaker R, Mork P, Brockenbrough JS, Donelson L, Tarczy-Hornoch P. The biomediator system as a tool for integrating biologic databases on the web. In: Proceedings of the 4th International Workshop on Information Integration on the Web; 2004.
  14. Singh S, Mayfield C, Mittal S, Prabhakar S, Hambrusch S, Shah R. Orion 2.0: native support for uncertain data. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2008. p. 1239–42.
  15. Suciu D, Dalvi N. Foundations of probabilistic answers to queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2005. p. 963.
  16. Tatusova TA, Madden TL. Blast 2 sequences - a new tool for comparing protein and nucleotide sequences. FEMS Microbiol Lett. 1999;174(2):247–50.
  17. Wang K, Tarczy-Hornoch P, Shaker R, Mork P, Brinkley J. Biomediator data integration: beyond genomics to neuroscience data. In: Proceedings of the AMIA Annual Fall Symposium; 2005. p. 779–83.
  18. Widom J. Trio: a system for integrated management of data, accuracy, and lineage. In: Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research; 2005.
  19. Woods DD, Patterson ES, Roth EM, Christoffersen K. Can we ever escape from data overload? A cognitive systems diagnosis. Cogn Technol Work. 2002;4(1):22–36.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Airbnb, San Francisco, USA

Section editors and affiliations

  • Amarnath Gupta (1)
  1. San Diego Supercomputer Center, Univ. of California San Diego, La Jolla, USA