Uncertainty Management in Scientific Database Systems
Scientific databases often hold data that comes from multiple sources of varying quality, is heterogeneous, incomplete, and inconsistent, and is riddled with measurement errors. Uncertainty management comprises a set of techniques for modeling and representing the various uncertainties that arise in scientific data and for enabling users to query that data. This entry describes the UII system, which addresses the issue of managing uncertainty in integrating scientific databases.
Distributed data integration is becoming increasingly popular in biomedical research and in scientific research in general. Its popularity rests on the realization that combining sources frequently leads to novel scientific discoveries that cannot be reached from any single source in isolation. However, as more and more scientific data is shared and as tools are built to provide a common query interface for it, scientists face the major problem of dealing with...