Bayesian networks for supporting query processing over incomplete autonomous databases

Raghunathan, Rohit; De, Sushovan; Kambhampati, Subbarao

doi:10.1007/s10844-013-0277-0

Published: 01 September 2013

Journal of Intelligent Information Systems, Volume 42, pages 595–618 (2014)

Rohit Raghunathan¹,
Sushovan De² &
Subbarao Kambhampati²

Abstract

As the information available to naïve users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when tuples contain missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayesian networks. Our approach involves mining (“learning”) Bayesian networks from a sample of the database and using them for both imputation (predicting a missing value) and query rewriting (retrieving relevant results that are incomplete on the query-constrained attributes, when the data sources are autonomous). We present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayesian networks provide significantly higher classification accuracy, and (ii) the relevant possible answers retrieved by the queries reformulated using Bayesian networks have higher precision and recall than those retrieved using AFDs, while keeping query processing costs manageable.
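
To illustrate why independence assumptions can hurt when several correlated attributes are missing, here is a minimal sketch. It is not the paper's implementation; it assumes a hypothetical three-attribute car relation with a hand-written chain network Make -> Model -> Body. The conditional probability tables, attribute names, and function names are all invented for illustration. The sketch treats an incomplete tuple (known Make, missing Model and Body) as a distribution over its complete versions, then compares joint imputation with predicting each missing attribute on its own.

```python
# Hypothetical CPTs for a chain Make -> Model -> Body.
# All numbers are illustrative assumptions, not values from the paper.
P_MODEL_GIVEN_MAKE = {
    "Honda": {"Civic": 0.40, "CR-V": 0.35, "Pilot": 0.25},
}
P_BODY_GIVEN_MODEL = {
    "Civic": {"sedan": 0.55, "coupe": 0.45, "suv": 0.0},
    "CR-V": {"sedan": 0.0, "coupe": 0.0, "suv": 1.0},
    "Pilot": {"sedan": 0.0, "coupe": 0.0, "suv": 1.0},
}


def completions(make):
    """Distribution over (model, body) completions of a tuple whose make is known."""
    dist = {}
    for model, p_model in P_MODEL_GIVEN_MAKE[make].items():
        for body, p_body in P_BODY_GIVEN_MODEL[model].items():
            # Chain rule for the network: P(model, body | make) = P(model | make) * P(body | model)
            dist[(model, body)] = p_model * p_body
    return dist


def joint_imputation(make):
    """Pick the single most probable complete tuple."""
    dist = completions(make)
    return max(dist, key=dist.get)


def independent_imputation(make):
    """Pick the most probable value of each missing attribute separately."""
    model_marginal, body_marginal = {}, {}
    for (model, body), p in completions(make).items():
        model_marginal[model] = model_marginal.get(model, 0.0) + p
        body_marginal[body] = body_marginal.get(body, 0.0) + p
    return (max(model_marginal, key=model_marginal.get),
            max(body_marginal, key=body_marginal.get))


if __name__ == "__main__":
    dist = completions("Honda")
    joint = joint_imputation("Honda")
    separate = independent_imputation("Honda")
    print("joint imputation:      ", joint, dist[joint])
    print("independent imputation:", separate, dist[separate])
```

With these made-up tables, the per-attribute choice combines the most likely model with the most likely body style even though that pair has zero probability under the network, while the joint choice always returns a combination the network considers possible.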

Notes

The actual implementation of QPIAD uses a variant of the highest-confidence AFD for some of the attributes. For details, we refer the reader to Wolf et al. (2009).
In this prototype, we manually transferred the output of the BANJO module to the BNT module. In future systems, we will integrate them programmatically.

References

Batista, G., & Monard, M. (2002). A study of k-nearest neighbour as an imputation method. In Soft computing systems: design, management and applications (pp. 251–260). Santiago, Chile.
Bishop, C., et al. (2006). Pattern recognition and machine learning (Vol. 4). Springer, New York.
Cars.com (2013). http://www.cars.com. Accessed 1 Feb 2013.
CBioC (2013). http://cbioc.eas.asu.edu/. Accessed 1 Feb 2013.
Cooper, G. (1990). The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42(2), 393–405.
Dempster, A., Laird, N., Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1), 1–38.
Fernández, A., Rumí, R., Salmerón, A. (2012). Answering queries in hybrid Bayesian networks using importance sampling. Decision Support Systems, 53(3), 580–590.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 1 Feb 2013.
Geiger, D., Verma, T., Pearl, J. (1990). Identifying independence in Bayesian networks. Networks, 20(5), 507–534.
Gupta, R., & Sarawagi, S. (2006). Creating probabilistic databases from information extraction models. In VLDB (pp. 965–976).
Hartemink, A., et al. (2005). Banjo: Bayesian network inference with Java objects. http://www.cs.duke.edu/~amink/software/banjo/. Accessed 1 Feb 2013.
Heckerman, D. (1992). The certainty-factor model (2nd edn). Encyclopedia of Artificial Intelligence.
Heckerman, D., Geiger, D., Chickering, D.M. (1995). Learning Bayesian networks: the combination of knowledge and statistical data. Machine Learning, 20(3), 197–243.
Heitjan, D.F., & Basu, S. (1996). Distinguishing missing at random and missing completely at random. The American Statistician, 50(3), 207–213.
Jensen, F., Olesen, K., Andersen, S. (2006). An algebra of Bayesian belief universes for knowledge-based systems. Networks, 20(5), 637–659.
Jensen, F.V., & Nielsen, T.D. (2007). Bayesian networks and decision graphs. Springer.
Khatri, H. (2006). Query processing over incomplete autonomous web databases. Master’s thesis, Arizona State University, Tempe, USA.
Manning, C.D., Raghavan, P., Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.
Minka, T., Winn, J., Guiver, J., Knowles, D. (2010). Infer.NET 2.4. http://research.microsoft.com/infernet. Microsoft Research Cambridge. Accessed 1 Feb 2013.
Minka, T.P. (2001). Expectation propagation for approximate Bayesian inference. In UAI (pp. 362–369).
Murphy, K., et al. (2001). The Bayes net toolbox for Matlab. Computing Science and Statistics, 33(2), 1024–1034.
Muslea, I., & Lee, T. (2005). Online query relaxation via Bayesian causal structures discovery. In Proceedings of the national conference on artificial intelligence (Vol. 20, p. 831). Menlo Park, CA/Cambridge, MA, London: AAAI Press/MIT Press.
Nambiar, U., & Kambhampati, S. (2006). Answering imprecise queries over autonomous web databases. In Proceedings of the 22nd International Conference on Data Engineering (ICDE) (pp. 45–45). IEEE.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference. Morgan Kaufmann.
Ramoni, M., & Sebastiani, P. (1997). Learning Bayesian networks from incomplete databases. In Proceedings of the 13th conference on uncertainty in artificial intelligence (pp. 401–408). Morgan Kaufmann Publishers Inc.
Ramoni, M., & Sebastiani, P. (2001). Robust learning with missing data. Machine Learning, 45(2), 147–170.
Romero, V., & Salmerón, A. (2004). Multivariate imputation of qualitative missing data using Bayesian networks. In Soft methodology and random information systems (pp. 605–612). Springer.
Russell, S.J., & Norvig, P. (2010). Artificial intelligence—a modern approach. Pearson Education.
Shortliffe, E. (1976). Computer-based medical consultations: MYCIN (Vol. 388). Elsevier, New York.
Wolf, G., Kalavagattu, A., Khatri, H., Balakrishnan, R., Chokshi, B., Fan, J., Chen, Y., Kambhampati, S. (2009). Query processing over incomplete autonomous databases: query rewriting using learned data dependencies. Very Large Data Bases Journal, 18(5), 1167–1190.
Wolf, G., Khatri, H., Chokshi, B., Fan, J., Chen, Y., Kambhampati, S. (2007). Query processing over incomplete autonomous databases. In Proceedings of the 33rd international conference on very large data bases (pp. 651–662). VLDB Endowment.
Wu, C., Wun, C., Chou, H. (2004). Using association rules for completing missing data. In Hybrid Intelligent Systems (HIS) (pp. 236–241). IEEE.

Author information

Authors and Affiliations

Amazon, Seattle, WA, USA
Rohit Raghunathan
Computer Science and Engineering, Arizona State University, Tempe, AZ, USA
Sushovan De & Subbarao Kambhampati

Corresponding author

Correspondence to Sushovan De.

Additional information

This research is supported by ONR grant N000140910032 and two Google research awards.

About this article

Cite this article

Raghunathan, R., De, S. & Kambhampati, S. Bayesian networks for supporting query processing over incomplete autonomous databases. J Intell Inf Syst 42, 595–618 (2014). https://doi.org/10.1007/s10844-013-0277-0

Received: 28 August 2012
Revised: 15 August 2013
Accepted: 19 August 2013
Published: 01 September 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s10844-013-0277-0