Journal of Intelligent Information Systems, Volume 42, Issue 3, pp 595–618

Bayesian networks for supporting query processing over incomplete autonomous databases

  • Rohit Raghunathan
  • Sushovan De
  • Subbarao Kambhampati
Article

DOI: 10.1007/s10844-013-0277-0

Cite this article as:
Raghunathan, R., De, S. & Kambhampati, S. J Intell Inf Syst (2014) 42: 595. doi:10.1007/s10844-013-0277-0

Abstract

As the information available to naïve users through autonomous data sources continues to increase, mediators become important to ensure that the wealth of information available is tapped effectively. A key challenge that these information mediators need to handle is the varying levels of incompleteness in the underlying databases in terms of missing attribute values. Existing approaches such as QPIAD aim to mine and use Approximate Functional Dependencies (AFDs) to predict and retrieve relevant incomplete tuples. These approaches make independence assumptions about missing values, which critically hobbles their performance when there are tuples containing missing values for multiple correlated attributes. In this paper, we present a principled probabilistic alternative that views an incomplete tuple as defining a distribution over the complete tuples that it stands for. We learn this distribution in terms of Bayesian networks. Our approach involves mining/“learning” Bayesian networks from a sample of the database, and using them to do both imputation (predict a missing value) and query rewriting (retrieve relevant results with incompleteness on the query-constrained attributes, when the data sources are autonomous). We present empirical studies to demonstrate that (i) at higher levels of incompleteness, when multiple attribute values are missing, Bayesian networks provide significantly higher classification accuracy and (ii) the relevant possible answers retrieved by the queries reformulated using Bayesian networks provide higher precision and recall than AFDs while keeping query processing costs manageable.
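The idea that an incomplete tuple defines a distribution over the complete tuples it stands for can be illustrated with a small sketch. This is not the authors' implementation: the car-listing sample, the attribute names, and the chain structure `make -> model -> body_style` are all assumptions made for illustration, the structure is fixed by hand rather than learned, and the parameters are estimated by simple counting.

```python
from collections import Counter
from itertools import product

# Hypothetical sample of complete tuples: (make, model, body_style).
SAMPLE = [
    ("Honda", "Civic", "Sedan"),
    ("Honda", "Civic", "Sedan"),
    ("Honda", "Civic", "Coupe"),
    ("Honda", "CR-V", "SUV"),
    ("Toyota", "Camry", "Sedan"),
    ("Toyota", "Camry", "Sedan"),
    ("Toyota", "RAV4", "SUV"),
    ("Toyota", "RAV4", "SUV"),
]
ATTRS = ("make", "model", "body_style")
# Assumed network structure (a chain), fixed by hand for this sketch.
PARENTS = {"make": (), "model": ("make",), "body_style": ("model",)}


def learn_cpts(sample):
    """Estimate each P(attr | parents) by maximum-likelihood counting."""
    cpts = {}
    for i, attr in enumerate(ATTRS):
        parent_idx = [ATTRS.index(p) for p in PARENTS[attr]]
        joint = Counter((tuple(t[j] for j in parent_idx), t[i]) for t in sample)
        parent_totals = Counter(tuple(t[j] for j in parent_idx) for t in sample)
        cpts[attr] = {k: n / parent_totals[k[0]] for k, n in joint.items()}
    return cpts


def joint_probability(cpts, tup):
    """P(tuple) as the product of the network's conditional probabilities."""
    p = 1.0
    for i, attr in enumerate(ATTRS):
        parents = tuple(tup[ATTRS.index(q)] for q in PARENTS[attr])
        p *= cpts[attr].get((parents, tup[i]), 0.0)
    return p


def completions(cpts, sample, incomplete):
    """Distribution over complete tuples consistent with an incomplete
    tuple, where None marks a missing attribute value."""
    domains = [sorted({t[i] for t in sample}) for i in range(len(ATTRS))]
    choices = [[v] if v is not None else domains[i]
               for i, v in enumerate(incomplete)]
    scores = {c: joint_probability(cpts, c) for c in product(*choices)}
    total = sum(scores.values())
    if total == 0:
        return {}  # the observed values never occur in the sample
    return {c: s / total for c, s in scores.items() if s > 0}


if __name__ == "__main__":
    cpts = learn_cpts(SAMPLE)
    # Two correlated attributes are missing; they are completed jointly.
    dist = completions(cpts, SAMPLE, ("Honda", None, None))
    for tup, prob in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(tup, round(prob, 3))
```

Because `model` and `body_style` are completed jointly, combinations that never co-occur in the sample (for example a Civic with body style SUV) receive zero probability, which a per-attribute independent prediction would not guarantee.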

Keywords

Data cleaning; Bayesian networks; Query rewriting; Autonomous database

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Rohit Raghunathan (1)
  • Sushovan De (2)
  • Subbarao Kambhampati (2)

  1. Amazon, Seattle, USA
  2. Computer Science and Engineering, Arizona State University, Tempe, USA
