Abstract
We are concerned with the processing of data held in distributed heterogeneous databases using domain knowledge, in the form of rules representing high-level knowledge about the data. This process facilitates the handling of missing, conflicting or unacceptable outlying data. In addition, by integrating the processed distributed data, we are able to extract new knowledge at a finer level of granularity than was present in the original data. Once integration has taken place the extracted knowledge, in the form of probabilities, may be used to learn association rules or Bayesian belief networks. Issues of confidentiality and efficiency of transfer of data across networks, whether the Internet or Intranets, are handled by aggregating the native data in situ, typically behind a firewall, and carrying out further transportation and processing solely on multidimensional aggregate tables. Heterogeneity is resolved by utilisation of domain knowledge for harmonisation and integration of the distributed data sources. Integration is carried out by minimisation of the Kullback-Leibler information divergence between the target integrated aggregates and the distributed data values.
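The abstract does not say which algorithm performs the integration, so the following is only a minimal sketch under stated assumptions. It assumes each site reduces its raw records to a count table locally, and that each table cell corresponds to a known set of categories in a shared fine-grained classification (this mapping stands in for the domain knowledge). It then applies an EM iteration for grouped multinomial counts, one standard way to fit fine-grained probabilities whose implied aggregates are as close as possible, in the Kullback-Leibler sense, to the observed aggregates. The names `aggregate_locally`, `integrate`, and the example data are hypothetical.

```python
from collections import Counter


def aggregate_locally(records, classify):
    """Reduce raw records to a count table so only aggregates leave the site.

    `classify` is a hypothetical site-specific function mapping a record
    to a frozenset of fine-grained category indices.
    """
    return list(Counter(classify(r) for r in records).items())


def integrate(sources, n_fine, iterations=200):
    """Estimate fine-grained probabilities from heterogeneous count tables.

    sources: list of tables; each table is a list of (cell, count) pairs,
             where `cell` is a frozenset of fine-grained category indices.
    Returns a list of n_fine probabilities (EM for grouped multinomial data).
    """
    total = sum(count for table in sources for _, count in table)
    pi = [1.0 / n_fine] * n_fine  # uniform starting point
    for _ in range(iterations):
        expected = [0.0] * n_fine
        for table in sources:
            for cell, count in table:
                mass = sum(pi[i] for i in cell)
                if mass == 0.0:
                    continue  # empty cell with no probability mass
                # Share the cell count among its fine categories
                # in proportion to the current estimates.
                for i in cell:
                    expected[i] += count * pi[i] / mass
        pi = [value / total for value in expected]
    return pi


# Hypothetical example: three fine-grained categories 0, 1, 2.
# Site A groups categories 1 and 2 together; site B groups 0 and 1.
site_a = [(frozenset({0}), 30), (frozenset({1, 2}), 70)]
site_b = [(frozenset({0, 1}), 60), (frozenset({2}), 40)]
pi = integrate([site_a, site_b], n_fine=3)
print([round(p, 4) for p in pi])
```

Neither site alone distinguishes all three categories, but the integrated estimate does, which illustrates the finer granularity the abstract refers to. In this example the two tables are mutually consistent, so the estimate reproduces both sets of aggregate proportions exactly.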
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McClean, S., Scotney, B., Shapcott, M. (2004). Using Domain Knowledge to Learn from Heterogeneous Distributed Databases. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30132-5_28
DOI: https://doi.org/10.1007/978-3-540-30132-5_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23318-3
Online ISBN: 978-3-540-30132-5
eBook Packages: Springer Book Archive