Abstract
In this paper we propose an extension of the naïve Bayes classification method to the multi-relational setting. In this setting, training data are stored in several tables related by foreign key constraints, and each example is represented by a set of related tuples rather than by a single row, as in the classical data mining setting. This work is characterized by three aspects. First, an integrated approach to computing the posterior probabilities for each class that makes use of first-order classification rules. Second, applicability to both discrete and continuous attributes by means of supervised discretization. Third, the consideration of knowledge on the data model embedded in the database schema during the generation of classification rules. The proposed method has been implemented in the new system Mr-SBC, which is tightly integrated with a relational DBMS. Testing has been performed on two datasets and four benchmark tasks. Results on predictive accuracy and efficiency are in favour of Mr-SBC for the most complex tasks.
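To make the posterior computation concrete, the following is a minimal sketch of the classical naïve Bayes step only, not Mr-SBC's actual formulation. It assumes each example has already been reduced to the set of first-order rules that cover it. The rule identifiers, the Bernoulli feature model, the Laplace smoothing and the function names are all assumptions introduced for illustration.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(examples, labels):
    """Count classes and, per class, how often each feature is present.

    Each example is a set of hypothetical Boolean features, for instance
    the identifiers of the rules whose bodies cover that example.
    """
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # class -> feature -> count
    vocabulary = set()
    for features, label in zip(examples, labels):
        for f in features:
            feature_counts[label][f] += 1
            vocabulary.add(f)
    return class_counts, feature_counts, vocabulary


def posterior(features, class_counts, feature_counts, vocabulary):
    """Return P(class | features) under the naive independence assumption."""
    total = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        log_p = math.log(n_c / total)  # class prior
        for f in vocabulary:
            # Laplace-smoothed estimate of P(f present | c) (an assumption)
            p = (feature_counts[c][f] + 1) / (n_c + 2)
            log_p += math.log(p if f in features else 1.0 - p)
        log_scores[c] = log_p
    # Normalize in log space to avoid underflow
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {c: math.exp(s - m) / z for c, s in log_scores.items()}
```

Calling `train_naive_bayes` on a list of feature sets with their class labels, then passing the returned triple to `posterior` together with the feature set of a new example, yields a dictionary of class probabilities that sums to one. In a multi-relational setting the feature sets would come from matching rules against related tuples, which this sketch leaves out.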
Keywords
- Classification Rule
- Bayesian Classifier
- Inductive Logic Programming
- Naive Bayesian Classifier
- Data Mining System
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ceci, M., Appice, A., Malerba, D. (2003). Mr-SBC: A Multi-relational Naïve Bayes Classifier. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_11
DOI: https://doi.org/10.1007/978-3-540-39804-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive