Abstract
When we face a new, complex classification task, it is difficult to design a good feature set for the observed raw data, and we therefore often obtain an unsatisfactorily biased classifier; that is, the trained classifier can successfully classify only certain classes of samples owing to its poor feature set. To tackle this problem, we propose a robust naive Bayes combination scheme in which we effectively combine classifier predictions obtained from different classifiers and/or different feature sets. Since we assume that the multiple classifier predictions are given, any type of classifier and any feature set can be used in our scheme. In our combination scheme, each prediction is regarded as an independent realization of a categorical random variable (i.e., a class label), and a naive Bayes model is trained on a set of such predictions within a supervised learning framework. The key feature of our scheme is the introduction of a class-specific variable selection mechanism that avoids overfitting to poor classifier predictions. We demonstrate the practical benefit of our simple combination scheme on both synthetic and real data sets, and show that it can achieve much higher classification accuracy than conventional ensemble classifiers.
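The following is a minimal Python/NumPy sketch of a supervised naive Bayes combiner of this general kind, assuming each base classifier is summarized by a single smoothed confusion matrix; it omits the class-specific variable selection mechanism that is central to the proposed scheme, and all function and variable names are ours, not the paper's.

```python
import numpy as np

def fit_nb_combiner(preds, labels, n_classes, alpha=1.0):
    """Estimate class priors and per-classifier confusion matrices.

    preds  : (N, M) int array; preds[n, m] is classifier m's predicted label for sample n
    labels : (N,) int array of true class labels
    alpha  : Dirichlet smoothing pseudo-count (Laplace smoothing when alpha = 1)
    """
    n_samples, n_clf = preds.shape
    prior = np.bincount(labels, minlength=n_classes) + alpha
    prior = prior / prior.sum()
    # conf[m, k, j] approximates P(classifier m predicts j | true class is k)
    conf = np.full((n_clf, n_classes, n_classes), alpha)
    for i in range(n_samples):
        conf[np.arange(n_clf), labels[i], preds[i]] += 1.0
    conf /= conf.sum(axis=2, keepdims=True)
    return prior, conf

def predict_nb_combiner(preds, prior, conf):
    """Combine the (N, M) test-time predictions by naive Bayes in log space."""
    n_samples, n_clf = preds.shape
    log_post = np.tile(np.log(prior), (n_samples, 1))   # (N, K) log prior for each sample
    for m in range(n_clf):
        # conf[m, :, preds[:, m]] has shape (N, K): likelihood of each sample's
        # observed prediction under every candidate true class
        log_post += np.log(conf[m, :, preds[:, m]])
    return log_post.argmax(axis=1)
```

In such a setup, `preds` would hold the base classifiers' predictions on labeled training data; the combiner itself is agnostic to the base classifiers and feature sets, in line with the assumption stated in the abstract.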
Acknowledgments
This research was supported by the FIRST program. The authors would like to thank the staff of Saiseikai Kumamoto Hospital, Japan, for their cooperation in the experiments.
Appendix
The derivations of Eqs. (3) and (4) are given below. Equation (3) can be derived as follows:
Here, \(B(x,y)\) is the beta function; we used the definition \(B(x,y)=\varGamma(x)\varGamma(y)/\varGamma(x+y)\). In a similar manner, Eq. (4) can be derived as follows:
Here, we used another definition of the beta function: \(B(s,t)=\int_0^1 x^{s-1}(1-x)^{t-1}\,dx\).
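For reference, the two characterizations of the beta function used above are connected by the standard identity (stated here under the usual assumption \(s,t>0\)):
\[
B(s,t)=\int_0^1 x^{s-1}(1-x)^{t-1}\,dx=\frac{\varGamma(s)\,\varGamma(t)}{\varGamma(s+t)}.
\]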