Abstract
When we face a new, complex classification task, it is difficult to design a good feature set for the observed raw data, and we therefore often obtain an unsatisfactorily biased classifier; that is, the trained classifier can successfully classify only certain classes of samples owing to its poor feature set. To tackle this problem, we propose a robust naive Bayes combination scheme in which we effectively combine classifier predictions obtained from different classifiers and/or different feature sets. Since we assume that the multiple classifier predictions are given, any type of classifier and any feature set can be used in our scheme. In our combination scheme, each prediction is regarded as an independent realization of a categorical random variable (i.e., a class label), and a naive Bayes model is trained on a set of such predictions within a supervised learning framework. The key feature of our scheme is the introduction of a class-specific variable selection mechanism that avoids overfitting to poor classifier predictions. We demonstrate the practical benefit of our simple combination scheme on both synthetic and real data sets, and show that it can achieve much higher classification accuracy than conventional ensemble classifiers.
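The following is a minimal Python/NumPy sketch of a supervised naive Bayes combiner of this general kind, assuming each base classifier is summarized by a single smoothed confusion matrix; it omits the class-specific variable selection mechanism that is central to the proposed scheme, and all function and variable names are ours, not the paper's.

```python
import numpy as np

def fit_nb_combiner(preds, labels, n_classes, alpha=1.0):
    """Estimate class priors and per-classifier confusion matrices.

    preds  : (N, M) int array; preds[n, m] is classifier m's predicted label for sample n
    labels : (N,) int array of true class labels
    alpha  : Dirichlet smoothing pseudo-count (Laplace smoothing when alpha = 1)
    """
    n_samples, n_clf = preds.shape
    prior = np.bincount(labels, minlength=n_classes) + alpha
    prior = prior / prior.sum()
    # conf[m, k, j] approximates P(classifier m predicts j | true class is k)
    conf = np.full((n_clf, n_classes, n_classes), alpha)
    for i in range(n_samples):
        conf[np.arange(n_clf), labels[i], preds[i]] += 1.0
    conf /= conf.sum(axis=2, keepdims=True)
    return prior, conf

def predict_nb_combiner(preds, prior, conf):
    """Combine the (N, M) test-time predictions by naive Bayes in log space."""
    n_samples, n_clf = preds.shape
    log_post = np.tile(np.log(prior), (n_samples, 1))   # (N, K) log prior for each sample
    for m in range(n_clf):
        # conf[m, :, preds[:, m]] has shape (N, K): likelihood of each sample's
        # observed prediction under every candidate true class
        log_post += np.log(conf[m, :, preds[:, m]])
    return log_post.argmax(axis=1)
```

In such a setup, `preds` would hold the base classifiers' predictions on labeled training data; the combiner itself is agnostic to the base classifiers and feature sets, in line with the assumption stated in the abstract.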
Acknowledgments
This research was supported by the FIRST program. The authors would like to thank the staff of Saiseikai Kumamoto Hospital, Japan, for their cooperation in the experiments.
Appendix
The derivations of Eqs. (3) and (4) are given below. Equation (3) can be derived as follows:
Here, \(B(x,y)\) is the beta function; we used the definition \(B(x,y)=\varGamma(x)\varGamma(y)/\varGamma(x+y)\). In a similar manner, Eq. (4) can be derived as follows:
Here, we used another definition of the beta function: \(B(s,t)=\int_0^1 x^{s-1}(1-x)^{t-1}\,dx\).
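For reference, the two characterizations of the beta function used above are connected by the standard identity (stated here under the usual assumption \(s,t>0\)):
\[
B(s,t)=\int_0^1 x^{s-1}(1-x)^{t-1}\,dx=\frac{\varGamma(s)\,\varGamma(t)}{\varGamma(s+t)}.
\]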