
Robust Naive Bayes Combination of Multiple Classifications

  • Conference paper
The Impact of Applications on Mathematics

Part of the book series: Mathematics for Industry (MFI, volume 1)

Abstract

When we face a new, complex classification task, it is difficult to design a good feature set for the observed raw data, and we therefore often obtain an unsatisfactorily biased classifier. That is, the trained classifier can successfully classify only certain classes of samples owing to its poor feature set. To tackle this problem, we propose a robust naive Bayes combination scheme that effectively combines classifier predictions obtained from different classifiers and/or different feature sets. Since we assume only that the multiple classifier predictions are given, any type of classifier and any feature set can be used in our scheme. In our combination scheme, each prediction is regarded as an independent realization of a categorical random variable (i.e., a class label), and a naive Bayes model is trained on a set of such predictions within a supervised learning framework. The key feature of our scheme is a class-specific variable selection mechanism introduced to avoid overfitting to poor classifier predictions. We demonstrate the practical benefit of our simple combination scheme on both synthetic and real data sets, and show that it can achieve much higher classification accuracy than conventional ensemble classifiers.
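To make the combination idea concrete, the following minimal Python sketch (hypothetical class and variable names; it fits simple Laplace-smoothed per-classifier confusion models and omits the class-specific variable selection mechanism that is central to the paper) illustrates a supervised naive Bayes combination of base-classifier predictions:

    import numpy as np

    # Minimal sketch of a naive Bayes combiner over classifier predictions.
    # Each base classifier j is modeled by a confusion matrix
    # P(prediction = l | true class = k); predictions are combined under
    # the naive (conditional independence) assumption.
    class NaiveBayesCombiner:
        def __init__(self, n_classes, smoothing=1.0):
            self.K = n_classes
            self.s = smoothing

        def fit(self, preds, y):
            # preds: (N, J) array of labels predicted by J base classifiers
            # y:     (N,) array of true labels, each in {0, ..., K-1}
            N, J = preds.shape
            self.log_prior = np.log(np.bincount(y, minlength=self.K) + self.s)
            self.log_prior -= np.log(N + self.K * self.s)
            self.log_conf = np.zeros((J, self.K, self.K))
            for j in range(J):
                counts = np.full((self.K, self.K), self.s)
                np.add.at(counts, (y, preds[:, j]), 1.0)
                self.log_conf[j] = np.log(counts / counts.sum(axis=1, keepdims=True))
            return self

        def predict(self, preds):
            # Sum per-classifier log-likelihoods and the log prior, then
            # return the maximum a posteriori class for each sample.
            N, J = preds.shape
            log_post = np.tile(self.log_prior, (N, 1))
            for j in range(J):
                log_post += self.log_conf[j][:, preds[:, j]].T
            return log_post.argmax(axis=1)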



Acknowledgments

This research was supported by the FIRST program. The authors would like to thank the staff of Saiseikai Kumamoto Hospital, Japan, for their cooperation in the experiments.

Author information

Corresponding author

Correspondence to Naonori Ueda.


Appendix

The derivations of Eqs. (3) and (4) are given below. Equation (3) is derived as follows:

$$\begin{aligned} P(\mathrm C |\mathrm R ;\mathbf{\alpha },\mathbf{\beta })&= \int P(\mathrm C | \mathrm R ,\mathbf{\phi },\varTheta ) p(\mathbf{\phi };\mathbf{\alpha })p(\varTheta ;\mathbf{\beta }) d\mathbf{\phi }d\varTheta \\&= \left( \prod _{k=1}^K \prod _{j=1}^J \frac{\varGamma (\sum _{l} \beta _{k,j,l})}{\prod _{l'} \varGamma (\beta _{k,j,l'})} \int \prod _{l=1}^K (\theta _{k,j,l})^{r_{k,j} n_{k,j,l}+\beta _{k,j,l}-1} d\theta _{k,j,l}\right) \\&\quad \times \, \frac{\varGamma (\sum _{l'}\alpha _{l'})}{\prod _{l'}\varGamma (\alpha _{l'})} \int \prod _{l=1}^K \phi _l^{\sum _k \sum _j (1-r_{k,j})n_{k,j,l}+\alpha _l-1}d\phi _l \\&= \left( \frac{\varGamma (\alpha _{\bullet })}{\prod _l\varGamma (\alpha _l)} \frac{\prod _l\varGamma (\sum _{k,j}\delta (r_{k,j},0)n_{k,j,l}+\alpha _l)}{\varGamma (\sum _{k,j}\delta (r_{k,j},0)N_k+\alpha _{\bullet })}\right) \\&\quad \times \, \left( \prod _{k=1}^K \prod _{j=1}^J \frac{\varGamma (\beta _{k,j,\bullet })}{\prod _l\varGamma (\beta _{k,j,l})} \frac{\prod _l \varGamma (r_{k,j}n_{k,j,l}+\beta _{k,j,l})}{\varGamma (r_{k,j}N_k+\beta _{k,j,\bullet })}\right) . \end{aligned}$$
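Each integral above is the normalizing integral of a Dirichlet density; in generic notation (not the paper's), the identity used at these steps is

$$\int \prod _{l=1}^{K} \theta _{l}^{\,n_{l}+\beta _{l}-1}\, d\mathbf{\theta } = \frac{\prod _{l=1}^{K} \varGamma (n_{l}+\beta _{l})}{\varGamma \left( \sum _{l=1}^{K} (n_{l}+\beta _{l}) \right) },$$

where the integral is taken over the probability simplex.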

In these derivations, \(B(x,y)\) denotes the beta function, related to the gamma function by \(B(x,y)=\varGamma (x)\varGamma (y)/\varGamma (x+y)\). In a similar manner, Eq. (4) is derived as follows:

$$\begin{aligned} P(\mathrm R ;a,b)&= \int \prod _{k=1}^K \prod _{j=1}^J P(r_{k,j};\lambda )p(\lambda ;a,b)d\lambda \\&= \int \lambda ^{\sum _k \sum _j r_{k,j} + a - 1} (1-\lambda )^{\sum _k \sum _j (1-r_{k,j})+b-1} d\lambda / B(a,b)\\&= \frac{B\left( \sum _k \sum _j r_{k,j}+a, \sum _k \sum _j (1-r_{k,j})+b\right) }{B(a,b)}\\&= \frac{\varGamma (\sum _k \sum _j r_{k,j}+a) \varGamma (\sum _k \sum _j (1-r_{k,j})+b)}{\varGamma (KJ+a+b)} \cdot \frac{\varGamma (a+b)}{\varGamma (a)\varGamma (b)} \\&= \frac{\varGamma (\sum _k \sum _j r_{k,j}+a) \varGamma (\sum _k \sum _j (1-r_{k,j})+b)\varGamma (a+b)}{\varGamma (KJ+a+b)\varGamma (a)\varGamma (b)}. \end{aligned}$$

Here, we used the integral representation of the beta function: \(B(s,t)=\int _0^1 x^{s-1}(1-x)^{t-1}dx\).
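As a quick sanity check of this closed form (a sketch with hypothetical variable names, assuming NumPy and SciPy are available), one can verify numerically that the beta-function and gamma-function expressions for \(P(\mathrm R ;a,b)\) agree:

    import numpy as np
    from scipy.special import betaln, gammaln

    # Hypothetical check of Eq. (4): log P(R; a, b) for binary selection
    # variables r_{k,j} under a Beta(a, b) prior on lambda.
    rng = np.random.default_rng(0)
    K, J, a, b = 4, 6, 2.0, 3.0
    R = rng.integers(0, 2, size=(K, J))   # binary r_{k,j}
    n1 = R.sum()                          # number of r_{k,j} = 1
    n0 = R.size - n1                      # number of r_{k,j} = 0

    # Third line of the derivation: B(sum r + a, sum (1 - r) + b) / B(a, b)
    log_p_beta = betaln(n1 + a, n0 + b) - betaln(a, b)

    # Final line, written directly with gamma functions
    log_p_gamma = (gammaln(n1 + a) + gammaln(n0 + b) + gammaln(a + b)
                   - gammaln(K * J + a + b) - gammaln(a) - gammaln(b))

    assert np.isclose(log_p_beta, log_p_gamma)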


Copyright information

© 2014 Springer Japan

About this paper

Cite this paper

Ueda, N., Tanaka, Y., Fujino, A. (2014). Robust Naive Bayes Combination of Multiple Classifications. In: Wakayama, M., et al. The Impact of Applications on Mathematics. Mathematics for Industry, vol 1. Springer, Tokyo. https://doi.org/10.1007/978-4-431-54907-9_10


  • DOI: https://doi.org/10.1007/978-4-431-54907-9_10

  • Publisher Name: Springer, Tokyo

  • Print ISBN: 978-4-431-54906-2

  • Online ISBN: 978-4-431-54907-9

  • eBook Packages: Engineering, Engineering (R0)
