Survey of Improving Naive Bayes for Classification

Jiang, Liangxiao; Wang, Dianhong; Cai, Zhihua; Yan, Xuesong

doi:10.1007/978-3-540-73871-8_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

Naive Bayes assumes that attributes are conditionally independent given the class, an assumption that ignores attribute dependencies and is often violated in practice. A Bayesian network, by contrast, can represent arbitrary attribute dependencies, but learning an optimal Bayesian network classifier from data is intractable. Improving naive Bayes has therefore attracted much attention from researchers, who have proposed many effective and efficient algorithms. In this paper, we review some of these algorithms and single out four main approaches: 1) feature selection; 2) structure extension; 3) local learning; 4) data expansion. We tested these approaches experimentally on all 36 UCI data sets selected by Weka and compared them to naive Bayes. The experimental results show that all four approaches are effective. Finally, we discuss some main directions for future research on Bayesian network classifiers.
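
For readers unfamiliar with the baseline, the conditional independence assumption can be illustrated with a minimal sketch of a naive Bayes classifier for nominal attributes. It is not taken from the paper: the function names `train_naive_bayes` and `predict`, the use of Laplace smoothing, and the data layout (a list of attribute tuples plus a list of class labels) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Collect the frequency counts a naive Bayes classifier needs.

    rows   -- list of tuples of nominal attribute values (assumed layout)
    labels -- list of class labels, one per row
    """
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] = frequency
    value_counts = defaultdict(Counter)
    # distinct values observed per attribute, used for Laplace smoothing
    domains = defaultdict(set)
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class maximizing log P(c) + sum_i log P(a_i | c)."""
    class_counts, value_counts, domains = model
    n = sum(class_counts.values())
    k = len(class_counts)
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        # Laplace-smoothed class prior
        score = math.log((n_c + 1) / (n + k))
        # Independence assumption: each attribute contributes its own
        # factor P(a_i | c), with no interaction between attributes.
        for i, v in enumerate(row):
            count = value_counts[(i, c)][v]
            score += math.log((count + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Because every attribute enters the score as a separate factor, any dependency between attributes is ignored; the four approaches listed in the abstract each relax or work around this in a different way.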

Author information

Authors and Affiliations

Faculty of Computer Science, China University of Geosciences, Wuhan, Hubei, 430074, P.R. China
Liangxiao Jiang, Zhihua Cai & Xuesong Yan
Faculty of Electronic Engineering, China University of Geosciences, Wuhan, Hubei, 430074, P.R. China
Dianhong Wang

Editor information

Editors and Affiliations

Computer Science Department, University of Calgary, Calgary, AB, Canada
Reda Alhajj
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Hong Gao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Jianzhong Li
School of Information Technology and Electronic Engineering, The University of Queensland, Queensland, Australia
Xue Li
Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane

About this paper

Cite this paper

Jiang, L., Wang, D., Cai, Z., Yan, X. (2007). Survey of Improving Naive Bayes for Classification. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_14

DOI: https://doi.org/10.1007/978-3-540-73871-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)
