
Semi-naive Bayesian Classification by Weighted Kernel Density Estimation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

Naive Bayes is one of the most popular methods for supervised classification. Its attribute conditional independence assumption makes Naive Bayes efficient, but it adversely affects the quality of classification results in many real-world applications. In this paper, a new feature-selection-based method is proposed for semi-naive Bayesian classification in order to relax this assumption. A weighted kernel density model is first proposed for Bayesian modeling, which implements a soft feature selection scheme. We then propose an efficient algorithm that learns an optimized set of feature weights by using the least squares cross-validation method for optimal bandwidth selection. Experimental studies on six real-world datasets show the effectiveness and suitability of the proposed method for efficient Bayesian classification.
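The weighted kernel density idea described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the Gaussian kernel, the single fixed bandwidth, and the exponent-style weighting P(x | c) ∝ ∏_j f_j(x_j | c)^{w_j} (where w_j → 0 softly removes feature j) are assumptions; in the paper both the weights and the bandwidths are learned via least squares cross-validation rather than fixed by hand.

```python
import numpy as np

def gaussian_kde_1d(train_col, x, bandwidth):
    """1-D Gaussian kernel density estimate of query points x
    from a column of training values."""
    z = (x - train_col[:, None]) / bandwidth          # (n_train, n_query)
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)    # Gaussian kernel
    return k.mean(axis=0) / bandwidth                 # average over training points

class WeightedKDENaiveBayes:
    """Semi-naive Bayes sketch: per-class, per-feature KDEs raised to
    feature weights w_j, so a small w_j softly de-selects feature j."""

    def __init__(self, bandwidth=0.5, weights=None):
        self.bandwidth = bandwidth
        self.weights = weights  # learned by LSCV in the paper; fixed here

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.X_by_class_ = {c: X[y == c] for c in self.classes_}
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        if self.weights is None:
            self.weights = np.ones(X.shape[1])  # plain kernel naive Bayes
        return self

    def predict(self, X):
        log_post = np.zeros((len(X), len(self.classes_)))
        for k, c in enumerate(self.classes_):
            log_post[:, k] = np.log(self.priors_[c])
            for j, w in enumerate(self.weights):
                dens = gaussian_kde_1d(
                    self.X_by_class_[c][:, j], X[:, j], self.bandwidth)
                # weight acts as an exponent, i.e. a multiplier in log space
                log_post[:, k] += w * np.log(dens + 1e-300)
        return self.classes_[np.argmax(log_post, axis=1)]
```

Setting `weights=[1, 0]`, for instance, reduces the model to a one-feature kernel naive Bayes, which is the "soft selection" behaviour the abstract refers to.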




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, L., Wang, S. (2012). Semi-naive Bayesian Classification by Weighted Kernel Density Estimation. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science (LNAI), vol. 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_22


  • DOI: https://doi.org/10.1007/978-3-642-35527-1_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35526-4

  • Online ISBN: 978-3-642-35527-1

  • eBook Packages: Computer Science, Computer Science (R0)
