Improving Generalization Ability of Classifier with Multiple Imputation Techniques

  • Conference paper
Wireless Networks and Computational Intelligence (ICIP 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 292)

Abstract

A main objective of machine learning research is to learn to identify complex patterns in data automatically and to make intelligent decisions based on them. However, the set of all possible behaviors is too large to be covered by the available training data. A learner must therefore generalize from the given examples so that it can provide useful output for new cases; otherwise, the number of training instances must be large enough to cover those cases. A related difficulty arises when a data set contains instances with multiple missing attribute values: if these instances are deleted, enough data samples must remain for the classifier to generalize well. This paper proposes a multiple imputation method, based on permutation and combination, that imputes missing attribute values with values from the attribute's domain. The same method is used to generate additional training instances, which are added to the original training data to improve the generalization ability of decision tree classifiers. The proposed method demonstrates good generalization ability on decision trees.
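The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading of "imputing missing attribute values with domain values": each instance with missing nominal attributes is expanded into every combination of values from the domains of those attributes, and each completed copy keeps the original class label. The names `MISSING`, `expand_instance`, and `augment`, the toy weather-style data, and the choice to enumerate the full Cartesian product are all assumptions made for illustration, not the authors' published procedure.

```python
from itertools import product

MISSING = None  # assumed marker for a missing attribute value


def expand_instance(instance, domains):
    """Return every completion of `instance`, filling each missing
    attribute with each value from that attribute's domain."""
    missing_idx = [i for i, v in enumerate(instance) if v is MISSING]
    if not missing_idx:
        return [tuple(instance)]
    completions = []
    # Enumerate all combinations of domain values for the missing slots.
    for combo in product(*(domains[i] for i in missing_idx)):
        filled = list(instance)
        for i, value in zip(missing_idx, combo):
            filled[i] = value
        completions.append(tuple(filled))
    return completions


def augment(training_data, domains):
    """Replace each (features, label) pair by all of its completions."""
    augmented = []
    for features, label in training_data:
        for filled in expand_instance(features, domains):
            augmented.append((filled, label))
    return augmented


# Hypothetical nominal domains, one per attribute position.
domains = [("sunny", "rainy"), ("hot", "mild", "cool"), ("high", "normal")]
data = [
    (("sunny", "hot", "high"), "no"),     # complete: kept as-is
    (("rainy", None, "normal"), "yes"),   # one missing value: 3 completions
    ((None, "cool", None), "yes"),        # two missing values: 2 * 2 completions
]
augmented = augment(data, domains)
print(len(augmented))  # 1 + 3 + 4 = 8
```

Under this reading, the number of generated instances grows with the product of the domain sizes of the missing attributes, so a practical version would likely need to cap or sample the combinations; the abstract does not say how the authors handle this.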

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Patil, D.V., Bichkar, R.S. (2012). Improving Generalization Ability of Classifier with Multiple Imputation Techniques. In: Venugopal, K.R., Patnaik, L.M. (eds) Wireless Networks and Computational Intelligence. ICIP 2012. Communications in Computer and Information Science, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31686-9_37

  • DOI: https://doi.org/10.1007/978-3-642-31686-9_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31685-2

  • Online ISBN: 978-3-642-31686-9

  • eBook Packages: Computer Science, Computer Science (R0)