Improving Generalization Ability of Classifier with Multiple Imputation Techniques

Patil, Dipak V.; Bichkar, R. S.

doi:10.1007/978-3-642-31686-9_37

Part of the book series: Communications in Computer and Information Science (CCIS, volume 292)

Included in the following conference series:

International Conference on Information Processing

Abstract

A main objective of machine learning research is to learn, automatically, to recognize complex patterns and make intelligent decisions from data. However, the set of all possible behaviors is usually too large to be covered by the available training data. It is therefore desirable that the learner generalize from the given examples so that it can produce useful output for new cases; otherwise, the training data must contain a sufficient number of instances. A related case arises when the data set contains instances with multiple missing attribute values: such instances need to be deleted, and in that case sufficient data samples are required to improve the generalization ability of the classifier. The proposed algorithm generates additional training instances and adds them to the original training data to improve the generalization ability of decision tree classifiers. It does so by imputing missing attribute values with values from the attribute domains, thereby generating the additional training instances. The proposed method is a permutation- and combination-based multiple imputation method, and it is also useful for the imputation of missing data in its own right. The proposed method demonstrates good generalization ability on decision trees. In short, this paper proposes a new method for the imputation of missing data and uses the same method to generate additional data instances that help decision tree learning generalize.
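The abstract describes the imputation step only in outline. As a minimal sketch, assuming categorical attributes, missing values encoded as None, and a class label in the last column that is never missing, the "permutation and combination" idea can be read as substituting every combination of domain values for an instance's missing attributes. An instance whose missing attributes are A_1, ..., A_k then yields |dom(A_1)| × ... × |dom(A_k)| completed instances. The function names `expand_instance` and `augment` and the toy data are hypothetical; this is not the authors' implementation, and details such as instance weighting or limits on the number of generated rows are not given in the abstract.

```python
from itertools import product

# Assumption: a missing attribute value is encoded as None, and the class
# label sits in the last column and is never missing.
MISSING = None


def expand_instance(instance, domains):
    """Return every completion of `instance`, one per combination of domain
    values for its missing attributes (Cartesian product of their domains)."""
    missing = [i for i in range(len(domains)) if instance[i] is MISSING]
    if not missing:
        return [list(instance)]  # complete instance: keep it unchanged
    completions = []
    for combo in product(*(domains[i] for i in missing)):
        filled = list(instance)  # copy so the original row is not mutated
        for i, value in zip(missing, combo):
            filled[i] = value
        completions.append(filled)
    return completions


def augment(dataset, domains):
    """Build a training set in which each incomplete instance is replaced
    by all of its completions; complete instances are kept as they are."""
    augmented = []
    for instance in dataset:
        augmented.extend(expand_instance(instance, domains))
    return augmented


# Hypothetical toy data: three categorical attributes plus a class label.
domains = [["sunny", "rain"], ["hot", "mild", "cool"], ["high", "normal"]]
data = [
    ["rain", "cool", "normal", "yes"],  # complete
    ["sunny", None, "high", "no"],      # 1 missing -> 3 completions
    [None, "mild", None, "yes"],        # 2 missing -> 2 * 2 = 4 completions
]

training_set = augment(data, domains)
print(len(training_set))  # 8
```

One consequence of this reading is that the number of generated rows grows multiplicatively with the number of missing attributes, so a practical version would likely need to cap or sample the combinations for attributes with large domains.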

Author information

Authors and Affiliations

Department of Computer Engineering, Sandip Institute of Technology and Research Centre, Nasik, M.S., India
Dipak V. Patil
Department of Computer Engineering, G.H. Raisoni College of Engineering & Management, Pune, M.S., India
R. S. Bichkar

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, 560 001, Bangalore, India
K. R. Venugopal
Indian Institute of Science, Bangalore, India
L. M. Patnaik

About this paper

Cite this paper

Patil, D.V., Bichkar, R.S. (2012). Improving Generalization Ability of Classifier with Multiple Imputation Techniques. In: Venugopal, K.R., Patnaik, L.M. (eds) Wireless Networks and Computational Intelligence. ICIP 2012. Communications in Computer and Information Science, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31686-9_37

DOI: https://doi.org/10.1007/978-3-642-31686-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31685-2
Online ISBN: 978-3-642-31686-9
eBook Packages: Computer Science, Computer Science (R0)
