Abstract
A main objective of machine learning research is to automatically learn to identify complex patterns and make intelligent decisions based on data; however, the set of all possible behaviors is too large to be covered by the available training data. The learner must therefore generalize from the given examples so that it can produce useful output for new cases; otherwise, the training data must contain a sufficient number of instances. A related case arises when a data set contains instances with multiple missing attribute values: such instances need to be deleted, and sufficient data samples are then required to improve the generalization ability of the classifier. The proposed algorithm imputes missing attribute values with domain values, thereby generating additional training instances, and adds them to the original training data to improve the generalization ability of decision tree classifiers. The proposed method is a permutation- and combination-based multiple imputation method and is also useful for imputing missing data. It demonstrates good generalization ability on decision trees. In summary, this paper proposes a new method for imputing missing data and uses the same method to generate additional data instances to improve the generalization of decision tree learning.
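The abstract does not spell out the algorithm, so the following is only a minimal sketch of one plausible reading: each missing attribute value is filled with every value from that attribute's domain, and every resulting combination becomes an additional training instance. The names `MISSING`, `expand_instance`, `augment`, and the toy weather-style data are assumptions made for illustration and do not come from the paper.

```python
from itertools import product

MISSING = None  # assumed marker for a missing attribute value


def expand_instance(instance, domains):
    """Return every completion of `instance` obtained by filling each
    missing attribute with each value from that attribute's domain."""
    # Positions of the attributes whose values are missing.
    missing_idx = [i for i, v in enumerate(instance) if v is MISSING]
    if not missing_idx:
        return [tuple(instance)]
    completions = []
    # Cartesian product over the domains of the missing attributes only.
    for combo in product(*(domains[i] for i in missing_idx)):
        filled = list(instance)
        for i, value in zip(missing_idx, combo):
            filled[i] = value
        completions.append(tuple(filled))
    return completions


def augment(dataset, domains):
    """Expand every (features, label) row; complete rows pass through
    unchanged, incomplete rows are replaced by all their completions."""
    augmented = []
    for features, label in dataset:
        for filled in expand_instance(features, domains):
            augmented.append((filled, label))
    return augmented


# Hypothetical attribute domains and a two-row data set.
domains = [("sunny", "rain"), ("hot", "mild", "cool"), ("high", "normal")]
dataset = [
    (("sunny", "hot", "high"), "no"),
    (("rain", MISSING, MISSING), "yes"),
]
augmented = augment(dataset, domains)
print(len(augmented))  # 1 complete row + 3 * 2 completions = 7
```

Under this reading, the number of generated instances grows with the product of the domain sizes of the missing attributes, so an instance with many missing values or large domains can dominate the augmented set. Whether and how the paper limits or weights these instances is not stated in the abstract.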
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Patil, D.V., Bichkar, R.S. (2012). Improving Generalization Ability of Classifier with Multiple Imputation Techniques. In: Venugopal, K.R., Patnaik, L.M. (eds) Wireless Networks and Computational Intelligence. ICIP 2012. Communications in Computer and Information Science, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31686-9_37
DOI: https://doi.org/10.1007/978-3-642-31686-9_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31685-2
Online ISBN: 978-3-642-31686-9
eBook Packages: Computer Science, Computer Science (R0)