Abstract
This article investigates the impact of categorical input encoding and scaling approaches on neural network sensitivity and overall classification performance in the context of predicting the repeat viewing propensity of moviegoers. The results show that the out-of-sample minimum sensitivity and overall classification performance of the neural network are unaffected by the scaling of the categorical inputs. However, the encoding of inputs has a significant impact on classification accuracy: using ordinal or thermometer encoding for categorical inputs significantly increases the out-of-sample accuracy of the neural network classifier. These findings confirm that the impact of ordinal encoding of categorical inputs is problem specific, and support thermometer encoding as the most suitable approach for categorical inputs. The classification performance of the neural networks was compared against that of a logistic regression model, and the results show that, in this instance, the non-parametric approach offers no advantage over standard statistical models.
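As a minimal sketch, assuming a categorical input with n ordered levels coded 0 to n-1, the two encodings named above and a linear rescaling step can be written as follows. The function names, the default target range of [-1, 1] and the example data are illustrative assumptions, not the authors' implementation. Minimum sensitivity is computed here as the lowest per-class recall, that is, the smallest fraction of any true class that the classifier labels correctly.

```python
from typing import Sequence


def ordinal_encode(level: int, n_levels: int) -> list[float]:
    """Ordinal encoding: the level index itself becomes a single input."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return [float(level)]


def thermometer_encode(level: int, n_levels: int) -> list[float]:
    """Thermometer encoding: level k becomes k ones followed by zeros,
    using n_levels - 1 binary inputs (level 0 is all zeros)."""
    if not 0 <= level < n_levels:
        raise ValueError("level out of range")
    return [1.0 if i < level else 0.0 for i in range(n_levels - 1)]


def scale(values: Sequence[float], lo: float, hi: float,
          new_lo: float = -1.0, new_hi: float = 1.0) -> list[float]:
    """Linearly map values from [lo, hi] onto [new_lo, new_hi]."""
    if hi == lo:
        raise ValueError("input range must be non-empty")
    factor = (new_hi - new_lo) / (hi - lo)
    return [new_lo + (v - lo) * factor for v in values]


def minimum_sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Lowest per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return min(recalls)


if __name__ == "__main__":
    print(ordinal_encode(2, 4))                    # [2.0]
    print(thermometer_encode(2, 4))                # [1.0, 1.0, 0.0]
    print(scale(thermometer_encode(2, 4), 0, 1))   # [1.0, 1.0, -1.0]
    print(minimum_sensitivity([0, 0, 0, 1, 1], [0, 0, 1, 1, 0]))  # 0.5
```

In the last example the classifier recovers two of three class-0 cases (recall 0.67) but only one of two class-1 cases (recall 0.5), so its minimum sensitivity is 0.5 even though overall accuracy is 0.6. This is why the metric is reported alongside overall accuracy.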
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fitkov-Norris, E., Vahid, S., Hand, C. (2012). Evaluating the Impact of Categorical Data Encoding and Scaling on Neural Network Classification Performance: The Case of Repeat Consumption of Identical Cultural Goods. In: Jayne, C., Yue, S., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2012. Communications in Computer and Information Science, vol 311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32909-8_35
DOI: https://doi.org/10.1007/978-3-642-32909-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32908-1
Online ISBN: 978-3-642-32909-8
eBook Packages: Computer Science, Computer Science (R0)