Abstract
A problem that frequently occurs in data analytics is assigning a category to each data record. Such classification problems are common across many fields, such as economics, medicine, and computer science. A classic use case in online marketing, for example, is deciding whether a customer should receive a certain e-mail promotion because he or she is likely to respond to it. Classification models are the mathematical tools for tackling these problems. In this chapter, we introduce the most popular classification methods provided by the IBM SPSS Modeler. We explain how these classifiers are trained and validated with the IBM SPSS Modeler and describe their usage and interpretation on example data.
After finishing this chapter, the reader …
1. is familiar with the most common challenges when dealing with a classification problem and knows how to handle them.
2. possesses a large toolbox of different classification methods and knows their advantages and disadvantages.
3. is able to build various classification models with the SPSS Modeler and to apply them to new data for prediction.
4. knows various validation methods and criteria and can evaluate the quality of the trained classification models within the SPSS Modeler stream.
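To make the train-and-validate workflow described above concrete outside the SPSS Modeler, the sketch below implements one of the simplest classifiers the chapter covers, k-nearest neighbors, together with a hold-out accuracy check. This is a minimal stand-alone illustration, not the book's SPSS Modeler procedure; the toy promotion-response data, the `knn_predict` and `accuracy` helpers, and the choice of k = 3 are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, record, k=3):
    """Classify `record` by majority vote among its k nearest
    training records. `train` is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], record))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

def accuracy(train, test, k=3):
    """Fraction of hold-out records whose predicted label is correct."""
    hits = sum(1 for feats, label in test
               if knn_predict(train, feats, k) == label)
    return hits / len(test)

# Hypothetical data: (income in 10k, recent purchases) -> responds to promotion?
train = [((1.0, 5), "yes"), ((1.2, 4), "yes"), ((0.9, 6), "yes"),
         ((3.0, 0), "no"),  ((2.8, 1), "no"),  ((3.2, 0), "no")]
test = [((1.1, 5), "yes"), ((3.1, 1), "no")]

print(accuracy(train, test, k=3))  # 1.0 on this cleanly separable toy set
```

The hold-out split here mirrors the validation idea the learning objectives refer to: the model is built on one partition of the data and its quality judged on records it has never seen.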
© 2016 Springer International Publishing Switzerland
Wendler, T., Gröttrup, S. (2016). Classification Models. In: Data Mining with SPSS Modeler. Springer, Cham. https://doi.org/10.1007/978-3-319-28709-6_8
Print ISBN: 978-3-319-28707-2
Online ISBN: 978-3-319-28709-6