Abstract
A problem that frequently occurs in data analytics is assigning a category to each data record. Such classification problems are common across many fields, such as economics, medicine, and computer science. A classic use case in online marketing, for example, is deciding whether a customer should receive a certain e-mail promotion because he or she is likely to respond to it. Classification models are the mathematical tools for tackling these problems. In this chapter, we introduce the most popular classification methods provided by the IBM SPSS Modeler. We explain how these classifiers are trained and validated with the IBM SPSS Modeler and describe their usage and interpretation on example data.
After finishing this chapter, the reader …
1. is familiar with the most common challenges when dealing with a classification problem and knows how to handle them.
2. possesses a large toolbox of different classification methods and knows their advantages and disadvantages.
3. is able to build various classification models with the SPSS Modeler and to apply them to new data for prediction.
4. knows various validation methods and criteria and can evaluate the quality of the trained classification models within the SPSS Modeler stream.
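To make the train-and-validate workflow described above concrete outside the SPSS Modeler, the sketch below implements one of the simplest classifiers the chapter covers, k-nearest neighbors, together with a hold-out accuracy check. This is a minimal stand-alone illustration, not the book's SPSS Modeler procedure; the toy promotion-response data, the `knn_predict` and `accuracy` helpers, and the choice of k = 3 are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, record, k=3):
    """Classify `record` by majority vote among its k nearest
    training records. `train` is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda t: euclidean(t[0], record))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)

def accuracy(train, test, k=3):
    """Fraction of hold-out records whose predicted label is correct."""
    hits = sum(1 for feats, label in test
               if knn_predict(train, feats, k) == label)
    return hits / len(test)

# Hypothetical data: (income in 10k, recent purchases) -> responds to promotion?
train = [((1.0, 5), "yes"), ((1.2, 4), "yes"), ((0.9, 6), "yes"),
         ((3.0, 0), "no"),  ((2.8, 1), "no"),  ((3.2, 0), "no")]
test = [((1.1, 5), "yes"), ((3.1, 1), "no")]

print(accuracy(train, test, k=3))  # 1.0 on this cleanly separable toy set
```

The hold-out split here mirrors the validation idea the learning objectives refer to: the model is built on one partition of the data and its quality judged on records it has never seen.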
© 2016 Springer International Publishing Switzerland
Wendler, T., Gröttrup, S. (2016). Classification Models. In: Data Mining with SPSS Modeler. Springer, Cham. https://doi.org/10.1007/978-3-319-28709-6_8
Print ISBN: 978-3-319-28707-2
Online ISBN: 978-3-319-28709-6