
Classification Models


Abstract

One of the main problems that often occurs in data analytics is assigning a category to each data record. Such problems are common in many areas and fields, such as economics, medicine, and computer science. For example, a classical use case in the online marketing sector is deciding whether a customer should receive a certain e-mail promotion because he or she is likely to respond to it. Classification models are the mathematical tools for solving these problems. In this chapter, we introduce the most popular classification methods provided by the IBM SPSS Modeler. We explain how these classifiers are trained and validated with the IBM SPSS Modeler and describe their usage and interpretation on data examples.

After finishing this chapter, the reader …

  1.

    is familiar with the most common challenges when dealing with a classification problem and knows how to handle them.

  2.

    possesses a large toolbox of different classification methods and knows their advantages and disadvantages.

  3.

    is able to build various classification models with the SPSS Modeler and to apply them to new data for prediction.

  4.

    knows various validation methods and criteria and can evaluate the quality of the trained classification models within the SPSS Modeler stream (see the workflow sketch below).
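
The chapter builds, applies, and validates these models graphically as streams in the SPSS Modeler. Purely as an illustration of the same train/validate/evaluate workflow, here is a minimal sketch in Python with scikit-learn; the data set, classifier, and parameter choices are assumptions for this example and are not taken from the chapter.

# Minimal sketch of the train/validate/evaluate workflow discussed in this
# chapter. Illustrative only: the chapter works in the SPSS Modeler GUI, and
# the data set, classifier, and parameters below are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load a standard benchmark classification data set.
X, y = load_breast_cancer(return_X_y=True)

# Split the records into a training and a testing partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Train a decision tree classifier on the training partition.
model = DecisionTreeClassifier(max_depth=4, random_state=1)
model.fit(X_train, y_train)

# Apply the trained model to unseen data and evaluate its quality.
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))

In an SPSS Modeler stream, these steps roughly correspond to a Partition node, a modeling node such as a decision tree, and an Analysis node applied to the testing partition.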




Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Wendler, T., Gröttrup, S. (2016). Classification Models. In: Data Mining with SPSS Modeler. Springer, Cham. https://doi.org/10.1007/978-3-319-28709-6_8
