# The Role of Categorical Data Analysis

• Tamás Rudas
Chapter
Part of the Springer Texts in Statistics book series (STS)

## Abstract

Any real data collection procedure may lead only to finitely many different observations (categories or measured values), not only in practice but also in theory. The various relationships that are possible between the observed categories are defined in the theory of levels of measurement. The assumption of continuous data, prevalent in several fields of application of statistics, is an abstraction that may simplify the analysis but does not come without a price. The most common simplifying assumption is that the data have a (multivariate) normal distribution or that their distribution belongs to some other parametric family. Another type of assumption, made in nonparametric statistics, is that of a continuous distribution function, which essentially implies that all the observations are different. These assumptions have various motivations behind them, from substantive knowledge to mathematical convenience, but often also the absence of appropriate methods to handle the data in categorical form, or a lack of knowledge of these methods. In many scientific fields where data are collected and analyzed, most notably in the social and behavioral sciences but often also in economics, medicine, biology, and quality control, the observations do not have the characteristics possessed by numbers, and assuming that they come from a continuous distribution is entirely ungrounded. Further, several important questions in statistics, including the joint effects of explanatory variables on a response variable, may be better studied when the variables involved are categorical than when they are assumed to be continuous. For example, when three variables have a trivariate normal distribution, the joint effect of two of them on the third cannot differ from what could be predicted from a pairwise analysis. In reality, however, if multivariate normality does not hold, the joint effect is a characteristic of the joint distribution of the three variables. In such a case, the assumption of normality makes it impossible to realize the true nature of the joint effect. For categorical data, structure and stochastics in statistical modeling are largely independent and are studied separately.
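
The point about joint effects can be illustrated with a small categorical example. The sketch below is not taken from the chapter: it uses a hypothetical joint distribution of three binary variables, in which X and Y are independent fair coin flips and Z is their exclusive-or, and the helper names (`marginal`, `is_independent`, `conditional_z`) are illustrative. Every pair of variables is independent, so a pairwise analysis would suggest no effect on Z at all, yet X and Y together determine Z exactly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution (an assumption for illustration):
# X and Y are independent fair binary variables, Z = X XOR Y.
# Cells are (x, y, z); cells not listed have probability zero.
joint = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}


def marginal(dist, axes):
    """Sum the joint distribution down to the variables listed in `axes`."""
    out = {}
    for cell, p in dist.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0) + p
    return out


def is_independent(dist, a, b):
    """Check whether variables a and b are independent in the 2-way marginal."""
    pab = marginal(dist, (a, b))
    pa = marginal(dist, (a,))
    pb = marginal(dist, (b,))
    return all(
        pab.get((i, j), 0) == pa[(i,)] * pb[(j,)]
        for (i,) in pa
        for (j,) in pb
    )


def conditional_z(dist, x, y):
    """Conditional distribution of Z given X = x and Y = y."""
    pxy = marginal(dist, (0, 1))[(x, y)]
    return {z: dist.get((x, y, z), Fraction(0)) / pxy for z in (0, 1)}


# Pairwise analysis: every pair (X,Y), (X,Z), (Y,Z) is independent.
pairwise = {pair: is_independent(joint, *pair) for pair in ((0, 1), (0, 2), (1, 2))}
print(pairwise)

# Joint analysis: given both X and Y, the value of Z is certain.
for x, y in product((0, 1), repeat=2):
    print((x, y), conditional_z(joint, x, y))
```

Under a trivariate normal model, pairwise independence of all three pairs would imply mutual independence, so a structure of this kind could not be represented; with categorical variables it is simply one more joint distribution.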
