Cross Tabulation and Categorical Data Analysis

  • Amir Momeni
  • Matthew Pincus
  • Jenny Libien


We often have questions about whether events or variables are associated or correlated with each other. In pathology, for example, a common question is whether a test result is associated with a disease status. In statistics, the process of testing for such an association is called hypothesis testing. If the variables are categorical (i.e., they can assume only a finite number of discrete values), a common approach to hypothesis testing is cross tabulation.

Cross tabulation is the summarization of categorical data into a table in which each cell contains the frequency (either raw or proportional) of the observations that fit the categories represented by that cell. Data summarized in cross-tabulated form can then be used for many statistical tests, most of which rely on test statistics that follow the chi-squared distribution.
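As a rough illustration of the idea, the sketch below cross-tabulates a small set of observations and computes a Pearson chi-squared statistic for the resulting 2x2 table. The patient counts, the category labels, and the helper name `chi_squared_2x2` are hypothetical and chosen only for demonstration; they are not taken from the chapter. The p-value uses the fact that, with one degree of freedom, the upper tail of the chi-squared distribution equals `erfc(sqrt(x / 2))`.

```python
from collections import Counter
from math import erfc, sqrt

# Hypothetical observations: (test result, disease status) for 40 patients.
observations = (
    [("positive", "diseased")] * 16
    + [("positive", "healthy")] * 4
    + [("negative", "diseased")] * 6
    + [("negative", "healthy")] * 14
)

rows = ["positive", "negative"]
cols = ["diseased", "healthy"]

# Cross tabulation: count the observations that fall into each cell.
counts = Counter(observations)
table = [[counts[(r, c)] for c in cols] for r in rows]


def chi_squared_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) and
    p-value for a 2x2 table of raw frequencies."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # A 2x2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_squared_2x2(table)
print(table)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.4f}")
```

A small p-value here would suggest that test result and disease status are associated in this made-up sample. When expected cell counts are small, the chi-squared approximation becomes unreliable and Fisher's exact test is the usual alternative.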

In this chapter, we explain the concept of hypothesis testing and introduce the most common statistical tests used in hypothesis testing of categorical data.


Keywords: Categorical data; Hypothesis testing; Cross tabulation; Chi-squared distribution; Chi-squared tests; Fisher’s exact test; Agreement measures



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Amir Momeni (1)
  • Matthew Pincus (1)
  • Jenny Libien (1)
  1. Department of Pathology, State University of New York, Downstate Medical Center, Brooklyn, USA
