Association

Lectures on Categorical Data Analysis

Part of the book series: Springer Texts in Statistics (STS)

Abstract

The statistical analysis of associations is a central theme in this book. This chapter starts with a description of the properties of the odds ratio, including its maximum likelihood estimation. Because the odds ratio is variation independent of the marginal distributions, it is argued that it is the most useful measure of association. The structure of I × J tables, as described by the systems of local or spanning cell odds ratios (generalizations of the simple odds ratio defined for 2 × 2 tables), is described and then analyzed using association models. The odds ratio is generalized to higher-dimensional tables by introducing a hierarchical structure of conditional odds ratios. Independence may be seen as a lack of association, but a related simple structure, conditional independence, is found more often in real data, and the properties of the maximum likelihood estimates under conditional independence are studied.
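
The odds ratios named in the abstract can be computed directly from a table of counts or probabilities. The following Python/NumPy sketch is illustrative and not taken from the book; all function names are hypothetical. It assumes that all cells are positive, uses the observed frequencies as plug-in estimates (which gives the maximum likelihood estimate of the odds ratio under multinomial sampling), uses the standard delta-method formula for the asymptotic standard error, and takes cell (1, 1) as the spanning cell, which may differ from the convention used in the chapter.

```python
import numpy as np


def odds_ratio(t):
    """Odds ratio (t11 * t22) / (t12 * t21) of a 2 x 2 table.

    The value is the same for frequencies and probabilities; applied to
    observed counts it is the plug-in (maximum likelihood) estimate.
    Assumes all four cells are positive.
    """
    t = np.asarray(t, dtype=float)
    return (t[0, 0] * t[1, 1]) / (t[0, 1] * t[1, 0])


def odds_ratio_se(counts):
    """Asymptotic standard error of the estimated odds ratio by the delta
    method, OR_hat * sqrt(sum of 1 / n_ij) (assumed textbook formula)."""
    counts = np.asarray(counts, dtype=float)
    return odds_ratio(counts) * np.sqrt((1.0 / counts).sum())


def local_odds_ratios(t):
    """(I-1) x (J-1) array of local odds ratios, each formed from two
    adjacent rows and two adjacent columns of an I x J table."""
    t = np.asarray(t, dtype=float)
    return (t[:-1, :-1] * t[1:, 1:]) / (t[:-1, 1:] * t[1:, :-1])


def spanning_cell_odds_ratios(t):
    """(I-1) x (J-1) array of odds ratios that all contain cell (1, 1):
    entry [i-2, j-2] is t11 * tij / (t1j * ti1) for 1-based i, j >= 2."""
    t = np.asarray(t, dtype=float)
    # t[:1, 1:] holds t1j (shape 1 x (J-1)); t[1:, :1] holds ti1 (shape (I-1) x 1)
    return (t[0, 0] * t[1:, 1:]) / (t[:1, 1:] * t[1:, :1])


if __name__ == "__main__":
    table = np.array([[20, 10], [5, 15]])
    print(odds_ratio(table))  # 6.0
    print(odds_ratio_se(table))
    # Multiplying a row by a constant changes the marginals, not the odds ratio.
    print(odds_ratio(table * np.array([[3], [1]])))  # 6.0

    t = np.array([[1, 2, 3], [2, 5, 4], [3, 1, 6]])
    # The spanning cell odds ratio for the opposite corner equals the product
    # of all local odds ratios, because the product telescopes.
    print(np.isclose(spanning_cell_odds_ratios(t)[-1, -1],
                     np.prod(local_odds_ratios(t))))  # True
```

Since each spanning cell odds ratio is a product of local odds ratios, and each local odds ratio can in turn be written as a ratio of spanning cell odds ratios, either system carries the same information about the association structure of the table.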

Notes

  1. That is, each of their elements is equal to 1.

  2. Remember, statistics is about inference from a sample to a population, and in this case one is assumed to know the population value.

  3. Some textbooks go as far as proposing classifications of what constitutes a weak or a strong effect, for example, in terms of odds ratios (or, in a different context, the correlation coefficient). Such suggestions, made without taking into account the circumstances of data collection, the actual research question, or the policy implications, go beyond the scope of statistical analysis. Unfortunately, some substantive scientists are happy to rely on them.

  4. That is, whether or not the observed value warrants the conclusion that the population value is not 1.

  5. The odds ratio is not a one-to-one function of the cell probabilities, and thus Proposition 4.2 cannot be used directly.

  6. The factor \(\sqrt{n}\) is removed from the formula if the asymptotic standard error of \(\sqrt{n}\) times the estimate of the odds ratio is considered.

  7. The subsequent concepts of parameter and parameterization are also used for frequency distributions.

  8. More precisely: increases without an upper bound.

  9. This ratio was historically used in social mobility research; see, e.g., [88].

  10. The notations \(p(i, j)\) and \(p_{ij}\) mean the same and are used interchangeably for better readability of the text.

  11. Positivity of the probabilities is assumed here.

  12. The same development is possible for cell frequencies, too.

  13. In practice, both errors and relative errors are often expressed as percentages. More precisely, relative errors, including the relative standard error, are expressed in percent. For example, in the latter case, the lower bound for the largest relative standard error is about 6 percent. On the other hand, errors in absolute terms, including standard errors, should be expressed in percentage points. For example, with a sample size of n = 2500 and simple random sampling, the standard error of the estimate of a probability is at most 1 percentage point. Unfortunately, both percent and percentage point are denoted by %.

  14. In Sect. 10.2, generalizations of independence to more variables will be discussed; these play a role very similar to that of independence in simplifying structures. Further, several of the models discussed later in the book formulate simplifying properties that may be considered generalizations of independence.

  15. In many applications of statistical methods, researchers are interested in establishing the existence of a previously unknown effect, rather than in establishing that certain theoretically possible effects do not exist. This may be relevant from a substantive point of view, but statistics prefers simple descriptions of reality over complex ones. Apart from the technical reasons referred to in the previous paragraph, the main reason is that assuming all variables are related (with no association considered as a special case of association) is certainly a correct starting point. If certain effects do not exist, one obtains a simpler yet true description of reality. On the other hand, assuming no relationships among the variables considered is certainly false, and detecting the existence of a relationship is very unlikely to turn this incorrect assumption into a correct one.

  16. The argument here is about structure, so questions related to statistical significance are avoided by assuming that the data contain the entire population of interest.

  17. The value of the conditional odds ratio is the same whether it is computed from frequencies, from probabilities, or from conditional probabilities.

  18. The name graphical refers to the fact that, when the variables of interest are assumed to have a joint normal distribution, many of these models can be represented by a graph. When the variables are categorical, the situation is more complex. See Sect. 1.3 for some related material and Chap. 13 for a brief discussion.

  19. This property is directly generalized by MLEs under log-linear models to be discussed later in the book.

  20. The order of the cells used in this book is called lexicographic order.
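
Some of the claims in these notes can be checked numerically: note 17 (conditional odds ratios do not change when computed from conditional probabilities instead of frequencies), the behavior of the maximum likelihood estimate under conditional independence studied in the chapter, and the bound in note 13. The Python/NumPy sketch below is an illustration, not the book's code, and its function names are hypothetical. It assumes positive counts in an I × J × K table under multinomial sampling and uses the standard closed form \(\hat m_{ijk} = n_{i+k} n_{+jk} / n_{++k}\) for the MLE under conditional independence of the first two variables given the third. It also checks that the fitted values reproduce the observed marginal tables of the first and third, and of the second and third, variables; this marginal-reproducing property is assumed here to be the one that note 19 says carries over to log-linear models.

```python
import numpy as np


def conditional_odds_ratios(t):
    """The K conditional odds ratios of the first two variables in a
    2 x 2 x K array, one for each category k of the third variable."""
    t = np.asarray(t, dtype=float)
    return (t[0, 0, :] * t[1, 1, :]) / (t[0, 1, :] * t[1, 0, :])


def mle_conditional_independence(counts):
    """MLE of the expected frequencies of an I x J x K table under
    conditional independence of the first two variables given the third:
    m_ijk = n_{i+k} * n_{+jk} / n_{++k}. Assumes positive counts."""
    n = np.asarray(counts, dtype=float)
    n_ik = n.sum(axis=1, keepdims=True)      # n_{i+k}, shape (I, 1, K)
    n_jk = n.sum(axis=0, keepdims=True)      # n_{+jk}, shape (1, J, K)
    n_k = n.sum(axis=(0, 1), keepdims=True)  # n_{++k}, shape (1, 1, K)
    return n_ik * n_jk / n_k


def max_se_of_probability(n):
    """Largest standard error of an estimated probability under simple
    random sampling: sqrt(p * (1 - p) / n) is largest at p = 0.5."""
    return np.sqrt(0.25 / n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.integers(5, 50, size=(2, 2, 3)).astype(float)
    fitted = mle_conditional_independence(counts)

    # Under the fitted model every conditional odds ratio equals 1.
    print(np.allclose(conditional_odds_ratios(fitted), 1.0))  # True

    # Note 17: dividing each layer by its total (conditional probabilities
    # given the third variable) leaves the conditional odds ratios unchanged.
    cond_probs = counts / counts.sum(axis=(0, 1), keepdims=True)
    print(np.allclose(conditional_odds_ratios(counts),
                      conditional_odds_ratios(cond_probs)))  # True

    # The fitted values reproduce the observed (1,3) and (2,3) marginal tables.
    print(np.allclose(fitted.sum(axis=1), counts.sum(axis=1)),
          np.allclose(fitted.sum(axis=0), counts.sum(axis=0)))  # True True

    # Note 13: with n = 2500 the standard error is at most about 0.01,
    # that is, 1 percentage point.
    print(max_se_of_probability(2500))
```

The closed form avoids iterative fitting because conditional independence is a decomposable model; for log-linear models without such a closed form, iterative procedures are typically needed, although the fitted values still reproduce the relevant observed marginals.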

References

  1. Bartolucci, F., Forcina, A.: Extended RC Association Models Allowing for Order Restrictions and Marginal Modeling. Journal of the American Statistical Association, 97, 1192–1199 (2002)

  2. Becker, M.P., Clogg, C.C.: Analysis of Sets of Two-way Contingency Tables Using Association Models. Journal of the American Statistical Association, 84, 142–151 (1989)

  3. Clogg, C.C., Shihadeh, E.S.: Statistical Models for Ordinal Variables. Sage Publications, Thousand Oaks (1994)

  4. Goodman, L.A.: Simple Models for the Analysis of Association in Cross-Classifications Having Ordered Categories. Journal of the American Statistical Association, 74, 37–52 (1979)

  5. Rudas, T.: Odds Ratios in the Analysis of Contingency Tables. Sage Publications, Thousand Oaks (1998)

  6. Tyree, A.: Mobility Ratios and Association in Mobility Tables. Population Studies, 27, 577–588 (1973)

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this chapter

Cite this chapter

Rudas, T. (2018). Association. In: Lectures on Categorical Data Analysis. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-7693-5_6
