Abstract
The statistical analysis of associations is a central theme in this book. This chapter starts with a description of the properties of the odds ratio, including its maximum likelihood estimation. Because of its variation independence from the marginal distributions, it is argued that the odds ratio is the most useful measure of association. The structure of I × J tables is described by systems of local or spanning cell odds ratios, which generalize the simple odds ratio defined for 2 × 2 tables, and is analyzed by association models. The odds ratio is generalized to higher-dimensional tables by introducing a hierarchical structure of conditional odds ratios. Independence may be seen as lack of association, but a related simple structure, conditional independence, is found more often in real data, and properties of the maximum likelihood estimates under conditional independence are studied.
Notes
- 1.
That is, each of their elements is equal to 1.
- 2.
Remember, statistics is about inference from sample to population and one is assumed to have the population value in this case.
- 3.
Some textbooks go as far as suggesting classifications as to what is a weak or a strong effect, for example, in terms of odds ratios (or the correlation coefficient in a different context). Such suggestions, without taking into account the circumstances of data collection and the actual research question or policy implications, go beyond the scope of statistical analysis. Unfortunately, some substantive scientists are happy to rely on such suggestions.
- 4.
That is, whether or not the observed value justifies concluding that the population value is not 1.
- 5.
The odds ratio is not a one-to-one function of the cell probabilities, and thus Proposition 4.2 cannot be used directly.
- 6.
The factor \(\sqrt{n}\) is removed from the formula if the asymptotic standard error of \(\sqrt{n}\) times the estimate of the odds ratio is considered.
- 7.
The subsequent concepts of parameter and parameterization are also used for frequency distributions.
- 8.
More precisely: increases without an upper bound.
- 9.
This ratio was used historically in social mobility research, see, e.g., [88].
- 10.
The notations \(p(i, j)\) and \(p_{ij}\) mean the same and are used interchangeably for better readability of the text.
- 11.
Positivity of the probabilities is assumed here.
- 12.
The same development is possible for cell frequencies, too.
- 13.
In practice, both errors and relative errors are often expressed as percentages. More precisely, relative errors, including the relative standard error, are expressed in percent. For example, in the latter case, the lower bound for the largest relative standard error is about 6 percent. On the other hand, errors in absolute terms, including standard errors, should be expressed in percentage points. For example, with a sample size of n = 2500 and simple random sampling, the standard error of the estimate of a probability is not more than 1 percentage point. Unfortunately, both percent and percentage point are denoted as %.
- 14.
In Sect. 10.2, generalizations of independence for more variables will be discussed that play a role very similar to independence in terms of simplifying structures. Further, several of the models discussed later in the book formulate simplifying properties which may be considered as generalizations of independence.
- 15.
There are many uses of statistical methods in which researchers are interested in establishing the existence of an effect unknown thus far, instead of establishing that certain, theoretically possible effects do not exist. This may be relevant from a substantive point of view, but statistics prefers simple descriptions of reality over complex ones. Apart from the technical reasons referred to in the previous paragraph, the main reason is that assuming all variables are related – with no association considered as a special case of association – is certainly a correct starting point. If certain effects do not exist, one has a simpler yet true description of reality. On the other hand, assuming no relationships among the variables considered is certainly false. Realizing the existence of a relationship is very unlikely to turn this incorrect assumption into a correct one.
- 16.
The argument here is about structure, so questions related to statistical significance are avoided by assuming that the data contain the entire population of interest.
- 17.
The value of the conditional odds ratio remains the same whether it is computed from frequencies, from probabilities, or from conditional probabilities.
- 18.
The name graphical refers to the fact that when the variables of interest are supposed to have a joint normal distribution, many of these models may be represented by a graph. When the variables are categorical, the situation is more complex. See Sect. 1.3 for some related material and a brief discussion in Chap. 13.
- 19.
This property is directly generalized by MLEs under log-linear models to be discussed later in the book.
- 20.
The order of the cells used in this book is called lexicographic order.
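Two of the numerical claims in the notes above can be checked directly. The following is a minimal sketch, not taken from the chapter: the function names, the frequencies in the layer, and the overall sample size are hypothetical choices for illustration. It reproduces the bound in Note 13 (under simple random sampling the standard error \(\sqrt{p(1-p)/n}\) is largest at \(p = 0.5\)) and the invariance stated in Note 17, using one layer of an assumed three-way table.

```python
import math


def max_standard_error(n):
    """Largest possible standard error of an estimated probability
    under simple random sampling: sqrt(p * (1 - p) / n) at p = 0.5."""
    return math.sqrt(0.25 / n)


def odds_ratio(table):
    """Odds ratio of a 2 x 2 table [[a, b], [c, d]] with positive entries."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


# Note 13: with n = 2500 the standard error is at most 1 percentage point.
se_bound = max_standard_error(2500)

# Note 17: one layer of a three-way table (hypothetical frequencies).
layer_freq = [[20, 30], [10, 60]]
overall_n = 500  # assumed size of the whole three-way table
layer_n = sum(sum(row) for row in layer_freq)

# Joint probabilities of the cells in this layer.
layer_prob = [[x / overall_n for x in row] for row in layer_freq]
# Probabilities conditional on the category that defines the layer.
layer_cond = [[x / layer_n for x in row] for row in layer_freq]

or_freq = odds_ratio(layer_freq)
or_prob = odds_ratio(layer_prob)
or_cond = odds_ratio(layer_cond)

print(se_bound, or_freq, or_prob, or_cond)
```

The three odds ratios agree up to floating-point rounding because dividing all four cells by the same positive constant cancels in the cross-product ratio.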
References
Bartolucci, F., Forcina, A.: Extended RC Association Models Allowing for Order Restrictions and Marginal Modeling. Journal of the American Statistical Association, 97, 1192–1199 (2002)
Becker, M.P., Clogg, C.C.: Analysis of Sets of Two-way Contingency Tables Using Association Models. Journal of the American Statistical Association, 84, 142–151 (1989)
Clogg, C.C., Shihadeh, E.S.: Statistical Models for Ordinal Variables. Sage Publications, Thousand Oaks (1994)
Goodman, L.A.: Simple Models for the Analysis of Association in Cross-Classifications Having Ordered Categories. Journal of the American Statistical Association, 74, 37–52 (1979)
Rudas, T.: Odds Ratios in the Analysis of Contingency Tables. Sage Publications, Thousand Oaks (1998)
Tyree, A.: Mobility ratios and association in mobility tables. Population Studies, 27, 577–588 (1973)
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this chapter
Cite this chapter
Rudas, T. (2018). Association. In: Lectures on Categorical Data Analysis. Springer Texts in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-7693-5_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-7691-1
Online ISBN: 978-1-4939-7693-5
eBook Packages: Mathematics and Statistics (R0)