
The Non-uniqueness Property of the Intrinsic Estimator in APC Models

Demography

Abstract

This article explores an important property of the intrinsic estimator that has received no attention in the literature: the age, period, and cohort estimates of the intrinsic estimator are not unique but vary with the parameterization and reference categories chosen for these variables. We give a formal proof of the non-uniqueness property for effect coding and dummy variable coding. Using data on female mortality in the United States over the years 1960–1999, we show that the variation in the results obtained for different parameterizations and reference categories is substantial and leads to contradictory conclusions. We conclude that the non-uniqueness property is a new argument for not routinely applying the intrinsic estimator.



Notes

  1. For the APC models that we discuss in this article, having the same number of units in each combination of age and period is not required.

  2. We determined the linear trend in, for example, age by a simple ordinary least squares (OLS) regression on the 19 age estimates. We used the highest and lowest linear trends only to demonstrate the variability in IE estimates when different sets of categories are omitted.

  3. The user-written Stata routine apc_ie provides deviation contrast IE estimates with only the last categories omitted. To obtain estimates with the first categories omitted, one could “mirror” the three APC variables so that the highest age, period, and cohort values become the lowest.

  4. To estimate the parameters in Eq. (5b) with superscript D, one could first estimate the following:

    $$ Y={\boldsymbol{\upbeta}}_0^D+{\boldsymbol{\upalpha}}_1^{*}{A}_1^D+{\boldsymbol{\upalpha}}_2^{*}{A}_2^D+{\boldsymbol{\upbeta}}_1^{*}{P}_1^D+{\boldsymbol{\upbeta}}_2^{*}{P}_2^D+{\boldsymbol{\upgamma}}_1^{*}{C}_1^D+{\boldsymbol{\upgamma}}_2^{*}{C}_2^D+{\boldsymbol{\upgamma}}_3^{*}{C}_3^D+e. $$

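    Then, to obtain, for example, the value of \( {\boldsymbol{\upgamma}}_3^D \), note that the coefficient of \( {C}_3^D \) in Eq. (5b) is \( {\boldsymbol{\upgamma}}_3^D+2{\boldsymbol{\upgamma}}_5^L \), so that \( {\boldsymbol{\upgamma}}_3^D={\boldsymbol{\upgamma}}_3^{*}-2{\boldsymbol{\upgamma}}_5^L \).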

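  5. For any two ages i and j, the predicted values (controlling for period and cohort) based on Eq. (5) are \( {\boldsymbol{\upbeta}}_0^L+{\boldsymbol{\upalpha}}_i^L \) and \( {\boldsymbol{\upbeta}}_0^L+{\boldsymbol{\upalpha}}_j^L \). The difference between these predictions equals \( {\boldsymbol{\upalpha}}_i^L-{\boldsymbol{\upalpha}}_j^L \), which represents the deviation of age i from age j and hence, with age j as the reference category, is equal to \( {\boldsymbol{\upalpha}}_i^D \).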

References

  • Fienberg, S. E. (2013). Cohort analysis’ unholy quest: A discussion. Demography, 50, 1981–1984.


  • Hardy, M. A. (1993). Regression with dummy variables. Newbury Park, CA: Sage.


  • Held, L., & Riebler, A. (2013). Comment on “Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort (APC) problem.” Demography, 50, 1977–1979.


  • Luo, L. (2013). Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort (APC) problem. Demography, 50, 1945–1967.


  • O’Brien, R. M. (2013). Comment of Liying Luo’s article, “Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort (APC) problem.” Demography, 50, 1973–1975.


  • Yang, Y. C., Fu, W. J., & Land, K. C. (2004). A methodological comparison of age-period-cohort models: The intrinsic estimator and conventional generalized linear models. Sociological Methodology, 34, 75–110.


  • Yang, Y. C., & Land, K. C. (2013). Misunderstandings, mischaracterizations, and the problematic choice of a specific instance in which the IE should never be applied. Demography, 50, 1969–1971.


  • Yang, Y. C., Schulhofer-Wohl, S., Fu, W. J., & Land, K. C. (2008). The intrinsic estimator for age-period-cohort analysis: What it is and how to use it. American Journal of Sociology, 113, 1697–1736.



Author information

Corresponding author

Correspondence to Ben Pelzer.

Appendices

Appendix A: Proof of the Non-uniqueness of the IE in Case of Effect Coding

To arrive at Eq. (2), we used effect coding with the last categories of age, period, and cohort omitted from the equation. In this appendix, we will show that the IE yields different estimates when the first categories are omitted. Equation (2) then changes into

$$ \begin{array}{l}Y={\upbeta}_0^F+{\upalpha}_2^F{A}_2^F+{\upalpha}_3^F{A}_3^F+{\upbeta}_2^F{P}_2^F+{\upbeta}_3^F{P}_3^F\hfill \\ {}+{\upgamma}_2^F{C}_2^F+{\upgamma}_3^F{C}_3^F+{\upgamma}_4^F{C}_4^F+{\upgamma}_5^F{C}_5^F+e,\hfill \end{array} $$
(2a)

where superscript F denotes that the first categories of age, period, and cohort are omitted. The independent variables in (2a) differ from those in (2) because they now take the value –1 for cases in the first category of age, period, or cohort. Again, the variable for the fourth cohort can be expressed in terms of other variables in (2a):

$$ {C}_4^F=-{A}_3^F+{P}_3^F+{C}_2^F-2{C}_5^F. $$
(3a)
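
The linear dependency in (3a) can be checked cell by cell. The following minimal sketch assumes a 3 × 3 age-by-period table in which the cohort index is computed as period − age + 3 (an indexing convention assumed here, giving cohorts 1 to 5); the function names are hypothetical and serve only this illustration.

```python
from itertools import product


def effect_first_omitted(value, category):
    """Effect code with category 1 as the omitted one:
    1 for the own category, -1 for category 1, 0 otherwise."""
    if value == category:
        return 1
    if value == 1:
        return -1
    return 0


def check_eq_3a():
    """Return True if Eq. (3a) holds in every cell of the 3 x 3 table."""
    for age, period in product(range(1, 4), repeat=2):
        cohort = period - age + 3  # assumed cohort index, runs from 1 to 5
        a3 = effect_first_omitted(age, 3)
        p3 = effect_first_omitted(period, 3)
        c2, c4, c5 = (effect_first_omitted(cohort, k) for k in (2, 4, 5))
        if c4 != -a3 + p3 + c2 - 2 * c5:
            return False
    return True


print(check_eq_3a())
```

Under these assumptions the check passes for all nine cells, which is why one of the cohort variables has to be removed before the model can be estimated.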

Substituting into (2a) the expression for \( {C}_4^F \) given in (3a) yields the following:

$$ \begin{array}{l}Y={\upbeta}_0^F+{\upalpha}_2^F{A}_2^F+\left({\upalpha}_3^F-{\upgamma}_4^F\right){A}_3^F+{\upbeta}_2^F{P}_2^F+\left({\upbeta}_3^F+{\upgamma}_4^F\right){P}_3^F\hfill \\ {}+\left({\upgamma}_2^F+{\upgamma}_4^F\right){C}_2^F+{\upgamma}_3^F{C}_3^F+\left({\upgamma}_5^F-2{\upgamma}_4^F\right){C}_5^F+e.\hfill \end{array} $$
(4a)

The preceding equation can be represented more compactly:

$$ Y={\boldsymbol{\upbeta}}_0^F+{\boldsymbol{\upalpha}}_2^F{A}_2^F+{\boldsymbol{\upalpha}}_3^F{A}_3^F+{\boldsymbol{\upbeta}}_2^F{P}_2^F+{\boldsymbol{\upbeta}}_3^F{P}_3^F+{\boldsymbol{\upgamma}}_2^F{C}_2^F+{\boldsymbol{\upgamma}}_3^F{C}_3^F+{\boldsymbol{\upgamma}}_5^F{C}_5^F+e. $$
(5a)

The parameters in (5a) are identified because \( {C}_4^F \) is not part of the equation. Following the same line of reasoning as outlined in the main text for the last categories omitted, we now have to minimize the sum of squares \( {\left({\widehat{\upalpha}}_3^F\right)}^2+{\left({\widehat{\upbeta}}_3^F\right)}^2+{\left({\widehat{\upgamma}}_2^F\right)}^2+{\left({\widehat{\upgamma}}_4^F\right)}^2+{\left({\widehat{\upgamma}}_5^F\right)}^2 \), which finally leads to the following IE estimator for \( {\upgamma}_4^F \):

$$ {\widehat{\upgamma}}_4^F=\left(-{\boldsymbol{\upalpha}}_3^F+{\boldsymbol{\upbeta}}_3^F+{\boldsymbol{\upgamma}}_2^F-2{\boldsymbol{\upgamma}}_5^F\right)/8. $$
(8a)
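
The minimization step can be written out as a short worked derivation. It assumes, as the coefficients in (4a) and (5a) imply, that each estimate equals its boldfaced counterpart adjusted by the corresponding multiple of \( {\widehat{\upgamma}}_4^F \); the symbol S for the sum of squares is introduced here for illustration only.

$$ \begin{array}{l}S={\left({\boldsymbol{\upalpha}}_3^F+{\widehat{\upgamma}}_4^F\right)}^2+{\left({\boldsymbol{\upbeta}}_3^F-{\widehat{\upgamma}}_4^F\right)}^2+{\left({\boldsymbol{\upgamma}}_2^F-{\widehat{\upgamma}}_4^F\right)}^2+{\left({\boldsymbol{\upgamma}}_5^F+2{\widehat{\upgamma}}_4^F\right)}^2+{\left({\widehat{\upgamma}}_4^F\right)}^2,\hfill \\ {}\frac{dS}{d{\widehat{\upgamma}}_4^F}=2\left({\boldsymbol{\upalpha}}_3^F-{\boldsymbol{\upbeta}}_3^F-{\boldsymbol{\upgamma}}_2^F+2{\boldsymbol{\upgamma}}_5^F+8{\widehat{\upgamma}}_4^F\right)=0.\hfill \end{array} $$

The factor 8 is the sum of the squared multipliers (1 + 1 + 1 + 4 + 1), and solving the first-order condition for \( {\widehat{\upgamma}}_4^F \) gives the expression in (8a).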

Recall that with the last categories omitted, we found the estimator given in Eq. (8):

$$ {\widehat{\upgamma}}_4^L=\left({\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upbeta}}_1^L+2{\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L\right)/8. $$

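Apparently, the estimate \( {\widehat{\upgamma}}_4^F \) depends on the value of \( -{\boldsymbol{\upalpha}}_3^F+{\boldsymbol{\upbeta}}_3^F+{\boldsymbol{\upgamma}}_2^F-2{\boldsymbol{\upgamma}}_5^F \), whereas the estimate \( {\widehat{\upgamma}}_4^L \) depends on the value of \( {\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upbeta}}_1^L+2{\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L \). In general, with actual data, \( -{\boldsymbol{\upalpha}}_3^F+{\boldsymbol{\upbeta}}_3^F+{\boldsymbol{\upgamma}}_2^F-2{\boldsymbol{\upgamma}}_5^F \) will not be equal to \( {\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upbeta}}_1^L+2{\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L \). To see this, note that the boldfaced parameters in (8) and (8a) represent deviations from the means \( {\boldsymbol{\upbeta}}_0^L \) and \( {\boldsymbol{\upbeta}}_0^F \) in Eqs. (5) and (5a), respectively. Both (5) and (5a) are estimable because of the constraint that the deviation for the fourth cohort is equal to 0; that is, \( {\upgamma}_4^L=0 \) and \( {\upgamma}_4^F=0 \). As a consequence of using the same constraint in (5) and (5a), all α estimates with the same subscript are equal in both equations; the same holds for all β estimates with the same subscript and for all γ estimates with the same subscript. For example, \( {\boldsymbol{\upalpha}}_2^L={\boldsymbol{\upalpha}}_2^F \). Also, \( {\boldsymbol{\upalpha}}_3^L \) (to be derived as \( -{\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upalpha}}_2^L \)) is equal to \( {\boldsymbol{\upalpha}}_3^F \), as estimated with Eq. (5a). The estimates of the means \( {\boldsymbol{\upbeta}}_0^L \) and \( {\boldsymbol{\upbeta}}_0^F \) are equal as well. Thus, proving that \( -{\boldsymbol{\upalpha}}_3^F+{\boldsymbol{\upbeta}}_3^F+{\boldsymbol{\upgamma}}_2^F-2{\boldsymbol{\upgamma}}_5^F\ne {\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upbeta}}_1^L+2{\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L \) boils down to proving that \( -{\boldsymbol{\upalpha}}_3^L+{\boldsymbol{\upbeta}}_3^L+{\boldsymbol{\upgamma}}_2^L-2{\boldsymbol{\upgamma}}_5^L\ne {\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upbeta}}_1^L+2{\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L \). In this last inequality, the expression to the left of the inequality sign contains different deviations from the mean \( {\boldsymbol{\upbeta}}_0^L \) than the expression to the right. As a consequence, the values of \( {\widehat{\upgamma}}_4^F \) and \( {\widehat{\upgamma}}_4^L \) will usually differ for actual data.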
Also, the parameter estimates that depend on the values of \( {\widehat{\upgamma}}_4^F \) and \( {\widehat{\upgamma}}_4^L \) will generally be different: for example, \( {\widehat{\upalpha}}_1^L={\boldsymbol{\upalpha}}_1^L-{\widehat{\upgamma}}_4^L \), whereas, because the coefficient of \( {A}_3^F \) in (4a) implies \( {\widehat{\upalpha}}_3^F={\boldsymbol{\upalpha}}_3^F+{\widehat{\upgamma}}_4^F \), we have \( {\widehat{\upalpha}}_1^F=-{\widehat{\upalpha}}_2^F-{\widehat{\upalpha}}_3^F=-{\boldsymbol{\upalpha}}_2^F-\left({\boldsymbol{\upalpha}}_3^F+{\widehat{\upgamma}}_4^F\right)={\boldsymbol{\upalpha}}_1^F-{\widehat{\upgamma}}_4^F={\boldsymbol{\upalpha}}_1^L-{\widehat{\upgamma}}_4^F \) (see Note 3).
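
A small numerical illustration may help. The sketch below evaluates the estimators in Eqs. (8) and (8a) for one set of boldfaced deviations. The values are hypothetical, chosen only so that the deviations sum to zero within age, period, and cohort and the fourth cohort deviation is 0; they are not taken from the mortality data. Because same-subscript boldfaced parameters coincide under both codings, a single set of values serves both formulas.

```python
from fractions import Fraction as F

# Hypothetical boldfaced deviations from the mean (sum to zero, gamma_4 = 0).
alpha = {1: F(3, 10), 2: F(-1, 10), 3: F(-1, 5)}
beta = {1: F(-1, 5), 2: F(0), 3: F(1, 5)}
gamma = {1: F(1, 2), 2: F(1, 5), 3: F(-1, 10), 4: F(0), 5: F(-3, 5)}


def gamma4_last_omitted(alpha, beta, gamma):
    """Eq. (8): effect coding, last categories omitted."""
    return (alpha[1] - beta[1] + 2 * gamma[1] + gamma[2]) / 8


def gamma4_first_omitted(alpha, beta, gamma):
    """Eq. (8a): effect coding, first categories omitted."""
    return (-alpha[3] + beta[3] + gamma[2] - 2 * gamma[5]) / 8


g4_last = gamma4_last_omitted(alpha, beta, gamma)
g4_first = gamma4_first_omitted(alpha, beta, gamma)
print(g4_last, g4_first, g4_last == g4_first)
```

For these illustrative values the two estimates are 17/80 and 9/40, so the fourth cohort effect already differs between the two choices of omitted categories.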

Appendix B

In this appendix, we show why with dummy variable coding, the IE estimates generally differ from the “default” IE estimates Yang et al. (2004) presented. Instead of standard 0 and 1 coded dummy variables, we subtract 1/k from these dummy variables, with k denoting the number of categories (that is, 3 for age and period, and 5 for cohort). Subtracting the constant 1/k from the original 0 and 1 coded dummy variables does not change the interpretation of their regression coefficients (i.e., the deviation from the omitted category). If we omit the last age category, the dummy variables \( {A}_1^D \) and \( {A}_2^D \) for the first two ages have the coding scheme shown in Table 2.

Table 2 Dummy variable coding for age after subtracting one-third
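
The body of the table is not reproduced in this text. The coding it describes follows from subtracting one-third from the 0 and 1 coded dummy variables with age 3 as the omitted category; the array below is a reconstruction from that description rather than a copy of the original table.

$$ \begin{array}{c|cc}\mathrm{Age}& {A}_1^D& {A}_2^D\\ \hline 1& 2/3& -1/3\\ 2& -1/3& 2/3\\ 3& -1/3& -1/3\\ \hline \mathrm{Column}\ \mathrm{sum}& 0& 0\end{array} $$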

Note that in Table 2, the sum over the three ages for each of the dummy variables (column sum) is 0, just as with effect coding. As a result, the intercept in Eq. (2b) is equal to the intercept in Eq. (2), both representing the unweighted mean of the three predicted values for ages 1, 2, and 3 (controlling for period and cohort). With the last categories omitted, we obtain the following equation to be estimated:

$$ \begin{array}{l}Y={\upbeta}_0^D+{\upalpha}_1^D{A}_1^D+{\upalpha}_2^D{A}_2^D+{\upbeta}_1^D{P}_1^D+{\upbeta}_2^D{P}_2^D\hfill \\ {}+{\upgamma}_1^D{C}_1^D+{\upgamma}_2^D{C}_2^D+{\upgamma}_3^D{C}_3^D+{\upgamma}_4^D{C}_4^D+e.\hfill \end{array} $$
(2b)

To keep notation parsimonious, we do not use superscript DL to explicitly indicate that the last dummy variable–coded category is the reference. Like Eqs. (2) and (2a), the new Eq. (2b) also suffers from the identification problem. For example, we can write the following for \( {C}_4^D \):

$$ {C}_4^D=-2{A}_1^D-{A}_2^D+2{P}_1^D+{P}_2^D-4{C}_1^D-3{C}_2^D-2{C}_3^D. $$
(3b)
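
The dependency in (3b) can be verified in the same cell-by-cell way as for effect coding. This sketch again assumes a 3 × 3 age-by-period table with the cohort index computed as period − age + 3 (an assumed convention), and uses exact fractions so that the subtracted constants 1/3 and 1/5 introduce no rounding error. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def centered_dummy(value, category, k):
    """0/1 dummy for `category` minus 1/k, so it sums to 0 over k categories."""
    return Fraction(int(value == category)) - Fraction(1, k)


def check_eq_3b():
    """Return True if Eq. (3b) holds in every cell of the 3 x 3 table."""
    for age, period in product(range(1, 4), repeat=2):
        cohort = period - age + 3  # assumed cohort index, runs from 1 to 5
        a1, a2 = (centered_dummy(age, j, 3) for j in (1, 2))
        p1, p2 = (centered_dummy(period, j, 3) for j in (1, 2))
        c1, c2, c3, c4 = (centered_dummy(cohort, j, 5) for j in (1, 2, 3, 4))
        rhs = -2 * a1 - a2 + 2 * p1 + p2 - 4 * c1 - 3 * c2 - 2 * c3
        if c4 != rhs:
            return False
    return True


print(check_eq_3b())
```

The subtracted constants cancel exactly on both sides, so the dependency has the same form as it would for uncentered dummy variables apart from the intercept.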

If we substitute in (2b) for \( {C}_4^D \) the expression given in (3b), we obtain

$$ \begin{array}{l}Y={\upbeta}_0^D+\left({\upalpha}_1^D-2{\upgamma}_4^D\right){A}_1^D+\left({\upalpha}_2^D-{\upgamma}_4^D\right){A}_2^D+\left({\upbeta}_1^D+2{\upgamma}_4^D\right){P}_1^D\hfill \\ {}+\left({\upbeta}_2^D+{\upgamma}_4^D\right){P}_2^D+\left({\upgamma}_1^D-4{\upgamma}_4^D\right){C}_1^D+\left({\upgamma}_2^D-3{\upgamma}_4^D\right){C}_2^D\hfill \\ {}+\left({\upgamma}_3^D-2{\upgamma}_4^D\right){C}_3^D+e.\hfill \end{array} $$
(4b)

Recall that for effect coding with the last category omitted, we used the constraint that \( {\upgamma}_4^L=0 \) in Eq. (2), leading to the estimable Eq. (5). This assumption implies that the deviation of cohort 4 from the mean is 0. Because the deviation of cohort 5 from the mean equals \( {\boldsymbol{\upgamma}}_5^L \), the assumption \( {\upgamma}_4^L=0 \) implies that the deviation of cohort 4 from cohort 5 equals \( -{\boldsymbol{\upgamma}}_5^L \) (the value of \( -{\boldsymbol{\upgamma}}_5^L \) can be derived from the estimates of Eq. (5): \( -{\boldsymbol{\upgamma}}_5^L={\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L+{\boldsymbol{\upgamma}}_3^L+{\boldsymbol{\upgamma}}_4^L={\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L+{\boldsymbol{\upgamma}}_3^L \)). In terms of the regression coefficients of Eq. (2b), the above implies \( {\upgamma}_4^D=-{\boldsymbol{\upgamma}}_5^L \). If, in Eq. (2b), we plug in \( -{\boldsymbol{\upgamma}}_5^L \) for \( {\upgamma}_4^D \), substitute the expression for \( {C}_4^D \) given in (3b), and rearrange terms, we obtain the following estimable equation (see Note 4):

$$ \begin{array}{l}Y={\boldsymbol{\upbeta}}_0^D+\left({\boldsymbol{\upalpha}}_1^D+2{\boldsymbol{\upgamma}}_5^L\right){A}_1^D+\left({\boldsymbol{\upalpha}}_2^D+{\boldsymbol{\upgamma}}_5^L\right){A}_2^D+\left({\boldsymbol{\upbeta}}_1^D-2{\boldsymbol{\upgamma}}_5^L\right){P}_1^D\hfill \\ {}+\left({\boldsymbol{\upbeta}}_2^D-{\boldsymbol{\upgamma}}_5^L\right){P}_2^D+\left({\boldsymbol{\upgamma}}_1^D+4{\boldsymbol{\upgamma}}_5^L\right){C}_1^D+\left({\boldsymbol{\upgamma}}_2^D+3{\boldsymbol{\upgamma}}_5^L\right){C}_2^D\hfill \\ {}+\left({\boldsymbol{\upgamma}}_3^D+2{\boldsymbol{\upgamma}}_5^L\right){C}_3^D+e.\hfill \end{array} $$
(5b)

Because Eqs. (5) and (5b) are based on the same constraint with respect to cohort 4, the effect coding parameters in (5) can be translated into deviations from the reference categories resulting from estimating (5b). This is relevant when we compare the effect-coded IE with the dummy variable–coded IE later in this section. The parameters in (4b) and (5b) are related as follows:

$$ \begin{array}{l}{\upbeta}_0^D={\boldsymbol{\upbeta}}_0^D,\hfill \\ {}{\upalpha}_1^D={\boldsymbol{\upalpha}}_1^D+2{\boldsymbol{\upgamma}}_5^L+2{\upgamma}_4^D,\kern0.5em {\upalpha}_2^D={\boldsymbol{\upalpha}}_2^D+{\boldsymbol{\upgamma}}_5^L+{\upgamma}_4^D,\hfill \\ {}{\upbeta}_1^D={\boldsymbol{\upbeta}}_1^D-2{\boldsymbol{\upgamma}}_5^L-2{\upgamma}_4^D,\kern0.5em {\upbeta}_2^D={\boldsymbol{\upbeta}}_2^D-{\boldsymbol{\upgamma}}_5^L-{\upgamma}_4^D,\hfill \\ {}{\upgamma}_1^D={\boldsymbol{\upgamma}}_1^D+4{\boldsymbol{\upgamma}}_5^L+4{\upgamma}_4^D,\kern0.5em {\upgamma}_2^D={\boldsymbol{\upgamma}}_2^D+3{\boldsymbol{\upgamma}}_5^L+3{\upgamma}_4^D,\kern0.5em {\upgamma}_3^D={\boldsymbol{\upgamma}}_3^D+2{\boldsymbol{\upgamma}}_5^L+2{\upgamma}_4^D.\hfill \end{array} $$

Having an estimate for the parameter of the fourth cohort leads to estimates for the remaining parameters (except for \( {\upbeta}_0^D \)). The IE now employs the criterion of minimizing the following sum of squares:

$$ {\left({\widehat{\upalpha}}_1^D\right)}^2+{\left({\widehat{\upalpha}}_2^D\right)}^2+{\left({\widehat{\upbeta}}_1^D\right)}^2+{\left({\widehat{\upbeta}}_2^D\right)}^2+{\left({\widehat{\upgamma}}_1^D\right)}^2+{\left({\widehat{\upgamma}}_2^D\right)}^2+{\left({\widehat{\upgamma}}_3^D\right)}^2+{\left({\widehat{\upgamma}}_4^D\right)}^2. $$

This is a completely different criterion than the one used in the effect-coded IE, proposed by Yang et al. (2004). Not only are more parameters (eight instead of five) involved in the sum of squares to be minimized, but the parameters also have a different meaning (i.e., distances to the reference category instead of distances to the mean). For the preceding sum of squares, we can write

$$ \begin{array}{l}{\left({\boldsymbol{\upalpha}}_1^D+2{\boldsymbol{\upgamma}}_5^L+2{\widehat{\upgamma}}_4^D\right)}^2+{\left({\boldsymbol{\upalpha}}_2^D+{\boldsymbol{\upgamma}}_5^L+{\widehat{\upgamma}}_4^D\right)}^2+{\left({\boldsymbol{\upbeta}}_1^D-2{\boldsymbol{\upgamma}}_5^L-2{\widehat{\upgamma}}_4^D\right)}^2+{\left({\boldsymbol{\upbeta}}_2^D-{\boldsymbol{\upgamma}}_5^L-{\widehat{\upgamma}}_4^D\right)}^2\hfill \\ {}+{\left({\boldsymbol{\upgamma}}_1^D+4{\boldsymbol{\upgamma}}_5^L+4{\widehat{\upgamma}}_4^D\right)}^2+{\left({\boldsymbol{\upgamma}}_2^D+3{\boldsymbol{\upgamma}}_5^L+3{\widehat{\upgamma}}_4^D\right)}^2+{\left({\boldsymbol{\upgamma}}_3^D+2{\boldsymbol{\upgamma}}_5^L+2{\widehat{\upgamma}}_4^D\right)}^2+{\left({\widehat{\upgamma}}_4^D\right)}^2.\hfill \end{array} $$

Taking the first-order derivative of this sum of squares with respect to \( {\widehat{\upgamma}}_4^D \) and setting it to 0 finally leads to the following IE estimate for \( {\upgamma}_4^D \):

$$ {\widehat{\upgamma}}_4^D=\left(-4{\boldsymbol{\upalpha}}_1^D-2{\boldsymbol{\upalpha}}_2^D+4{\boldsymbol{\upbeta}}_1^D+2{\boldsymbol{\upbeta}}_2^D-8{\boldsymbol{\upgamma}}_1^D-6{\boldsymbol{\upgamma}}_2^D-4{\boldsymbol{\upgamma}}_3^D-78{\boldsymbol{\upgamma}}_5^L\right)/80. $$
(8b)
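
That (8b) is the minimizer of the eight-term sum of squares can be checked numerically. The sketch below uses exact fractions and hypothetical boldfaced values chosen only for illustration; the dictionary keys and function names are assumptions of this example. Since the criterion is quadratic in the fourth cohort estimate with 40 as the sum of squared multipliers (4 + 1 + 4 + 1 + 16 + 9 + 4 + 1), moving a step d away from the closed-form value should raise the criterion by exactly 40d².

```python
from fractions import Fraction as F

# Multiplier w in each squared term (bold + w * (gamma5_L + g)),
# read off the coefficients in Eqs. (4b) and (5b).
WEIGHTS = {"alpha1": 2, "alpha2": 1, "beta1": -2, "beta2": -1,
           "gamma1": 4, "gamma2": 3, "gamma3": 2}


def sum_of_squares(bold, gamma5_L, g):
    """Eight-term criterion: seven adjusted coefficients plus g itself."""
    adjusted = ((bold[name] + w * (gamma5_L + g)) ** 2
                for name, w in WEIGHTS.items())
    return sum(adjusted) + g ** 2


def gamma4_closed_form(bold, gamma5_L):
    """Right-hand side of Eq. (8b)."""
    numerator = (-4 * bold["alpha1"] - 2 * bold["alpha2"]
                 + 4 * bold["beta1"] + 2 * bold["beta2"]
                 - 8 * bold["gamma1"] - 6 * bold["gamma2"]
                 - 4 * bold["gamma3"] - 78 * gamma5_L)
    return numerator / 80


# Hypothetical values, for illustration only.
bold = {"alpha1": F(1, 2), "alpha2": F(1, 10), "beta1": F(-2, 5),
        "beta2": F(-1, 5), "gamma1": F(11, 10), "gamma2": F(4, 5),
        "gamma3": F(1, 2)}
gamma5_L = F(-3, 5)

g_star = gamma4_closed_form(bold, gamma5_L)
base = sum_of_squares(bold, gamma5_L, g_star)
for step in (F(1, 100), F(-1, 100)):
    increase = sum_of_squares(bold, gamma5_L, g_star + step) - base
    print(step, increase, increase == 40 * step ** 2)
```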

Because the boldfaced parameters in this expression for \( {\widehat{\upgamma}}_4^D \) are compatible with the ones in Eq. (5), we can formulate \( {\widehat{\upgamma}}_4^D \) in terms of the effect coding parameters of Eq. (5). For example, coefficient \( {\boldsymbol{\upalpha}}_1^D \), which is the deviation of age 1 from reference category age 3, can be written as the difference of the two corresponding effect coding parameters: namely, \( {\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upalpha}}_3^L \) (see Note 5). Building on this relation between the coefficients of effect and dummy coding, we can write the above estimate of \( {\widehat{\upgamma}}_4^D \) in terms of the effect coding parameters of Eq. (5):

$$ \begin{array}{l}{\widehat{\upgamma}}_4^D=\left(-4\left({\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upalpha}}_3^L\right)-2\left({\boldsymbol{\upalpha}}_2^L-{\boldsymbol{\upalpha}}_3^L\right)+4\left({\boldsymbol{\upbeta}}_1^L-{\boldsymbol{\upbeta}}_3^L\right)+2\left({\boldsymbol{\upbeta}}_2^L-{\boldsymbol{\upbeta}}_3^L\right)\right.\\ {}\left.-8\left({\boldsymbol{\upgamma}}_1^L-{\boldsymbol{\upgamma}}_5^L\right)-6\left({\boldsymbol{\upgamma}}_2^L-{\boldsymbol{\upgamma}}_5^L\right)-4\left({\boldsymbol{\upgamma}}_3^L-{\boldsymbol{\upgamma}}_5^L\right)-78{\boldsymbol{\upgamma}}_5^L\right)/80.\end{array} $$

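Using our knowledge that \( {\boldsymbol{\upalpha}}_3^L=-{\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upalpha}}_2^L \), \( {\boldsymbol{\upbeta}}_3^L=-{\boldsymbol{\upbeta}}_1^L-{\boldsymbol{\upbeta}}_2^L \), and \( {\boldsymbol{\upgamma}}_5^L=-{\boldsymbol{\upgamma}}_1^L-{\boldsymbol{\upgamma}}_2^L-{\boldsymbol{\upgamma}}_3^L \), we finally get the following: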

$$ {\widehat{\upgamma}}_4^D=\left(-{\boldsymbol{\upalpha}}_1^L-0.8{\boldsymbol{\upalpha}}_2^L+{\boldsymbol{\upbeta}}_1^L+0.8{\boldsymbol{\upbeta}}_2^L+5.2{\boldsymbol{\upgamma}}_1^L+5.4{\boldsymbol{\upgamma}}_2^L+5.6{\boldsymbol{\upgamma}}_3^L\right)/8. $$
(8b*)
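
The translation from (8b) to (8b*) can be checked with exact arithmetic. The sketch below starts from hypothetical effect coding parameters (illustrative values only, with the fourth cohort deviation set to 0), derives the dummy coding parameters as differences from the last category, and evaluates both expressions alongside the effect-coded estimator of Eq. (8). The function names are assumptions of this example.

```python
from fractions import Fraction as F

# Hypothetical effect coding parameters of Eq. (5); gamma_4 = 0 by constraint.
alpha = {1: F(3, 10), 2: F(-1, 10)}
beta = {1: F(-1, 5), 2: F(0)}
gamma = {1: F(1, 2), 2: F(1, 5), 3: F(-1, 10), 4: F(0)}
# Omitted categories follow from the sum-to-zero property.
alpha[3] = -alpha[1] - alpha[2]
beta[3] = -beta[1] - beta[2]
gamma[5] = -gamma[1] - gamma[2] - gamma[3] - gamma[4]


def gamma4_dummy_8b(alpha, beta, gamma):
    """Eq. (8b), with dummy parameters written as differences from the last category."""
    a1, a2 = alpha[1] - alpha[3], alpha[2] - alpha[3]
    b1, b2 = beta[1] - beta[3], beta[2] - beta[3]
    c1, c2, c3 = (gamma[j] - gamma[5] for j in (1, 2, 3))
    return (-4 * a1 - 2 * a2 + 4 * b1 + 2 * b2
            - 8 * c1 - 6 * c2 - 4 * c3 - 78 * gamma[5]) / 80


def gamma4_dummy_8b_star(alpha, beta, gamma):
    """Eq. (8b*): the same estimate in effect coding parameters."""
    return (-alpha[1] - F(4, 5) * alpha[2] + beta[1] + F(4, 5) * beta[2]
            + F(26, 5) * gamma[1] + F(27, 5) * gamma[2]
            + F(28, 5) * gamma[3]) / 8


def gamma4_effect_8(alpha, beta, gamma):
    """Eq. (8): effect-coded IE estimate, last categories omitted."""
    return (alpha[1] - beta[1] + 2 * gamma[1] + gamma[2]) / 8


print(gamma4_dummy_8b(alpha, beta, gamma),
      gamma4_dummy_8b_star(alpha, beta, gamma),
      gamma4_effect_8(alpha, beta, gamma))
```

For these illustrative values the two dummy-coded expressions agree at 27/80, whereas the effect-coded estimate is 17/80.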

If we compare this IE estimator of \( {\widehat{\upgamma}}_4^D \) with \( {\widehat{\upgamma}}_4^L=\left({\boldsymbol{\upalpha}}_1^L-{\boldsymbol{\upbeta}}_1^L+2{\boldsymbol{\upgamma}}_1^L+{\boldsymbol{\upgamma}}_2^L\right)/8 \) of Eq. (8), it is obvious that the IE estimate for the fourth cohort effect will usually differ between the effect-coded IE as proposed by Yang et al. (2004) and the dummy variable–coded IE (with the last categories as the omitted ones for both codings). The same holds for the IE estimates of the remaining categories of the three APC variables. Similar to the IE with effect coding, each triplet of reference categories of age, period, and cohort leads to different estimates when dummy variable coding is used, which we do not elaborate here.


Cite this article

Pelzer, B., te Grotenhuis, M., Eisinga, R. et al. The Non-uniqueness Property of the Intrinsic Estimator in APC Models. Demography 52, 315–327 (2015). https://doi.org/10.1007/s13524-014-0360-3


  • DOI: https://doi.org/10.1007/s13524-014-0360-3
