Latent Health Stock
The HRS has a variety of health measures. These include a subjective general measure of an individual’s self-reported health and relatively more objective measures, such as functional limitations (ADL difficulty), medical diagnoses of chronic illnesses, body mass index and health care utilization, which are reported in Table 1. Although self-reported health has been widely used in studies based on survey data, it may be plagued with problems that lead to bias. As discussed earlier in the paper, there are two main problems. First, self-reported measures of health are based on subjective judgments, and there is no reason to believe that these judgments are comparable across individuals. Second, since poor health may represent a legitimate reason for a person of working age to be outside the labor force, respondents who are not working may cite health problems as a way to rationalize their behavior (the “justification hypothesis”). An alternative is to replace self-reported health with relatively more objective measures of health.26 But these measures may themselves be self-reported or assessed by the interviewer, so they are not necessarily superior indicators of an individual’s health (Bound 1991). To mitigate the problems associated with self-reported health, I define a latent health stock variable. Following the approach proposed by Bound (1991) and implemented in Bound et al. (1998), I estimate a model of self-reported health as a function of the relatively more objective measures of health (reported in Table 1) to create a latent health stock.27 The predicted value of the latent health stock is then used as a regressor in the hazard analysis.
I adopt the approach of Rice et al. (2010) and use an ordered probit model to estimate self-reported health, where the ordered measure of self-reported health (1 = poor, 2 = fair, 3 = good or very good and 4 = excellent) is regressed on 16 relatively more objective physical and mental health explanatory variables and on healthcare utilization. The predicted value of the outcome from this estimation is the latent health stock variable, which is used as a regressor in the proportional hazard model in the main body of the paper. Accordingly, a smaller value of the latent health stock indicates poorer health and a larger value indicates better health. Table 15 presents the marginal effects of the objective health measures for the four response categories of self-reported health in the ordered probit model. All objective measures have a statistically significant impact on an individual’s self-report of health, but each measure weighs differently across the four response categories. In column (1) of Table 15, the positive marginal effects imply that the incidence of functional limitations, chronic conditions or depression, a higher BMI, more nights spent in hospital, more doctor office visits and higher out-of-pocket medical expenditure increase the probability that an individual is predicted to be in the lowest health category (poor). Similarly, in column (4), the negative marginal effects signify that the same factors decrease the probability that an individual is predicted to be in the highest health category (excellent). The marginal effects in the other columns are interpreted in the same way. An increase in the latent health stock implies a shift in the prediction from poorer to better health.
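The opposite signs in the lowest and highest categories follow from the structure of an ordered probit. The sketch below uses notation that is assumed here for illustration and is not taken from the paper: $h_i^*$ is latent health, $x_i$ collects the objective measures, $\beta$ their coefficients, $\mu_1 < \mu_2 < \mu_3$ are the cut points, and $\Phi$ and $\phi$ are the standard normal distribution function and density. The derivatives apply to a continuous regressor; for a binary indicator the corresponding discrete change has the same sign.

```latex
\begin{align*}
h_i^* &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,1), \\
\Pr(y_i = 1 \mid x_i) &= \Phi(\mu_1 - x_i'\beta), \\
\Pr(y_i = j \mid x_i) &= \Phi(\mu_j - x_i'\beta) - \Phi(\mu_{j-1} - x_i'\beta), \quad j = 2, 3, \\
\Pr(y_i = 4 \mid x_i) &= 1 - \Phi(\mu_3 - x_i'\beta), \\
\frac{\partial \Pr(y_i = 1 \mid x_i)}{\partial x_{ik}} &= -\beta_k \, \phi(\mu_1 - x_i'\beta), \\
\frac{\partial \Pr(y_i = 4 \mid x_i)}{\partial x_{ik}} &= \beta_k \, \phi(\mu_3 - x_i'\beta).
\end{align*}
```

For an adverse health measure with $\beta_k < 0$, the first derivative is positive and the last is negative, so the measure raises the probability of reporting poor health and lowers the probability of reporting excellent health. Under this reading, the latent health stock would correspond to the fitted index $x_i'\hat{\beta}$; that interpretation is an assumption of this sketch.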
Principal Component Analysis (PCA).
Principal components analysis is a method for deriving a small number of uncorrelated variables, called “principal components”, from a larger set of variables. Its objective is to explain the maximum amount of variance with the minimum number of principal components. PCA analyzes a dataset in which observations are described by several variables that are, in general, inter-correlated. It extracts the important information from the data and expresses it as a set of new orthogonal variables, the principal components. The primary uses of principal component analysis are data reduction and addressing multicollinearity. It is a non-parametric technique with an underlying weakness: the data reduction leads to a loss of information. The amount of variance accounted for by a component is called the component’s eigenvalue. The correlations between a component and the original variables are called the component loadings (factor loadings). They are analogous to correlation coefficients, and squaring them gives the share of explained variation. The component loadings therefore tell us how much of the variation in a variable is explained by the component.
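The relationships among eigenvalues, loadings and orthogonal components can be checked numerically. The following is a minimal sketch of a textbook PCA on the correlation matrix, using NumPy and synthetic data; the function name `pca_from_correlation` and the simulated variables are assumptions made for illustration and do not reproduce the paper's data or software.

```python
import numpy as np


def pca_from_correlation(data):
    """Textbook PCA on the correlation matrix of `data` (rows = observations)."""
    # Standardize each column so that every variable has unit variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues,
    # so reorder to put the largest eigenvalue first.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Loading = correlation between an original variable and a component.
    loadings = eigenvectors * np.sqrt(eigenvalues)
    # Component scores: the standardized data projected on the eigenvectors.
    scores = z @ eigenvectors
    return eigenvalues, loadings, scores


# Synthetic, purely illustrative data: 500 observations of 4 variables
# that share one common source of variation.
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
data = common + rng.normal(size=(500, 4))

eigenvalues, loadings, scores = pca_from_correlation(data)

print("eigenvalues:", np.round(eigenvalues, 3))
print("sum of eigenvalues:", round(float(eigenvalues.sum()), 6))
print("squared loadings per variable:", np.round((loadings ** 2).sum(axis=1), 6))
```

In this sketch the covariance matrix of the scores is diagonal, with the eigenvalues on the diagonal, so the components are uncorrelated and each eigenvalue is the variance of its component. The squared loadings of a variable sum to one when all components are kept, and to less than one once some components are dropped, which is the loss of information noted above.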
In this paper, the main purpose of using principal component analysis is to lend more objectivity to the health measures. According to the theory of the health production function, an individual’s health is a durable good that depends on several factors, some of which may be influenced by the individual. Hence the health status of an individual depends not only on the incidence of physical and mental diseases but also on factors such as utilization of healthcare inputs, lifestyle and behavioral practices, job characteristics and genetic elements. Accordingly, I use twenty-eight interrelated variables that are likely to influence the health status of an individual. These variables are reported in Appendix Table 11. In addition to the standard physical and mental health measures (which include ADL difficulties, other mobility difficulties, chronic illnesses, depression and cognitive problems), I include information on memory-related diseases (dementia and Alzheimer’s disease), healthcare utilization (hospital stay, nursing home stay, doctor office visits and out-of-pocket medical expenditure), lifestyle factors (smoking, drinking behavior, exercising), job-related characteristics (stress, physical effort at work) and genetic information (proxied by average age of parents). From these 28 variables, PCA yields 28 factors or principal components. Of these 28 extracted components, only the eight with an eigenvalue greater than 1 are retained (reported in Appendix Table 10). This is known as the “Kaiser-Guttman” rule. The sum of all eigenvalues is equal to the number of included variables. In Table 10, the ‘Difference’ column shows the difference between two consecutive eigenvalues. ‘Proportion’ represents the relative weight of each factor in the total variance. For example, Factor 1 (Chronic Conditions Factor) explains 13% of the total variance. ‘Cumulative Proportion Explained’ shows the share of variance explained by the first n factors together.
For example, Factor 1 (Chronic Conditions Factor) and Factor 2 (Functional Limitations) together explain 22% of the total variance. Similarly, the eight chosen factors together explain 54% of the total variance. Table 11 shows the pattern matrix, which gives a clearer picture of the relevance of each variable in a factor. Factor loadings are the weights of, and correlations between, each variable and the factor. The higher the loading, the more relevant the variable is in defining the factor’s dimensionality. A positive sign indicates a positive relation between the variable and the factor, and a negative sign indicates an inverse relation. Uniqueness is the variance that is ‘unique’ to the variable and not shared with other variables. Each factor is named according to the variables that load most heavily on it. As illustrated in Appendix Table 11, the number of chronic conditions loads most heavily on Factor 1. The number of ADL difficulties and mobility difficulties define Factor 2, but the former has the higher loading on, or correlation with, the factor. Hence ADL difficulties are more important than mobility difficulties. Similarly, nights spent in hospital and in a nursing home define Factor 3, and the incidence of memory-related diseases (dementia and Alzheimer’s disease) and the total cognition score define Factor 4. The heaviest loadings for each factor are shaded in grey in Appendix Table 11.
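The bookkeeping behind the ‘Difference’, ‘Proportion’ and ‘Cumulative’ columns, and behind the eigenvalue-greater-than-1 retention rule, can be sketched in a few lines of plain Python. The eigenvalues below are hypothetical round numbers for five standardized variables, chosen only so that they sum to five; they are not the values in Appendix Table 10, and the function names are illustrative.

```python
def summarize_eigenvalues(eigenvalues):
    """Return one (eigenvalue, difference, proportion, cumulative) row per component."""
    # For a PCA on the correlation matrix this total equals the number of variables.
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for i, value in enumerate(eigenvalues):
        # Gap to the next eigenvalue; there is none for the last component.
        difference = value - eigenvalues[i + 1] if i + 1 < len(eigenvalues) else None
        proportion = value / total
        cumulative += proportion
        rows.append((value, difference, proportion, cumulative))
    return rows


def kaiser_retained(eigenvalues):
    """1-based indices of the components whose eigenvalue exceeds 1."""
    return [i + 1 for i, value in enumerate(eigenvalues) if value > 1.0]


# Hypothetical eigenvalues, sorted from largest to smallest.
eigenvalues = [2.0, 1.5, 0.75, 0.5, 0.25]
rows = summarize_eigenvalues(eigenvalues)
retained = kaiser_retained(eigenvalues)

for number, (value, difference, proportion, cumulative) in enumerate(rows, start=1):
    print(number, value, difference, round(proportion, 3), round(cumulative, 3))
print("retained components:", retained)
```

The same arithmetic links the proportions to the eigenvalues in the paper's setting: with 28 variables, a factor that explains 13% of the total variance has an eigenvalue of roughly 0.13 × 28 ≈ 3.6.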
These factors are orthogonal, which means that they are uncorrelated with each other. Based on the factor loadings, I have labeled the factors as Factor 1: Has chronic conditions, Factor 2: Has functional limitation, Factor 3: Hospital stay, Factor 4: Has cognitive functioning problems, Factor 5: Has depression, Factor 6: Lack of physical exercise, Factor 7: Has cancer, and Factor 8: Has lifestyle behavioral problems. Principal components are used because several variables together, rather than any one alone, define an interpretable concept. The predicted values of the factors are then used in the hazard analysis. Without PCA it would not be possible to disentangle the causal effect of the health measures, since they are highly interrelated. Although the uncorrelated factors created through PCA are valuable for the empirical model in the paper, there are limitations, such as loss of information due to aggregation and difficulty in interpreting the regression coefficients.