Abstract
When a regression model is estimated using a censored subset of observations, coefficient estimates may be biased. Censored regression models have a long history in biometrics, engineering and other areas of applied statistics. The interest of economists in these models was stimulated by Tobin’s work on durable goods consumption in the late 1950s. It was Heckman’s publication of a simple two-step procedure for estimating censored regression models, however, that led to their widespread usage in applied econometric studies. Although this is not necessarily the best method for estimating all censored regression models, it has certain attractive properties. An understanding of this method is vital to the proper interpretation of the wealth of applied studies based on this approach. Also, valuable insights into the basic nature of sample selection problems can be gained from the formulation of the censored regression model popularized by Heckman.
We begin by exploring the estimation problems resulting from censoring and from certain properties of Heckman’s two-step estimation method. Procedures are developed for assessing the nature and extent of problems resulting from censoring; these procedures are then applied in an empirical analysis of the wage rates and hours of work of individuals in 10 different demographic groups using data from the Panel Study of Income Dynamics. One of our findings is that estimation using censored data can lead to bias and other related problems even when the degree of censoring is slight.
The authors express their appreciation to John Ham, Baldev Raj, Thanasis Stengos and two anonymous referees for their helpful comments on earlier versions of this paper.
Notes
See, for instance, Buckley and James (1979), Dempster, Laird and Rubin (1977), Hartley (1958), Johnson and Kotz (1970, 1972), and Kalbfleisch and Prentice (1980) for an introduction to some of this history, including further references.
The empirical applications we are referring to are for models which can be classified as generalized Tobit models (Type 2 Tobit models in Amemiya, 1984, 1985), where the explanatory variables entering the regression model (to be estimated with a censored subset of observations) may differ from the explanatory variables entering the selection rule. Many problems in demography, education, marketing, finance, labor economics and other fields can be expressed using this sort of model. See Amemiya (1985, p. 365, Table 10.1) for a listing of applications covering a wide range of topics. Under various circumstances, estimation methods such as the EM algorithm, Powell’s least absolute deviations estimator (LAD), and various maximum likelihood procedures may yield better parameter estimates in terms of efficiency or mean squared error. See Amemiya (1973), Arabmazar and Schmidt (1981, 1982), Bera, Jarque and Lee (1984), Dempster, Laird and Rubin (1977), Dudley and Montmarquette (1976), Goldberger (1981, 1983), Huang (1964), Hurd (1979), Lee (1982), Nelson (1984), Olsen (1978, 1980), Paarsch (1984), Powell (1981, 1983a, 1983b, 1984), Robinson (1982), Wales and Woodland (1980) and Wu (1965). See Mroz (1987), for instance, concerning some of the substantive advantages of Heckman’s two-step estimation method.
This result is also given in Nelson (1984, p. 185, eq. 12). If predetermined variables are included in Z, then the asymptotic OLS bias vector may be defined as \( \sigma_1 \rho_{13} \operatorname{plim} \hat{\gamma} \), provided that \( \operatorname{plim} \hat{\gamma} \) exists.
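A bias expression of this form can be motivated by the usual omitted-variable argument. The sketch below rests on assumptions that this note does not restate: that the selected-sample regression function takes Heckman's form with \( \lambda_i \) as the selection term, and that \( \hat{\gamma} \) is the OLS coefficient vector from regressing \( \lambda \) on Z. The symbols \( \beta \), \( e_i \) and \( \hat{\beta}_{OLS} \) are introduced here for illustration only.

\[
y_i = Z_i \beta + \sigma_1 \rho_{13} \lambda_i + e_i , \qquad \hat{\gamma} = (Z'Z)^{-1} Z' \lambda ,
\]
\[
\operatorname{plim} \hat{\beta}_{OLS} - \beta = \sigma_1 \rho_{13} \operatorname{plim} \hat{\gamma} ,
\]

where \( \hat{\beta}_{OLS} \) comes from regressing y on Z alone in the selected sample, and \( e_i \) is assumed to be asymptotically uncorrelated with Z.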
For instance, in his discussion of Tobit-type models Kmenta (1986, p. 561) writes: “The restriction on the observable range of the dependent variable matters if the probability of falling below the cut-off point is not negligible. In terms of our examples, if only a very small proportion of the households in the population did not purchase a durable good, or if only a very small proportion of all women were not in the labor force, the limited nature of the dependent variable could be ignored. Thus there is no problem in dealing with household expenditure on food or clothing, or with recorded wages of adult males.”
This is because theoretically \( u_{3i} = v_i - u_{1i} \), and hence \( u_{3i} \) and \( u_{1i} \) will be correlated even if there is no correlation between the structural disturbance terms \( u_{1i} \) and \( v_i \).
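The step can be written out explicitly. The lines below assume, as notation not defined in this note, that \( \operatorname{Var}(u_{1i}) = \sigma_1^2 \) and \( \operatorname{Var}(v_i) = \sigma_v^2 \):

\[
\operatorname{Cov}(u_{3i}, u_{1i}) = \operatorname{Cov}(v_i, u_{1i}) - \operatorname{Var}(u_{1i}) = -\sigma_1^2 \quad \text{when } \operatorname{Cov}(v_i, u_{1i}) = 0 ,
\]
\[
\rho_{13} = \frac{-\sigma_1^2}{\sigma_1 \sqrt{\sigma_v^2 + \sigma_1^2}} = -\frac{\sigma_1}{\sqrt{\sigma_v^2 + \sigma_1^2}} \neq 0 .
\]

Under these assumptions the correlation is nonzero whenever \( \sigma_1 > 0 \), whatever the relationship between the structural disturbances.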
Actually λ is a nonlinear function of Z, so \( R_{\lambda \cdot Z}^2 \) cannot equal 1 exactly. In actual applications of the Heckit procedure, we have often found that the \( R^2 \) for the OLS regression of λ on the other explanatory variables in the equation of interest is extremely close to 1.
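A small numerical illustration of how a nonlinear λ can still be almost perfectly fitted by a linear regression. It is only a sketch under stated assumptions: λ is taken to be the inverse Mills ratio φ(z)/Φ(z) evaluated at a single index z (the usual form in Heckman's procedure), and the grid of z values from -1 to 1 is hypothetical rather than taken from the data used in this chapter.

```python
import math


def inverse_mills(z):
    """Inverse Mills ratio phi(z) / Phi(z) for a standard normal index z."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf


def r_squared_linear(x, y):
    """R^2 from an OLS regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return s_xy * s_xy / (s_xx * s_yy)


# Hypothetical index values: an evenly spaced grid on [-1, 1].
z_grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
lam = [inverse_mills(z) for z in z_grid]

r2 = r_squared_linear(z_grid, lam)
print("R^2 of lambda on z:", round(r2, 4))
```

On a grid like this the fit is close to, but strictly below, 1. The narrower the range of the index in the selected sample, the closer to linear λ will tend to look.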
See Kmenta (1986, pp. 437-438) for the usual OLS variance formula rewritten in a form so that the impacts of multicollinearity on the magnitudes of the variances of the coefficient estimates are explicit.
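For orientation, the standard textbook form of this variance expression is reproduced below. The notation is ours and may differ from Kmenta's: \( R_k^2 \) denotes the \( R^2 \) from regressing the k-th explanatory variable on all the other explanatory variables, including the constant.

\[
\operatorname{Var}(\hat{\beta}_k) = \frac{\sigma^2}{(1 - R_k^2) \sum_i (x_{ik} - \bar{x}_k)^2} .
\]

As a purely illustrative calculation, \( R_k^2 = 0.99 \) gives \( 1/(1 - 0.99) = 100 \), so the variance is 100 times, and the standard error 10 times, what it would be if the k-th variable were uncorrelated with the other regressors.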
Multicollinearity problems can be considered in a more comprehensive way in the context of a principal components formulation of the model. See, for instance, Judge et al. (1985, Ch. 22) and Judge et al. (1988, Ch. 21). The approach adopted in this paper, however, may allow applied readers more readily to understand our main points.
See, for example, Heckman (1979, 1980) and Amemiya (1984, 1985) for the form of the variance-covariance matrix. Heckit estimation packages such as Greene (undated, p. 49) typically include some arbitrary procedure for avoiding negative estimates of the coefficient standard errors. Alternative estimates of the coefficient standard errors may be obtained using the procedure described in White (1980), as suggested by Lee (1982) and Amemiya (1984, p. 13). The latter estimates are used in this paper.
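The idea behind a White-type standard error can be sketched in a few lines. This is a simplified illustration, not the computation used in this paper: it covers only the slope of a regression on a constant and a single regressor, uses the basic (HC0) form of the estimator, and makes no adjustment for the fact that \( \hat{\lambda}_i \) is itself estimated. The function name and the data are hypothetical.

```python
def ols_slope_with_hc0(x, y):
    """OLS slope of y on a constant and x, with its HC0 (White) variance."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dev = [xi - x_bar for xi in x]
    s_xx = sum(d * d for d in dev)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dev, y)) / s_xx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 for the slope: sum of (deviation * residual)^2 over S_xx^2.
    hc0_var = sum((d * e) ** 2 for d, e in zip(dev, resid)) / (s_xx ** 2)
    return slope, hc0_var


# Hypothetical data, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0, 4.0]
slope, hc0_var = ols_slope_with_hc0(x, y)
print("slope:", slope, "robust s.e.:", hc0_var ** 0.5)
```

Because the variance estimate is a sum of squares divided by a square, it cannot be negative, which is one practical attraction of this approach relative to formulas that can yield negative variance estimates.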
For a full description of the data bases and of the probit estimation results used in calculating the values of \( {\hat \lambda _i} \), see Nakamura and Nakamura (1985b, Section 2.7 and Chapter 3).
References
Amemiya, T.: 1973, ‘Regression Analysis When the Dependent Variable is Truncated Normal’, Econometrica, 42, 999–1012.
Amemiya, T.: 1984, ‘Tobit Models: A Survey’, Journal of Econometrics, 24, 3–61.
Amemiya, T.: 1985, Advanced Econometrics, Harvard University Press, Cambridge, Massachusetts, U.S.A.
Anderson, K. H., and M. A. Hill: 1983, ‘Marriage and Labor Market Discrimination in Japan’, Southern Economic Journal, 49(4), 941–953.
Arabmazar, A., and P. Schmidt: 1981, ‘Further Evidence on the Robustness of the Tobit Estimator to Heteroskedasticity’, Journal of Econometrics, 17, 253–258.
Arabmazar, A., and P. Schmidt: 1982, ‘An Investigation of the Robustness of the Tobit Estimator to Non-Normality’, Econometrica, 50, 1055–1063.
Bera, A. K., C. M. Jarque, and L. F. Lee: 1984, ‘Testing the Normality Assumption in Limited Dependent Variable Models’, International Economic Review, 25, 563–578.
Buckley, J., and I. James: 1979, ‘Linear Regression with Censored Data’, Biometrika, 66, 429–436.
Dempster, A. P., N. M. Laird, and D. B. Rubin: 1977, ‘Maximum Likelihood from Incomplete Data via the EM Algorithm’, Journal of the Royal Statistical Society, B 39, 1–38.
Dudley, L., and C. Montmarquette: 1976, ‘A Model of the Supply of Bilateral Foreign Aid’, American Economic Review, 66, 132–142.
Franz, W.: 1985, ‘An Economic Analysis of Female Work Participation, Education, and Fertility: Theory and Empirical Evidence for the Federal Republic of Germany’, Journal of Labor Economics, 3, S218–S234.
Goldberger, A. S.: 1964, Econometric Theory, John Wiley and Sons, New York, New York, U.S.A.
Goldberger, A. S.: 1981, ‘Linear Regression After Selection’, Journal of Econometrics, 15, 357–366.
Goldberger, A. S.: 1983, ‘Abnormal Selection Bias’, in S. Karlin, T. Amemiya and L. A. Goodman (eds.), Studies in Econometrics, Time Series, and Multivariate Statistics, Academic Press, New York, New York, U.S.A., pp. 67–84.
Greene, W.: undated, LIMDEP Manual.
Gronau, R.: 1974, ‘The Effects of Children on the Housewife’s Value of Time’, Journal of Political Economy, 82, 1119–1143.
Hartley, H. O.: 1958, ‘Maximum Likelihood Estimation from Incomplete Data’, Biometrics, 14, 174–194.
Heckman, J. J.: 1974, ‘Shadow Prices, Market Wages and Labor Supply’, Econometrica, 42, 679–694.
Heckman, J. J.: 1976, ‘The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models’, Annals of Economic and Social Measurement, 5, 475–492.
Heckman, J. J.: 1979, ‘The Sample Selection Bias as a Specification Error’, Econometrica, 47, 153–162.
Heckman, J. J.: 1980, ‘Sample Selection Bias as a Specification Error with an Application to the Estimation of Labor Supply Functions’, in J. P. Smith (ed.), Female Labor Supply: Theory and Estimation (see Smith, 1980).
Huang, D. S.: 1964, ‘Discrete Stock Adjustment: The Case of Demand for Automobiles’, International Economic Review, 5, 46–62.
Hurd, M.: 1979, ‘Estimation in Truncated Samples When There is Heteroskedasticity’, Journal of Econometrics, 11, 247–258.
Johnson, N. L., and S. Kotz: 1970 and 1972, Distributions in Statistics, Vols. 1 and 4, Wiley, New York, New York, U.S.A.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee: 1985, The Theory and Practice of Econometrics, 2nd edition, John Wiley and Sons, New York, New York, U.S.A.
Judge, G. G., R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee: 1988, Introduction to the Theory and Practice of Econometrics, 2nd edition, John Wiley and Sons, New York, New York, U.S.A.
Kalbfleisch, J. D., and R. L. Prentice: 1980, Statistical Analysis of Failure Time Data, Wiley, New York, New York, U.S.A.
Kmenta, J.: 1986, Elements of Econometrics, 2nd edition, Macmillan Company.
Lee, L. F.: 1982, ‘Some Approaches to the Correction of Selectivity Bias’, Review of Economic Studies, 49, 355–372.
Maddala, G. S.: 1983, Limited Dependent and Qualitative Variables in Econometrics, Cambridge University Press, London, England.
Mroz, T. A.: 1987, ‘The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions’, Econometrica, 55(4), 765–799.
Nakamura, A., and M. Nakamura: 1985a, ‘Dynamic Models of the Labor Force Behavior of Married Women Which Can Be Estimated Using Limited Amounts of Past Information’, Journal of Econometrics, 27, 273–298.
Nakamura, A., and M. Nakamura: 1985b, The Second Paycheck: A Socioeconomic Analysis of Earnings, Academic Press, Orlando, Florida, U.S.A.
Nelson, F.: 1984, ‘Efficiency of the Two-Step Estimator for Models with Endogenous Sample Selection’, Journal of Econometrics, 24, 181–196.
Ofer, G., and A. Vinokur: 1985, ‘Work and Family Roles of Soviet Women: Historical Trends and Cross-Section Analysis’, Journal of Labor Economics, 3, S328–S354.
Olsen, R. J.: 1978, ‘Note on the Uniqueness of the Maximum Likelihood Estimator for the Tobit Model’, Econometrica, 46, 1211–1215.
Olsen, R. J.: 1980, ‘A Least Squares Correction for Selectivity Bias’, Econometrica, 48, 1815–1820.
Paarsch, H. J.: 1984, ‘A Monte Carlo Comparison of Estimators for Censored Regression Models’, Journal of Econometrics, 24, 197–214.
Powell, J. L.: 1981, ‘Least Absolute Deviations Estimation for Censored and Truncated Regression Models’, Technical Report No. 356, Institute for Mathematical Studies in the Social Sciences, Stanford University, California, U.S.A.
Powell, J. L.: 1983a, ‘Asymptotic Normality of the Censored and Truncated Least Absolute Deviations Estimators’, Technical Report No. 395, Institute for Mathematical Studies in the Social Sciences, Stanford University, California, U.S.A.
Powell, J. L.: 1983b, ‘The Asymptotic Normality of Two-Stage Least Absolute Deviations Estimators’, Econometrica, 51, 1569–1575.
Powell, J. L.: 1984, ‘Least Absolute Deviations Estimation for the Censored Regression Model’, Journal of Econometrics, 25, 303–325.
Reimers, C. W.: 1983, ‘Labor Market Discrimination against Hispanic and Black Men’, Review of Economics and Statistics, 65, 570–579.
Robinson, P. M.: 1982, ‘On the Asymptotic Properties of Estimators of Models Containing Limited Dependent Variables’, Econometrica, 50, 27–41.
Smith, J. P.: 1980, Female Labor Supply: Theory and Estimation, Princeton University Press, Princeton, New Jersey, U.S.A.
Tobin, J.: 1958, ‘Estimation of Relationships for Limited Dependent Variables’, Econometrica, 26, 24–36.
Wales, T. J., and A. D. Woodland: 1980, ‘Sample Selectivity and the Estimation of Labor Supply Functions’, International Economic Review, 21, 437–468.
White, H.: 1980, ‘A Heteroskedasticity-consistent Covariance Estimator and a Direct Test for Heteroskedasticity’, Econometrica, 48, 817–838.
Wu, D. M.: 1965, ‘An Empirical Analysis of Household Durable Goods Expenditure’, Econometrica, 33, 761–780.
Copyright information
© 1989 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Nakamura, A., Nakamura, M. (1989). Selection Bias: More than a Female Phenomenon. In: Raj, B. (eds) Advances in Econometrics and Modelling. Advanced Studies in Theoretical and Applied Econometrics, vol 15. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-7819-6_10
DOI: https://doi.org/10.1007/978-94-015-7819-6_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-4048-0
Online ISBN: 978-94-015-7819-6
eBook Packages: Springer Book Archive