Selection Bias: More than a Female Phenomenon

Advances in Econometrics and Modelling

Part of the book series: Advanced Studies in Theoretical and Applied Econometrics (ASTA, volume 15)

Abstract

When a regression model is estimated using a censored subset of observations, coefficient estimates may be biased. Censored regression models have a long history in biometrics, engineering and other areas of applied statistics. The interest of economists in these models was stimulated by Tobin’s work on durable goods consumption in the late 1950s. It was Heckman’s publication of a simple two-step procedure for estimating censored regression models, however, that led to their widespread usage in applied econometric studies. Although this is not necessarily the best method for estimating all censored regression models, it has certain attractive properties. An understanding of this method is vital to the proper interpretation of the wealth of applied studies based on this approach. Also, valuable insights into the basic nature of sample selection problems can be gained from the formulation of the censored regression model popularized by Heckman.

We begin by exploring the estimation problems resulting from censoring and from certain properties of Heckman’s two-step estimation method. Procedures are developed for assessing the nature and extent of problems resulting from censoring; these procedures are then applied in an empirical analysis of the wage rates and hours of work of individuals in 10 different demographic groups using data from the Panel Study of Income Dynamics. One of our findings is that estimation using censored data can lead to bias and other related problems even when the degree of censoring is slight.
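
The finding that bias can arise even under slight censoring can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not use the Panel Study of Income Dynamics; it assumes a hypothetical data-generating process with one regressor, a true slope of 1, and outcome and selection disturbances with correlation 0.8. All names and parameter values (`simulate`, `cutoff`, `rho`, and so on) are illustrative assumptions.

```python
import math
import random


def ols_slope(xs, ys):
    """Slope from an OLS regression of ys on xs with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, beta=1.0, rho=0.8, cutoff=2.0, seed=0):
    """Compare OLS on the full sample with OLS on the selected subset.

    Assumed (hypothetical) model:
      outcome:   y = beta * x + u
      selection: y is observed only if cutoff + x + v > 0
      corr(u, v) = rho, with u and v standard normal.
    """
    rng = random.Random(seed)
    full_x, full_y, sel_x, sel_y = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)  # outcome disturbance
        e = rng.gauss(0.0, 1.0)  # independent noise
        v = rho * u + math.sqrt(1.0 - rho ** 2) * e  # selection disturbance
        y = beta * x + u
        full_x.append(x)
        full_y.append(y)
        if cutoff + x + v > 0:  # selection rule: keep the observation
            sel_x.append(x)
            sel_y.append(y)
    return {
        "censored_share": 1.0 - len(sel_x) / n,
        "slope_full": ols_slope(full_x, full_y),
        "slope_selected": ols_slope(sel_x, sel_y),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings about 8 percent of observations are censored in expectation, yet the slope estimated on the retained subset falls below the full-sample slope. The reason is that the retained disturbances have a conditional mean that varies with the regressor.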

The authors express their appreciation to John Ham, Baldev Raj, Thanasis Stengos and two anonymous referees for their helpful comments on earlier versions of this paper.

Notes

  1. See, for instance, Buckley and James (1979), Dempster, Laird and Rubin (1977), Hartley (1958), Johnson and Kotz (1970, 1972), and Kalbfleisch and Prentice (1980) for an introduction to some of this history, including further references.

  2. The empirical applications we are referring to are for models which can be classified as generalized Tobit models (Type 2 Tobit models in Amemiya, 1984, 1985), where the explanatory variables entering the regression model (to be estimated with a censored subset of observations) may differ from the explanatory variables entering the selection rule. Many problems in demography, education, marketing, finance, labor economics and other fields can be expressed using this sort of model. See Amemiya (1985, p. 365, Table 10.1) for a listing of applications covering a wide range of topics. Under various circumstances, estimation methods such as the EM algorithm, Powell’s least absolute deviations estimator (LAD), and various maximum likelihood procedures may yield better parameter estimates in terms of efficiency or mean squared error. See Amemiya (1973), Arabmazar and Schmidt (1981, 1982), Bera, Jarque and Lee (1984), Dempster, Laird and Rubin (1977), Dudley and Montmarquette (1976), Goldberger (1981, 1983), Huang (1964), Hurd (1979), Lee (1982), Nelson (1984), Olsen (1978, 1980), Paarsch (1984), Powell (1981, 1983a, 1983b, 1984), Robinson (1982), Wales and Woodland (1980) and Wu (1965). See Mroz (1987), for instance, concerning some of the substantive advantages of Heckman’s two-step estimation method.

  3. This result is also given in Nelson (1984, p. 185, eq. 12). If predetermined variables are included in Z, then the asymptotic OLS bias vector may be defined as \( \sigma_1 \rho_{13}\, \mathrm{plim}\, \hat{\gamma} \), provided that \( \mathrm{plim}\, \hat{\gamma} \) exists.

  4. For instance, in his discussion of Tobit-type models Kmenta (1986, p. 561) writes: “The restriction on the observable range of the dependent variable matters if the probability of falling below the cut-off point is not negligible. In terms of our examples, if only a very small proportion of the households in the population did not purchase a durable good, or if only a very small proportion of all women were not in the labor force, the limited nature of the dependent variable could be ignored. Thus there is no problem in dealing with household expenditure on food or clothing, or with recorded wages of adult males.”

  5. This is because theoretically \( u_{3i} = v_i - u_{1i} \), and hence \( u_{3i} \) and \( u_{1i} \) will be correlated even if there is no correlation between the structural disturbance terms \( u_{1i} \) and \( v_i \).

  6. Actually λ is a nonlinear function of Z, so \( R_{\lambda \cdot Z}^2 \) cannot equal 1 exactly. In actual applications of the Heckit procedure, we have often found that the \( R^2 \) for the OLS regression of λ on the other explanatory variables in the equation of interest is extremely close to 1.

  7. See Kmenta (1986, pp. 437-438) for the usual OLS variance formula rewritten in a form so that the impacts of multicollinearity on the magnitudes of the variances of the coefficient estimates are explicit.

  8. Multicollinearity problems can be considered in a more comprehensive way in the context of a principal components formulation of the model. See, for instance, Judge et al. (1985, Ch. 22) and Judge et al. (1988, Ch. 21). The approach adopted in this paper, however, may allow applied readers more readily to understand our main points.

  9. See, for example, Heckman (1979, 1980) and Amemiya (1984, 1985) for the form of the variance-covariance matrix. Heckit estimation packages such as Greene (undated, p. 49) typically include some arbitrary procedure for avoiding negative estimates of the coefficient standard errors. Alternative estimates of the coefficient standard errors may be obtained using the procedure described in White (1980), as suggested by Lee (1982) and Amemiya (1984, p. 13). The latter estimates are used in this paper.

  10. For a full description of the data bases and of the probit estimation results used in calculating the values of \( {\hat \lambda _i} \), see Nakamura and Nakamura (1985b, Section 2.7 and Chapter 3).
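
The point made in note 6, that λ is nonlinear in Z and yet can be almost perfectly explained by a linear regression, can be checked numerically. The sketch below assumes the standard inverse Mills ratio λ(z) = φ(z)/Φ(z) evaluated at a single scalar index z on an evenly spaced grid; the grid ranges and the function names are hypothetical choices made for illustration, not values taken from the chapter.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(z):
    """Inverse Mills ratio lambda(z) = pdf(z) / cdf(z)."""
    return norm_pdf(z) / norm_cdf(z)


def r_squared(xs, ys):
    """R-squared from an OLS regression of ys on xs with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def mills_r_squared(lo=0.0, hi=2.0, steps=201):
    """R-squared of lambda(z) regressed on z over an assumed grid [lo, hi]."""
    zs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    lams = [inverse_mills(z) for z in zs]
    return r_squared(zs, lams)


if __name__ == "__main__":
    print(f"R^2 on [0.0, 2.0]: {mills_r_squared(0.0, 2.0):.4f}")
    print(f"R^2 on [0.5, 1.5]: {mills_r_squared(0.5, 1.5):.4f}")
```

The narrower the range of the index in the sample, the closer the fit comes to 1 without reaching it. This is the near-collinearity between λ and the other regressors that the note describes.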

References

  • Amemiya, T.: 1973, ‘Regression Analysis When the Dependent Variable is Truncated Normal’, Econometrica, 42, 999–1012.

  • Amemiya, T.: 1984, ‘Tobit Models: A Survey’, Journal of Econometrics, 24, 3–61.

  • Amemiya, T.: 1985, Advanced Econometrics, Harvard University Press, Cambridge, Massachusetts, U.S.A.

  • Anderson, K. H., and M. A. Hill: 1983, ‘Marriage and Labor Market Discrimination in Japan’, Southern Economic Journal, 49(4), 941–953.

  • Arabmazar, A., and P. Schmidt: 1981, ‘Further Evidence on the Robustness of the Tobit Estimator to Heteroskedasticity’, Journal of Econometrics, 17, 253–258.

  • Arabmazar, A., and P. Schmidt: 1982, ‘An Investigation of the Robustness of the Tobit Estimator to Non-Normality’, Econometrica, 50, 1055–1063.

  • Bera, A. K., C. M. Jarque, and L. F. Lee: 1984, ‘Testing the Normality Assumption in Limited Dependent Variable Models’, International Economic Review, 25, 563–578.

  • Buckley, J., and I. James: 1979, ‘Linear Regression with Censored Data’, Biometrika, 66, 429–436.

  • Dempster, A. P., N. M. Laird, and D. B. Rubin: 1977, ‘Maximum Likelihood from Incomplete Data via the EM Algorithm’, Journal of the Royal Statistical Society, B 39, 1–38.

  • Dudley, L., and C. Montmarquette: 1976, ‘A Model of the Supply of Bilateral Foreign Aid’, American Economic Review, 66, 132–142.

  • Franz, W.: 1985, ‘An Economic Analysis of Female Work Participation, Education, and Fertility: Theory and Empirical Evidence for the Federal Republic of Germany’, Journal of Labor Economics, 3, S218–S234.

  • Goldberger, A. S.: 1964, Econometric Theory, John Wiley and Sons, New York, New York, U.S.A.

  • Goldberger, A. S.: 1981, ‘Linear Regression After Selection’, Journal of Econometrics, 15, 357–366.

  • Goldberger, A. S.: 1983, ‘Abnormal Selection Bias’, in S. Karlin, T. Amemiya and L. A. Goodman (eds.), Studies in Econometrics, Time Series, and Multivariate Statistics, Academic Press, New York, New York, U.S.A., pp. 67–84.

  • Greene, W.: LIMDEP Manual, undated.

  • Gronau, R.: 1974, ‘The Effects of Children on the Housewife’s Value of Time’, Journal of Political Economy, 82, 1119–1143.

  • Hartley, H. O.: 1958, ‘Maximum Likelihood Estimation from Incomplete Data’, Biometrics, 14, 174–194.

  • Heckman, J. J.: 1974, ‘Shadow Prices, Market Wages and Labor Supply’, Econometrica, 42, 679–694.

  • Heckman, J. J.: 1976, ‘The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models’, Annals of Economic and Social Measurement, 5, 475–492.

  • Heckman, J. J.: 1979, ‘The Sample Selection Bias as a Specification Error’, Econometrica, 47, 153–162.

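  • Heckman, J. J.: 1980, ‘The Selection Sample Bias as a Specification Error with an Application to the Estimation of Labor Supply Functions’, in J. P. Smith (ed.), Female Labor Supply: Theory and Estimation, Princeton University Press, Princeton, New Jersey, U.S.A.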

  • Huang, D. S.: 1964, ‘Discrete Stock Adjustment: The Case of Demand for Automobiles’, International Economic Review, 5, 46–62.

  • Hurd, M.: 1979, ‘Estimation in Truncated Samples When There is Heteroskedasticity’, Journal of Econometrics, 11, 247–258.

  • Johnson, N. L., and S. Kotz: 1970 and 1972, Distributions in Statistics, Vols. 1 and 4, Wiley, New York, New York, U.S.A.

  • Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee: 1985, The Theory and Practice of Econometrics, 2nd edition, John Wiley and Sons, New York, New York, U.S.A.

  • Judge, G. G., R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T.-C. Lee: 1988, Introduction to the Theory and Practice of Econometrics, 2nd edition, John Wiley and Sons, New York, New York, U.S.A.

  • Kalbfleisch, J. D., and R. L. Prentice: 1980, Statistical Analysis of Failure Time Data, Wiley, New York, New York, U.S.A.

  • Kmenta, J.: 1986, Elements of Econometrics, 2nd edition, Macmillan Company.

  • Lee, L. F.: 1982, ‘Some Approaches to the Correction of Selectivity Bias’, Review of Economic Studies, 49, 355–372.

  • Maddala, G. S.: 1983, Limited Dependent and Qualitative Variables in Econometrics, Cambridge University Press, London, England.

  • Mroz, T. A.: 1987, ‘The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions’, Econometrica, 55(4), 765–799.

  • Nakamura, A., and M. Nakamura: 1985a, ‘Dynamic Models of the Labor Force Behavior of Married Women Which Can Be Estimated Using Limited Amounts of Past Information’, Journal of Econometrics, 27, 273–298.

  • Nakamura, A., and M. Nakamura: 1985b, The Second Paycheck: A Socioeconomic Analysis of Earnings, Academic Press, Orlando, Florida, U.S.A.

  • Nelson, F.: 1984, ‘Efficiency of the Two-Step Estimator for Models with Endogenous Sample Selection’, Journal of Econometrics, 24, 181–196.

  • Ofer, G., and A. Vinokur: 1985, ‘Work and Family Roles of Soviet Women: Historical Trends and Cross-Section Analysis’, Journal of Labor Economics, 3, S328–S354.

  • Olsen, R. J.: 1978, ‘Note on the Uniqueness of the Maximum Likelihood Estimator for the Tobit Model’, Econometrica, 46, 1211–1215.

  • Olsen, R. J.: 1980, ‘A Least Squares Correction for Selectivity Bias’, Econometrica, 48, 1815–1820.

  • Paarsch, H. J.: 1984, ‘A Monte Carlo Comparison of Estimators for Censored Regression Models’, Journal of Econometrics, 24, 197–214.

  • Powell, J. L.: 1981, ‘Least Absolute Deviations Estimation for Censored and Truncated Regression Models’, Technical Report No. 356, Institute for Mathematical Studies in the Social Sciences, Stanford University, California, U.S.A.

  • Powell, J. L.: 1983a, ‘Asymptotic Normality of the Censored and Truncated Least Absolute Deviations Estimators’, Technical Report No. 395, Institute for Mathematical Studies in the Social Sciences, Stanford University, California, U.S.A.

  • Powell, J. L.: 1983b, ‘The Asymptotic Normality of Two-Stage Least Absolute Deviations Estimators’, Econometrica, 51, 1569–1575.

  • Powell, J. L.: 1984, ‘Least Absolute Deviations Estimation for the Censored Regression Model’, Journal of Econometrics, 25, 303–325.

  • Reimers, C. W.: 1983, ‘Labor Market Discrimination against Hispanic and Black Men’, Review of Economics and Statistics, 65, 570–579.

  • Robinson, P. M.: 1982, ‘On the Asymptotic Properties of Estimators of Models Containing Limited Dependent Variables’, Econometrica, 50, 27–41.

  • Smith, J. P.: 1980, Female Labor Supply: Theory and Estimation, Princeton University Press, Princeton, New Jersey, U.S.A.

  • Tobin, J.: 1958, ‘Estimation of Relationships for Limited Dependent Variables’, Econometrica, 26, 24–36.

  • Wales, T. J., and A. D. Woodland: 1980, ‘Sample Selectivity and the Estimation of Labor Supply Functions’, International Economic Review, 21, 437–468.

  • White, H.: 1980, ‘A Heteroskedasticity-consistent Covariance Estimator and a Direct Test for Heteroskedasticity’, Econometrica, 48, 817–838.

  • Wu, D. M.: 1965, ‘An Empirical Analysis of Household Durable Goods Expenditure’, Econometrica, 33, 761–780.

Copyright information

© 1989 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Nakamura, A., Nakamura, M. (1989). Selection Bias: More than a Female Phenomenon. In: Raj, B. (eds) Advances in Econometrics and Modelling. Advanced Studies in Theoretical and Applied Econometrics, vol 15. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-7819-6_10

  • DOI: https://doi.org/10.1007/978-94-015-7819-6_10

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-4048-0

  • Online ISBN: 978-94-015-7819-6

  • eBook Packages: Springer Book Archive
