# Categorical and Limited Dependent Variable Modeling in Higher Education

• Awilda Rodriguez
• Fernando Furquim
• Stephen L. DesJardins
Chapter, part of the Higher Education: Handbook of Theory and Research book series (HATR, volume 33)

## Abstract

Higher education researchers have applied increasingly sophisticated regression techniques to the study of many important issues. Historically, the statistical workhorse of this work has been linear regression, which has several desirable properties for analyzing continuous outcomes and, under certain assumptions, yields unbiased coefficient estimates. However, several outcomes of interest to higher education scholars are categorical or limited in the values they can take, and using linear regression to study them may violate key assumptions of the linear model; with a binary outcome, for example, predicted probabilities can fall below zero or above one. Herein we provide an overview of regression techniques commonly employed when studying categorical or limited dependent variables. We begin by discussing the modeling of binary outcomes, which are often studied using linear probability, logistic, or probit models. We then consider dependent variables with multiple categories, modeled using ordinal and multinomial regression methods. We also discuss models for other limited dependent variables, including counts, fractions, and censored or truncated outcomes. Throughout the chapter, we apply these techniques to the study of students’ college choice using a relatively new data set available from the National Center for Education Statistics.
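The point that linear regression can misbehave with a binary outcome can be seen in a small synthetic example. The sketch below is our own illustration, not code from the chapter: the grouped data (the counts in the hypothetical `ones_per_group` list) are made up, the linear probability model is fitted by ordinary least squares, and the logistic regression is fitted by a hand-written Newton-Raphson routine (the hypothetical helper `fit_logit`) using only NumPy.

```python
import numpy as np

# Synthetic grouped data (illustrative only, not from the chapter):
# 10 students at each value of a standardized predictor x in -4..4,
# of whom ones_per_group[i] have the binary outcome y = 1.
x_values = np.arange(-4, 5)
ones_per_group = [0, 1, 1, 2, 5, 8, 9, 9, 10]
n_per_group = 10

x = np.repeat(x_values, n_per_group).astype(float)
y = np.concatenate(
    [np.r_[np.ones(k), np.zeros(n_per_group - k)] for k in ones_per_group]
)
X = np.column_stack([np.ones_like(x), x])  # intercept + slope design matrix

# Linear probability model: ordinary least squares on the 0/1 outcome.
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)


def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Hypothetical helper: logistic regression MLE via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # predicted probabilities
        score = X.T @ (y - p)                       # gradient of log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])   # information (negative Hessian)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


beta_logit = fit_logit(X, y)

# Fitted values at each distinct x for both models.
X_grid = np.column_stack([np.ones(len(x_values)), x_values.astype(float)])
lpm_pred = X_grid @ beta_lpm
logit_pred = 1.0 / (1.0 + np.exp(-X_grid @ beta_logit))

for xv, a, b in zip(x_values, lpm_pred, logit_pred):
    print(f"x={int(xv):+d}  LPM={a:6.3f}  logit={b:5.3f}")
```

With these data the linear probability model's fitted values at x = -4 and x = 4 are about -0.07 and 1.07, which are impossible as probabilities, while the logistic fitted values stay strictly between 0 and 1. In applied work an established package (for example, statsmodels in Python) would normally replace the hand-written solver.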

## Keywords

Categorical dependent variable modeling · Limited dependent variable modeling · Linear probability model · Logistic regression · Probit regression · Logit · Probit · Multinomial logistic regression · Multinomial probit regression · Count modeling · Poisson regression · Negative binomial regression · Student choice modeling · Fractional outcomes · Censored dependent variables · Truncated dependent variables

## Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

## Authors and Affiliations

• Awilda Rodriguez (1)
• Fernando Furquim (1)
• Stephen L. DesJardins (2)

1. Center for the Study of Higher and Postsecondary Education, School of Education, University of Michigan, Ann Arbor, USA
2. Center for the Study of Higher and Postsecondary Education, School of Education, and Gerald R. Ford School of Public Policy, University of Michigan, Ann Arbor, USA