Analysis of the Impact of Sample Size, Attribute Variance and Within-Sample Choice Distribution on the Estimation Accuracy of Multinomial Logit Models Using Simulated Data

Zeng, Minhui; Zhong, Ming; Hunt, John Douglas

doi:10.1007/s11518-018-5359-7

Published: 06 February 2018

Volume 27, pages 771–789, (2018)

Journal of Systems Science and Systems Engineering

Minhui Zeng^1,2,3,
Ming Zhong^1,3,4 &
John Douglas Hunt^1,3,5

Abstract

A review of the literature indicates that sample size, attribute variance and the within-sample choice distribution of alternatives are important considerations in the estimation of multinomial logit (MNL) models, but their impacts on estimation accuracy have not been systematically studied. The objective of this paper is therefore to examine these issues empirically using a set of simulated discrete choice preference and rank-ordered preference datasets. The utility coefficients, the alternative specific constants (ASCs), and the means and standard deviations of the four attributes for a set of seven hypothetical alternatives are specified a priori. Synthetic datasets with varying sample size, attribute variance and within-sample choice distribution are then simulated. Based on these datasets, the utility coefficients and ASCs of the specified MNL models are re-estimated and compared with the values specified a priori. It is found that (1) the estimation accuracy of the utility parameters increases as the sample size increases; (2) the utility coefficients can be re-estimated with reasonable accuracy, but the estimates of the ASCs are subject to much larger errors; (3) as the variances of the alternative attributes increase, the estimation accuracy improves significantly; and (4) as the distribution of chosen alternatives becomes more balanced across alternatives within the sample datasets, the hit-ratio decreases.
The results indicate that (a) under a setting similar to the one presented in this paper, a large sample of a few thousand observations (3000–4000) may be needed to provide reasonable estimates of the utility coefficients and, in particular, of the ASCs; (b) a larger but realistic attribute space is preferred in stated preference survey design; and (c) choice datasets with an unbalanced “chosen” choice frequency distribution are preferred, in order to better capture the elasticity between the “perceived utility” associated with the alternatives’ attributes.
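
The simulate-and-re-estimate procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. To stay short it assumes a smaller hypothetical setting (three alternatives and two attributes instead of seven and four), made-up "true" coefficients and ASCs, attributes drawn from the same normal distribution for every alternative, and plain gradient ascent on the log-likelihood in place of whatever estimation software the study used. Choices are sampled directly from the MNL probabilities, which is equivalent to adding independent Gumbel errors to the utilities and picking the maximum.

```python
import math
import random

def mnl_probs(x, beta, asc):
    """MNL choice probabilities for one observation; x[j][k] is attribute k of alternative j."""
    v = [asc[j] + sum(b * a for b, a in zip(beta, x[j])) for j in range(len(asc))]
    m = max(v)  # subtract the maximum utility for numerical stability
    e = [math.exp(u - m) for u in v]
    s = sum(e)
    return [ei / s for ei in e]

def simulate(n_obs, beta, asc, means, sds, rng):
    """Draw normal attributes, then sample one chosen alternative per observation."""
    data = []
    for _ in range(n_obs):
        x = [[rng.gauss(mu, sd) for mu, sd in zip(means, sds)] for _ in asc]
        p = mnl_probs(x, beta, asc)
        chosen = rng.choices(range(len(asc)), weights=p)[0]
        data.append((x, chosen))
    return data

def estimate(data, n_attr, n_alt, lr=0.5, n_iter=120):
    """Maximum likelihood by gradient ascent; the ASC of alternative 0 is fixed at 0."""
    beta, asc = [0.0] * n_attr, [0.0] * n_alt
    n = len(data)
    for _ in range(n_iter):
        g_beta, g_asc = [0.0] * n_attr, [0.0] * n_alt
        for x, chosen in data:
            p = mnl_probs(x, beta, asc)
            for j in range(n_alt):
                r = (1.0 if j == chosen else 0.0) - p[j]  # observed minus predicted
                g_asc[j] += r
                for k in range(n_attr):
                    g_beta[k] += r * x[j][k]
        beta = [b + lr * g / n for b, g in zip(beta, g_beta)]
        asc = [0.0] + [a + lr * g / n for a, g in zip(asc[1:], g_asc[1:])]
    return beta, asc

def hit_ratio(data, beta, asc):
    """Share of observations whose most probable alternative is the chosen one."""
    hits = 0
    for x, chosen in data:
        p = mnl_probs(x, beta, asc)
        hits += p.index(max(p)) == chosen
    return hits / len(data)

# Hypothetical "true" values, chosen only for illustration (not taken from the paper).
TRUE_BETA = [-1.0, 0.5]
TRUE_ASC = [0.0, 0.5, -0.5]

rng = random.Random(42)
data = simulate(2000, TRUE_BETA, TRUE_ASC, means=[0.0, 0.0], sds=[1.0, 1.0], rng=rng)
beta_hat, asc_hat = estimate(data, n_attr=2, n_alt=3)

print("true beta:", TRUE_BETA, "estimated:", [round(b, 3) for b in beta_hat])
print("true ASC: ", TRUE_ASC, "estimated:", [round(a, 3) for a in asc_hat])
print("hit ratio:", round(hit_ratio(data, beta_hat, asc_hat), 3))
```

Rerunning the sketch with a different `n_obs`, different `sds`, or different ASCs (which shift the choice shares) gives a rough feel for the three factors the paper varies, although a single run per setting says little about estimator variability.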

Acknowledgements

The authors thank the anonymous referees and the editor for their help in improving the quality of the paper. Funding from the Hubei Provincial Natural Science Foundation (2015CFB599) and from the Top 1% ESI Academic Program of Wuhan University of Technology, supported by “the Fundamental Research Funds for the Central Universities” (WUT:2014-VII-036), is appreciated. This study is also supported by the Natural Sciences and Engineering Research Council (NSERC), Canada, and by a start-up grant from Wuhan University of Technology. This paper is also partially supported by a grant from the National Natural Science Foundation of China (NSFC No. 51778510).

Author information

Authors and Affiliations

Engineering Research Center for Transportation safety of MOE, Wuhan University of Technology, Wuhan, China
Minhui Zeng, Ming Zhong & John Douglas Hunt
School of Traffic and Transportation Engineering, Changsha University of Science & Technology, Changsha, China
Minhui Zeng
National Engineering Research Center for Water Transportation Safety, Wuhan, China
Minhui Zeng, Ming Zhong & John Douglas Hunt
Department of Civil and Environmental Engineering, University of Waterloo, Ontario, Canada
Ming Zhong
Department of Civil Engineering, University of Calgary, Alberta, Canada
John Douglas Hunt

Corresponding author

Correspondence to Ming Zhong.

Additional information

Minhui Zeng is a Ph.D. student at Wuhan University of Technology and also works as a lecturer at Changsha University of Science & Technology, China. Her research interests include travel behavior analysis and travel demand modeling, and traffic data analysis and mining.

Ming Zhong is a professor at the Intelligent Transportation Systems Research Center, Wuhan University of Technology, and an adjunct professor at the University of Waterloo, Canada. He worked as an Associate/Assistant Professor in the Department of Civil Engineering, University of New Brunswick, from 2006 to 2013. He obtained his Ph.D. degree in transportation engineering at the University of Regina, Canada, in 2004. His research interests include land use transport interaction modeling, travel behavior analysis and travel demand modeling, traffic monitoring programs and data analysis, intelligent transportation systems, and remote sensing/GIS applications in transportation.

J.D. Hunt is a professor in the Department of Civil Engineering, University of Calgary, Alberta, Canada. He obtained his Ph.D. degree at Cambridge University in 1986. His research interests include integrated land use transportation modeling (ILUTM), stated response techniques for obtaining data for the estimation of model parameters, automobile parking behaviour and parking policy. He is also the primary developer of a popular ILUTM framework, PECAS (Production, Exchange, Consumption Allocation System).

About this article

Cite this article

Zeng, M., Zhong, M. & Hunt, J.D. Analysis of the Impact of Sample Size, Attribute Variance and Within-Sample Choice Distribution on the Estimation Accuracy of Multinomial Logit Models Using Simulated Data. J. Syst. Sci. Syst. Eng. 27, 771–789 (2018). https://doi.org/10.1007/s11518-018-5359-7

Issue Date: December 2018
