Abstract
This paper addresses the problem of selecting the ‘optimal’ variable subset in a logistic regression model for a medium-sized data set. As a case study, we take the British English dative alternation, where speakers and writers can choose between two equally grammatical syntactic constructions to express the same meaning. With 29 explanatory variables taken from the literature, we build two types of models: one with the verb sense included as a random effect, and one without a random effect. For each type, we build three different models: (1) by including all variables and keeping only the significant ones, (2) by successively adding the most predictive variable (forward selection), and (3) by successively removing the least predictive variable (backward elimination). Since the six approaches lead to six different variable selections (and thus six different models), we conclude that selecting the ‘best’ model requires a substantial amount of linguistic expertise.
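The two stepwise procedures named in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation. It assumes AIC as the measure of how predictive a variable is (the abstract does not state the criterion), fits only the model type without a random effect, and uses plain gradient ascent on synthetic data. The names `aic`, `forward_selection`, `backward_elimination`, `x0`, `x1` and `x2` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def aic(columns, y, steps=200, lr=1.0):
    """Fit a logistic regression (intercept + given columns) by gradient
    ascent and return its AIC; lower is better. AIC is an assumed criterion."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    w = [0.0] * (len(columns) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, yi in zip(rows, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    loglik = 0.0
    for row, yi in zip(rows, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, row)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        loglik += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return 2 * len(w) - 2 * loglik


def forward_selection(data, y):
    """Start from the intercept-only model; repeatedly add the variable
    that lowers AIC the most, stopping when no addition improves it."""
    selected, best = [], aic([], y)
    while len(selected) < len(data):
        trials = {name: aic([data[v] for v in selected + [name]], y)
                  for name in data if name not in selected}
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        selected.append(name)
        best = trials[name]
    return selected


def backward_elimination(data, y):
    """Start from the full model; repeatedly drop the variable whose
    removal lowers AIC the most, stopping when no removal improves it."""
    selected = list(data)
    best = aic([data[v] for v in selected], y)
    while selected:
        trials = {name: aic([data[v] for v in selected if v != name], y)
                  for name in selected}
        name = min(trials, key=trials.get)
        if trials[name] >= best:
            break
        selected.remove(name)
        best = trials[name]
    return selected


# Synthetic data (an assumption for illustration): only x0 drives the outcome.
rng = random.Random(0)
n = 150
data = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x0", "x1", "x2")}
y = [1 if rng.random() < sigmoid(2.0 * x) else 0 for x in data["x0"]]

forward = forward_selection(data, y)
backward = backward_elimination(data, y)
print("forward: ", forward)
print("backward:", backward)
```

The two procedures search the space of variable subsets from opposite ends, so they need not arrive at the same subset. That is the kind of disagreement the abstract reports across its six approaches.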
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Theijssen, D. (2010). Variable Selection in Logistic Regression: The British English Dative Alternation. In: Icard, T., Muskens, R. (eds) Interfaces: Explorations in Logic, Language and Computation. ESSLLI 2008, ESSLLI 2009. Lecture Notes in Computer Science, vol 6211. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14729-6_7
DOI: https://doi.org/10.1007/978-3-642-14729-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14728-9
Online ISBN: 978-3-642-14729-6
eBook Packages: Computer Science, Computer Science (R0)