Autocorrelated Errors in Experimental Data in the Language Sciences: Some Solutions Offered by Generalized Additive Mixed Models

Baayen, R. Harald; van Rij, Jacolien; de Cat, Cecile; Wood, Simon

doi:10.1007/978-3-319-69830-4_4

Part of the book series: Quantitative Methods in the Humanities and Social Sciences (QMHSS)

Abstract

A problem that tends to be ignored in the statistical analysis of experimental data in the language sciences is that responses often constitute time series, which raises the problem of autocorrelated errors. If the errors indeed show autocorrelational structure, evaluation of the significance of predictors in the model becomes problematic due to potential anti-conservatism of p-values.
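
To make the notion of autocorrelated errors concrete, the following is a minimal sketch in Python. It is not taken from the chapter. It assumes, purely for illustration, that the errors follow a first-order autoregressive, AR(1), process; the function names, the seed, and the value rho = 0.6 are hypothetical choices. It simulates such errors, estimates their lag-1 autocorrelation, and applies the standard large-sample approximation for how many independent observations a series of n AR(1) observations is worth when estimating a mean.

```python
import random


def simulate_ar1(n, rho, seed=1):
    """Simulate n errors with e[t] = rho * e[t-1] + Gaussian white noise."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    errors = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, 1.0))
    return errors


def lag1_autocorrelation(x):
    """Sample correlation between each value and the value preceding it."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def effective_sample_size(n, rho):
    """Large-n approximation of the number of independent observations
    that n AR(1) observations are worth when estimating a mean."""
    return n * (1 - rho) / (1 + rho)


if __name__ == "__main__":
    errors = simulate_ar1(5000, rho=0.6)
    print("estimated lag-1 autocorrelation:", lag1_autocorrelation(errors))
    print("effective sample size:", effective_sample_size(len(errors), 0.6))
```

Under these assumptions, positively autocorrelated errors carry less independent information than their count suggests. A model that treats them as independent therefore tends to report standard errors that are too small, which is one way to see why the resulting p-values can be anti-conservative.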

Notes

1. The parametric coefficients suggest that regularity is irrelevant as a predictor of naming times, that singulars are named faster than plurals, and that words with voiced initial segments have longer naming times, as do words that have many neighbors at Hamming distance 1 at the initial segment. Words with a greater Shannon entropy calculated over the probability distribution of their inflectional variants elicited shorter response times. A thin plate regression spline for log-transformed word frequency suggests a roughly U-shaped effect (not shown) for this predictor.
2. For this to work properly, treatment contrasts must be used for ordinal factors; in R: options(contrasts = c("contr.treatment", "contr.treatment")).
3. The details of the coefficients in the present model differ from those obtained in the analysis of Baayen [1]. Thanks to the factor smooths for subject and compound and the inclusion of a thin plate regression spline for word frequency, the present model provides a better fit (AIC 177077.4 versus 187308), suggesting that the present reanalysis may provide a more accurate window on sex-specific realizations of compounds’ pitch.
4. Data points with an absolute amplitude exceeding 15 μV, approximately 2.6% of the data points, were removed to obtain an approximately Gaussian response variable.

References

Baayen RH (2013) Multivariate statistics. In: Podesva R, Sharma D (eds) Research methods in linguistics. Cambridge University Press, Cambridge, pp 337–372
Baayen RH, Milin P (2010) Analyzing reaction times. Int J Psychol Res 3:12–28
Baayen R, Vasishth S, Bates D, Kliegl R (2015) Out of the cage of shadows. arxiv.org. http://arxiv.org/abs/1511.03120
Bates D, Mächler M, Bolker B, Walker S (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67(1):1–48
Broadbent D (1971) Decision and stress. Academic Press, New York
DeCat C, Baayen RH, Klepousniotou E (2014) Electrophysiological correlates of noun-noun compound processing by non-native speakers of English. In: Proceedings of the first workshop on computational approaches to compound analysis (ComAComA 2014). Association for Computational Linguistics and Dublin City University, Dublin, Ireland, pp 41–52
DeCat C, Klepousniotou E, Baayen RH (2015) Representational deficit or processing effect? A neuro-psychological study of noun-noun compound processing by very advanced L2 speakers of English. Front Psychol (Lang Sci) 6:77
De Vaan L, Schreuder R, Baayen RH (2007) Regular morphologically complex neologisms leave detectable traces in the mental lexicon. Ment Lexicon 2:1–23
Koesling K, Kunter G, Baayen RH, Plag I (2012) Prominence in triconstituent compounds: pitch contours and linguistic theory. Lang Speech 56(4):529–554
Lin X, Zhang D (1999) Inference in generalized additive mixed models using smoothing splines. J R Stat Soc Ser B 61:381–400
Paeschke A, Kienast M, Sendlmeier W (1999) F0-contours in emotional speech. In: Proceedings of the 14th International Congress of Phonetic Sciences, vol 2, pp 929–932
Sanders A (1998) Elements of human performance: reaction processes and attention in human skill. Lawrence Erlbaum, Mahwah, NJ
Tabak W (2010) Semantics and (ir)regular inflection in morphological processing. PhD thesis, University of Nijmegen. Ponsen & Looijen, Ede
Taylor TE, Lupker SJ (2001) Sequential effects in naming: a time-criterion account. J Exp Psychol Learn Mem Cogn 27:117–138
Traunmüller H, Eriksson A (1995) The frequency range of the voice fundamental in the speech of male and female adults. Institutionen för lingvistik, Stockholms Universitet, S-106 91 Stockholm, Sweden
Trouvain J, Barry WJ (2000) The prosody of excitement in horse race commentaries. In: ISCA tutorial and research workshop (ITRW) on speech and emotion
Welford A (1980) Choice reaction time: basic concepts. In: Welford A (ed) Reaction times. Academic Press, New York, pp 73–128
Wilkinson G, Rogers C (1973) Symbolic description of factorial models for analysis of variance. Appl Stat 22:392–399
Wood SN (2006) Generalized additive models. Chapman & Hall/CRC, New York
Wood SN (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J R Stat Soc (B) 73:3–36
Wood SN (2013) On p-values for smooth components of an extended generalized additive model. Biometrika 100:221–228
Wood SN (2013) A simple test for random effects in regression models. Biometrika 100:1005–1010

Author information

Authors and Affiliations

Department of Quantitative Linguistics, University of Tübingen, Tübingen, Germany
R. Harald Baayen
Department of Linguistics, University of Alberta, Edmonton, AB, Canada
R. Harald Baayen
University of Groningen, Groningen, The Netherlands
Jacolien van Rij
University of Leeds, Leeds, UK
Cecile de Cat
University of Bristol, Bristol, UK
Simon Wood

Corresponding author

Correspondence to R. Harald Baayen.

Editor information

Editors and Affiliations

Faculty of Arts, Research Group QLVL, KU Leuven, Belgium
Dirk Speelman
Faculty of Arts, Research Group QLVL, KU Leuven, Belgium
Kris Heylen
Faculty of Arts, Research Group QLVL, KU Leuven, Belgium
Dirk Geeraerts

About this chapter

Cite this chapter

Baayen, R.H., van Rij, J., de Cat, C., Wood, S. (2018). Autocorrelated Errors in Experimental Data in the Language Sciences: Some Solutions Offered by Generalized Additive Mixed Models. In: Speelman, D., Heylen, K., Geeraerts, D. (eds) Mixed-Effects Regression Models in Linguistics. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-69830-4_4

DOI: https://doi.org/10.1007/978-3-319-69830-4_4
Published: 08 February 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69828-1
Online ISBN: 978-3-319-69830-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
