Abstract
This paper describes the Spoken Language Corpus of Swedish at the Department of Linguistics, Göteborg University (GSLC), and summarizes the types of analysis and the tools that have been developed for work on this corpus. Work on the corpus started in the late 1970s. The corpus is growing incrementally and presently consists of 1.3 million words from about 25 different social activities. It was initiated to meet a growing interest in naturalistic spoken language data, and it is based on the observation that spoken language varies considerably across social activities with regard to pronunciation, vocabulary, grammar and communicative functions. The goal is to include spoken language from as many social activities as possible in order to gain a more complete understanding of the role of language and communication in human social life. This type of spoken language corpus is still fairly unusual, even for English, since many spoken language corpora (certainly those for Swedish) have been collected for special purposes such as speech recognition, phonetics, dialectal variation or interaction with a computerized dialog system in a very narrow domain, e.g. MapTask (Isard and Carletta 1995), TRAINS (Heeman and Allen 1994) and Waxholm (Blomberg et al. 1993).
References
Jens Allwood (1976) Linguistic Communication as Action and Cooperation. “Gothenburg Monographs in Linguistics” 2. Göteborg University, Department of Linguistics, 257 p.
Jens Allwood (1978) On the Analysis of Communicative Action. In “The Structure of Action”, M. Brenner, ed., Basil Blackwell, Oxford, pp. 168–191.
Jens Allwood (1993) Feedback in Second Language Acquisition. In “Adult Language Acquisition. Cross Linguistic Perspectives”, Vol. II. C. Perdue, ed., Cambridge University Press, Cambridge, pp. 37–51.
Jens Allwood (1994) Obligations and Options in Dialogue. Think, Vol. 3, May, ITK, Tilburg University, pp. 9–18.
Jens Allwood, ed., (1996 and later editions) Talspråksfrekvenser, Ny och utvidgad upplaga. Gothenburg Papers in Theoretical Linguistics S21. Göteborg University, Department of Linguistics, 418 p.
Jens Allwood (1998) Some Frequency based Differences between Spoken and Written Swedish. In Timo Haukioja, ed., Proceedings of the 16th Scandinavian Conference of Linguistics, Turku University, Department of Linguistics, pp. 18–29.
Jens Allwood (2000) An Activity Based Approach to Pragmatics. In “Abduction, Belief and Context in Dialogue; Studies in Computational Pragmatics”, H. Bunt & B. Black, eds., John Benjamins, Amsterdam, pp. 47–80.
Jens Allwood, ed., (2001) Dialog Coding — Function and Grammar: Göteborg Coding Schemas. Gothenburg Papers in Theoretical Linguistics GPTL 85. Göteborg University, Department of Linguistics, 67 p.
Jens Allwood and Johan Hagman (1994) Some Simple Measures of Spoken Interaction. In F. Gregersen & J. Allwood, eds., “Spoken Language, Proceedings of the XIV Conference of Scandinavian Linguistics”, pp. 3–22.
Jens Allwood, Elisabeth Ahlsén, Joakim Nivre and Staffan Larsson (2001) Own communication management. In J. Allwood, ed., (2001) Dialog Coding — Function and Grammar: Göteborg Coding Schemas. Gothenburg Papers in Theoretical Linguistics GPTL 85. Göteborg University, Department of Linguistics, pp. 45–52.
Jens Allwood, Joakim Nivre and Elisabeth Ahlsén (1990) Speech Management: On the Non-Written Life of Speech. Nordic Journal of Linguistics, 13, 3–48.
Mats Blomberg, Rolf Carlson, Kjell Elenius, Björn Granström, Jonatan Gustafson, Sheri Hunnicutt, Roger Lindell and Lennart Neovius (1993) An experimental dialogue system: WAXHOLM. “Proceedings of EUROSPEECH 93”, pp. 1867–1870.
BNC British National Corpus, Oxford University Computing Services, 13 Banbury Road, Oxford OX2 6NN
Mark G. Core and James F. Allen (1997) Coding Dialogs with the DAMSL Annotation Scheme. In Working Notes of AAAI Fall Symposium on Communicative Action in Humans and Machines, Boston, MA, November 1997.
Laila Dybkjær, Niels Ole Bernsen, Hans Dybkjær, David McKelvie and Andreas Mengel (1998) The MATE Markup Framework. MATE Deliverable D1.2, November 1998, 15 p.
Frans Gregersen (1991) The Copenhagen Study in Urban Sociolinguistics, 1+2; Reitzel, Copenhagen.
H. Paul Grice (1975) Logic and conversation. In “Syntax and Semantics” Vol. 3: Speech Acts, P. Cole and J. L. Morgan, eds., Seminar Press, New York, pp. 41–58.
Leif Grönqvist (1999) Kodningsvisualisering med Framemaker. Göteborg University, Department of Linguistics, 8 p.
Leif Grönqvist (2000a) The MultiTool User’s Manual. A tool for browsing and synchronizing transcribed dialogues and corresponding video recordings. Göteborg University, Department of Linguistics, 6 p.
Leif Grönqvist (2000b) The TraSA v0.8 Users Manual. A user friendly graphical tool for automatic transcription statistics. Göteborg University, Department of Linguistics, 8 p.
E. Hanssen, T. Hoel, E.H. Jahr, O. Rekdal and G. Wiggen (eds.) (1978) Oslomål.
Peter A. Heeman and James F. Allen (1994) The TRAINS 93 Dialogues. TRAINS Technical Note 94-2.
Peter Juel Henrichsen (1997) Talesprog med Ansigtsløftning, IAAS, Univ. of Copenhagen, Instrumentalis 10/97 (in Danish), 66 p.
Janet Holmes, Bernadette Vine and Gary Johnson (1998) Guide to the Wellington Corpus of Spoken New Zealand English. Victoria University of Wellington, Wellington.
Amy Isard and Jean Carletta (1995) Transaction and action coding in the Map Task Corpus. Research Paper HCRORP-65, 27 p.
Staffan Larsson (1997) TRACTOR v1.0b1 användarmanual. Göteborg University, Department of Linguistics, 10 p.
Christopher D. Manning and Hinrich Schütze (1999) Foundations of Statistical Natural Language Processing, The MIT Press, Boston, Mass., 620 p.
V. Mantha, J. Hamaker, N. Deshmukh, A. Ganapathiraju and J. Picone (1999) Improved Monosyllabic Word Modeling on SWITCHBOARD. Mississippi State University, Dept. of Electrical & Computer Engineering.
Joakim Nivre (1999a) Transcription Standard. Version 6.2. Göteborg University. Department of Linguistics, 38 p.
Joakim Nivre (1999b) Modifierad StandardOrtografi (MSO) Version 6, Göteborg University, Department of Linguistics, 9 p.
Joakim Nivre, Kristina Tullgren, Jens Allwood, Elisabeth Ahlsén, Jenny Holm, Leif Grönqvist, Dario Lopez-Kästen and Sylvana Sofkova (1998) Towards multimodal spoken language corpora: TransTool and SyncTool. Proceedings of ACL-COLING 1998, June 1998.
Joakim Nivre and Leif Grönqvist (2001) Tagging a corpus of Spoken Swedish. Forthcoming in International Journal of Corpus Linguistics.
Ulla Richthoff (2000) En svensk barnspråkskorpus. Uppbyggnad och analyser. Department of Linguistics, Göteborg University.
Roeland van Hout and Toni Rietveld (1993) Statistical Techniques for the Study of Language and Language Behaviour. Berlin & New York: Mouton de Gruyter, 400 p.
Jan Svartvik (ed.) (1990) The London Corpus of Spoken English: Description and Research. “Lund Studies in English” 82. Lund University Press, 350 p.
Copyright information
© 2003 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Allwood, J., Grönqvist, L., Ahlsén, E., Gunnarsson, M. (2003). Annotations and Tools for an Activity Based Spoken Language Corpus. In: van Kuppevelt, J., Smith, R.W. (eds) Current and New Directions in Discourse and Dialogue. Text, Speech and Language Technology, vol 22. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0019-2_1
DOI: https://doi.org/10.1007/978-94-010-0019-2_1
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-1615-8
Online ISBN: 978-94-010-0019-2
eBook Packages: Springer Book Archive