Annotations and Tools for an Activity Based Spoken Language Corpus

  • Chapter
Current and New Directions in Discourse and Dialogue

Part of the book series: Text, Speech and Language Technology (TLTB, volume 22)

Abstract

This paper describes the Spoken Language Corpus of Swedish at the Department of Linguistics, Göteborg University (GSLC), and summarizes the types of analysis and tools that have been developed for work on this corpus. Work on the corpus started in the late 1970s. It is growing incrementally and presently consists of 1.3 million words from about 25 different social activities. The corpus was initiated to meet a growing interest in naturalistic spoken language data. It rests on the observation that spoken language varies considerably across social activities with regard to pronunciation, vocabulary, grammar and communicative functions. The goal of the corpus is to include spoken language from as many social activities as possible, in order to gain a more complete understanding of the role of language and communication in human social life. This type of spoken language corpus is still fairly unusual, even for English, since many spoken language corpora (certainly those for Swedish) have been collected for special purposes, such as speech recognition, phonetics, dialectal variation or interaction with a computerized dialog system in a very narrow domain, e.g. MapTask (Isard and Carletta 1995), TRAINS (Heeman and Allen 1994) and Waxholm (Blomberg et al. 1993).

References

  • Jens Allwood (1976) Linguistic Communication as Action and Cooperation. “Gothenburg Monographs in Linguistics” 2. Göteborg University, Department of Linguistics, 257 p.

  • Jens Allwood (1978) On the Analysis of Communicative Action. In “The Structure of Action”, M. Brenner, ed., Basil Blackwell, Oxford, pp. 168–191.

  • Jens Allwood (1993) Feedback in Second Language Acquisition. In “Adult Language Acquisition. Cross Linguistic Perspectives”, Vol. II, C. Perdue, ed., Cambridge University Press, Cambridge, pp. 37–51.

  • Jens Allwood (1994) Obligations and Options in Dialogue, Think, Vol 3, May, ITK, Tilburg University, 9–18.

  • Jens Allwood, ed., (1996 and later editions) Talspråksfrekvenser, Ny och utvidgad upplaga. Gothenburg Papers in Theoretical Linguistics S21. Göteborg University, Department of Linguistics, 418 p.

  • Jens Allwood (1998) Some Frequency based Differences between Spoken and Written Swedish. In Timo Haukioja, ed., Proceedings of the 16th Scandinavian Conference of Linguistics, Turku University, Department of Linguistics, pp. 18–29.

  • Jens Allwood (2000) An Activity Based Approach to Pragmatics. In “Abduction, Belief and Context in Dialogue; Studies in Computational Pragmatics”, H. Bunt & B. Black, eds., John Benjamins, Amsterdam, pp. 47–80.

  • Jens Allwood, ed., (2001) Dialog Coding — Function and Grammar: Göteborg Coding Schemas. Gothenburg Papers in Theoretical Linguistics GPTL 85. Göteborg University, Department of Linguistics, 67 p.

  • Jens Allwood and Johan Hagman (1994) Some Simple Measures of Spoken Interaction. In F. Gregersen, & J. Allwood, eds., “Spoken Language, Proceedings of the XIV Conference of Scandinavian Linguistics”, pp. 3–22.

  • Jens Allwood, Elisabeth Ahlsén, Joakim Nivre and Staffan Larsson (2001) Own communication management. In J. Allwood, ed., (2001) Dialog Coding — Function and Grammar: Göteborg Coding Schemas. Gothenburg Papers in Theoretical Linguistics GPTL 85. Göteborg University, Department of Linguistics, pp. 45–52.

  • Jens Allwood, Joakim Nivre and Elisabeth Ahlsén (1990) Speech Management: On the Non-Written Life of Speech. Nordic Journal of Linguistics, 13, 3–48.

  • Mats Blomberg, Rolf Carlson, Kjell Elenius, Björn Granström, Jonatan Gustafson, Sheri Hunnicutt, Roger Lindell and Lennart Neovius (1993) An experimental dialogue system: WAXHOLM. In “Proceedings of EUROSPEECH 93”, pp. 1867–1870.

  • BNC British National Corpus, Oxford University Computing Services, 13 Banbury Road, Oxford OX2 6NN

  • Mark G. Core and James F. Allen (1997) Coding Dialogs with the DAMSL Annotation Scheme. In Working Notes of AAAI Fall Symposium on Communicative Action in Humans and Machines, Boston, MA, November 1997.

  • Laila Dybkjær, Niels Ole Bernsen, Hans Dybkjær, David McKelvie and Andreas Mengel (1998) The MATE Markup Framework. MATE Deliverable D1.2, November 1998, 15 p.

  • Frans Gregersen (1991) The Copenhagen Study in Urban Sociolinguistics, 1+2; Reitzel, Copenhagen.

  • H. Paul Grice (1975). Logic and conversation. In “Syntax and Semantics” Vol. 3: Speech Acts, P. Cole and J. L. Morgan, eds., Seminar Press, New York, pp. 41–58.

  • Leif Grönqvist (1999) Kodningsvisualisering med Framemaker. Göteborg University, Department of Linguistics, 8 p.

  • Leif Grönqvist (2000a) The MultiTool User’s Manual. A tool for browsing and synchronizing transcribed dialogues and corresponding video recordings. Göteborg University, Department of Linguistics, 6 p.

  • Leif Grönqvist (2000b) The TraSA v0.8 Users Manual. A user friendly graphical tool for automatic transcription statistics. Göteborg University, Department of Linguistics, 8 p.

  • E. Hanssen, T. Hoel, E.H. Jahr, O. Rekdal and G. Wiggen (eds.) (1978) Oslomål.

  • Peter A. Heeman and James F. Allen (1994) The TRAINS 93 Dialogues. TRAINS Technical Note 94-2.

  • Peter Juel Henrichsen (1997) Talesprog med Ansigtsløftning, IAAS, Univ. of Copenhagen, Instrumentalis 10/97 (in Danish), 66 p.

  • Janet Holmes, Bernadette Vine and Gary Johnson (1998) Guide to the Wellington Corpus of Spoken New Zealand English. Victoria University of Wellington, Wellington.

  • Amy Isard and Jean Carletta (1995) Transaction and action coding in the Map Task Corpus. Research Paper HCRC/RP-65, 27 p.

  • Staffan Larsson (1997) TRACTOR v1.0b1 användarmanual. Göteborg University, Department of Linguistics, 10 p.

  • Christopher D. Manning and Hinrich Schütze (1999) Foundations of Statistical Natural Language Processing, The MIT Press, Cambridge, Mass., 620 p.

  • V. Mantha, J. Hamaker, N. Deshmukh, A. Ganapathiraju and J. Picone (1999) Improved Monosyllabic Word Modeling on SWITCHBOARD. Mississippi State University, Dept. of Electrical & Computer Engineering.

  • Joakim Nivre (1999a) Transcription Standard. Version 6.2. Göteborg University, Department of Linguistics, 38 p.

  • Joakim Nivre (1999b) Modifierad StandardOrtografi (MSO) Version 6, Göteborg University, Department of Linguistics, 9 p.

  • Joakim Nivre, Kristina Tullgren, Jens Allwood, Elisabeth Ahlsén, Jenny Holm, Leif Grönqvist, Dario Lopez-Kästen and Sylvana Sofkova (1998) Towards multimodal spoken language corpora: TransTool and SyncTool. Proceedings of ACL-COLING 1998, June 1998.

  • Joakim Nivre and Leif Grönqvist (2001) Tagging a corpus of Spoken Swedish. Forthcoming in International Journal of Corpus Linguistics.

  • Ulla Richthoff (2000) En svensk barnspråkskorpus. Uppbyggnad och analyser. Department of Linguistics, Göteborg University.

  • Roeland van Hout and Toni Rietveld (1993) Statistical Techniques for the Study of Language and Language Behaviour. Berlin & New York: Mouton de Gruyter, 400 p.

  • Jan Svartvik (ed.) (1990), The London Corpus of Spoken English: Description and Research. “Lund Studies in English” 82. Lund University Press, 350 p.

Copyright information

© 2003 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Allwood, J., Grönqvist, L., Ahlsén, E., Gunnarsson, M. (2003). Annotations and Tools for an Activity Based Spoken Language Corpus. In: van Kuppevelt, J., Smith, R.W. (eds) Current and New Directions in Discourse and Dialogue. Text, Speech and Language Technology, vol 22. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0019-2_1

  • DOI: https://doi.org/10.1007/978-94-010-0019-2_1

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-1615-8

  • Online ISBN: 978-94-010-0019-2

  • eBook Packages: Springer Book Archive
