Multimedia Corpus of In-Car Speech Communication

  • Chapter
Real World Speech Processing

Abstract

An ongoing project to construct a multimedia corpus of dialogues recorded under driving conditions is reported. More than 500 subjects have been enrolled in the corpus development, and more than 2 gigabytes of signals have been collected during approximately 60 minutes of driving per subject. Twelve microphones and three video cameras are installed in a car to obtain audio and video data. In addition, five signals regarding car control, together with the location of the car provided by the Global Positioning System (GPS), are recorded. All signals are recorded simultaneously and directly onto the hard disks of the PCs on board the specially designed data collection vehicle (DCV). The in-car dialogues are initiated by a human operator, an automatic speech recognition (ASR) system, and a Wizard of Oz (WOZ) system so as to collect as many speech disfluencies as possible.

In addition to the details of data collection, this paper describes preliminary results on intermedia signal conversion as an example of corpus-based in-car speech signal processing research.
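
The recording setup summarized in the abstract can be pictured as a channel manifest for one driving session. The following is only an illustrative sketch, not the authors' recording software: the `Channel` class, the `build_manifest` function, and the kind labels are hypothetical names, and modelling the GPS location as a single stream separate from the five car-control signals is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Channel:
    """One simultaneously recorded signal stream (hypothetical structure)."""
    kind: str   # "audio", "video", "control" or "gps"
    index: int  # position within its kind, starting at 0


def build_manifest() -> List[Channel]:
    # Audio, video and control counts come from the abstract.
    # Treating GPS as one extra stream is an assumption made for illustration.
    counts = {"audio": 12, "video": 3, "control": 5, "gps": 1}
    return [Channel(kind, i) for kind, n in counts.items() for i in range(n)]


manifest = build_manifest()
audio_channels = [c for c in manifest if c.kind == "audio"]
```

Keeping every stream in one list mirrors the point that all signals are captured together, so a downstream tool can iterate over the manifest when aligning audio, video, and vehicle data.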

Copyright information

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kawaguchi, N., Takeda, K., Itakura, F. (2004). Multimedia Corpus of In-Car Speech Communication. In: Wang, JF., Furui, S., Juang, BH. (eds) Real World Speech Processing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-6363-8_7

  • DOI: https://doi.org/10.1007/978-1-4757-6363-8_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5439-8

  • Online ISBN: 978-1-4757-6363-8

  • eBook Packages: Springer Book Archive
