Design and acquisition of a task-oriented spontaneous-speech database

  • Part III Communication Issues

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 745)

Abstract

The need for large databases for both training and testing automatic speech recognition and understanding systems is well known. This paper presents the results of a first collection of task-oriented spontaneous-speech corpora carried out within the MAIA project, under development at IRST. About 2000 sentences were acquired from 50 subjects in two scenarios of human-machine spoken interaction: a telecontrol station for a mobile robot and an information query system. Both systems were simulated by means of the well-known “Wizard of Oz” technique. The paper focuses on the methodological issues of this approach, highlighting important points that must be considered when designing such simulations, together with the solutions adopted. A first evaluation of the collected data concludes the paper.


Editor information

Vito Roberto

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Corazza, A., Federico, M., Gretter, R., Lazzari, G. (1993). Design and acquisition of a task-oriented spontaneous-speech database. In: Roberto, V. (eds) Intelligent Perceptual Systems. Lecture Notes in Computer Science, vol 745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57379-8_12

  • DOI: https://doi.org/10.1007/3-540-57379-8_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57379-1

  • Online ISBN: 978-3-540-48103-4

  • eBook Packages: Springer Book Archive
