Design and acquisition of a task-oriented spontaneous-speech database

Corazza, A.; Federico, M.; Gretter, R.; Lazzari, G.

doi:10.1007/3-540-57379-8_12

Part III Communication Issues
Chapter
First Online: 01 January 2005

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 745)

Abstract

The need for large databases for both training and testing automatic speech recognition and understanding systems is well known. This paper presents the results of a first collection of task-oriented spontaneous-speech corpora carried out within the MAIA project, under development at IRST. About 2000 sentences were acquired from 50 subjects in two scenarios of human-machine spoken interaction: a telecontrol station for a mobile robot and an information query system. Both systems were simulated by means of the well-known “Wizard of Oz” technique. The paper focuses on the methodological issues of this approach, highlighting important points that must be considered in the design of simulations, together with the adopted solutions. A first evaluation of the collected data concludes the exposition.

Author information

Authors and Affiliations

IRST - Istituto per la Ricerca Scientifica e Tecnologica, 38050, Povo, Trento, Italy
A. Corazza, M. Federico, R. Gretter & G. Lazzari

Editor information

Vito Roberto

About this chapter

Cite this chapter

Corazza, A., Federico, M., Gretter, R., Lazzari, G. (1993). Design and acquisition of a task-oriented spontaneous-speech database. In: Roberto, V. (ed.) Intelligent Perceptual Systems. Lecture Notes in Computer Science, vol 745. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57379-8_12

DOI: https://doi.org/10.1007/3-540-57379-8_12
Published: 30 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57379-1
Online ISBN: 978-3-540-48103-4
eBook Packages: Springer Book Archive
