Experimenting Text Creation by Natural-Language, Large-Vocabulary Speech Recognition
In recent years, the probabilistic approach to speech recognition has allowed the development of high-performance, large-vocabulary speech recognition systems. At the IBM Rome Scientific Center, a speech-recognition prototype for the Italian language, based on this approach, has been built. The prototype can recognize, in real time, natural-language sentences built from a vocabulary of up to 20000 words. The user performs a one-time acoustic training phase (about 20 minutes long), which consists of uttering a predefined text. Words must be uttered with small pauses (a few centiseconds) between them. The prototype architecture is based on a personal computer equipped with special hardware. The first system we developed was aimed at a business and finance lexicon. Many laboratory tests have shown the effectiveness of the prototype as a tool for creating texts by voice. After a first phase of in-house experiments, the need arose to test the system in real work environments and for different applications. Two applications were considered: the dictation of radiological reports and of insurance company documents. Because of their characteristics, these applications seemed very well suited to our purposes. Since the vocabulary of the recognizer must be predefined, we had to adapt the system to the lexicon required by the new applications. The paper describes the techniques developed to efficiently adapt the basic components of the recognizer: the acoustic and language models. The results obtained from experiments with automatic text dictation during real work are also presented.
Keywords: Beach, Acoustics, Diphones