The Real Time Implementation of the Recognition Stage

  • Robert Breitschaedel
  • Alberto Ciaramella
  • Davide Clementino
  • Roberto Pacifici
  • Jean Pierre Riviere
  • Giovanni Venuti
Part of the Research Reports ESPRIT book series (ESPRIT, volume 1)

Abstract

Subtasks 2.2 and 2.3 of the P26 project were devoted to designing a hardware architecture and to implementing on it, in real time, the recognition algorithms already developed and tested within Subtask 2.1. This real-time implementation of the recognition stage is called RICO in the following. Table 3.1 summarizes the key points we considered when we started our work, i.e. algorithmic requirements, project development constraints, and hardware and software technology limits. They contributed to the definition of the main characteristics of RICO, summarized in Table 3.2; in what follows we detail these considerations.

We started from the observation that the recognition algorithms can be divided into two principal blocks: a first “feature extraction” block, up to and including vector quantization and phonetic classification of frames, and a subsequent “search” block, which extracts the lattice of most likely words using dynamic programming. This system “cut” corresponds to the minimal flow of data between blocks and also separates blocks with different computational characteristics. The first block is characterized by predictable execution times, cyclic computations, vector data structures, and modest data addressing requirements: it implements “traditional” DSP algorithms, for which DSP chips are well suited. The memory and computational requirements of the second block, by contrast, depend heavily on the size of the recognition vocabulary and on the speaking style (continuous speech is of course more demanding than isolated words), and they also vary over time within the same utterance. In any case, for the real-time recognition of continuous speech with a 1K-word vocabulary, the computational requirements are quite demanding, although they were not clearly defined at the beginning of the project.
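The two-block split described above can be illustrated with a minimal sketch. It is not the RICO implementation: the function names, the toy codebook, the per-state cost tables and the `beam` threshold are all assumptions made for illustration, and the search scores whole words independently instead of building a word lattice. It only shows why the first block has a fixed cost per frame while the cost of the second grows with the vocabulary.

```python
import math

def quantize(frame, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    best, best_d = 0, math.inf
    for i, codeword in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(frame, codeword))
        if d < best_d:
            best, best_d = i, d
    return best

def feature_block(frames, codebook):
    """First block: fixed work per frame, output is one small label per frame."""
    return [quantize(f, codebook) for f in frames]

def search_block(labels, word_models, beam=5.0):
    """Second block (hypothetical simplification): left-to-right dynamic
    programming per word, then keep the words whose cost is within `beam`
    of the best one. word_models[word][state][label] is a local cost."""
    if not labels:
        return []
    scores = {}
    for word, costs in word_models.items():
        n = len(costs)
        prev = [math.inf] * n
        prev[0] = costs[0][labels[0]]          # paths must start in state 0
        for lab in labels[1:]:
            cur = [math.inf] * n
            for s in range(n):
                stay = prev[s]
                advance = prev[s - 1] if s > 0 else math.inf
                cur[s] = min(stay, advance) + costs[s][lab]
            prev = cur
        scores[word] = prev[-1]                # paths must end in the last state
    best = min(scores.values())
    kept = [(w, c) for w, c in scores.items() if c <= best + beam]
    return sorted(kept, key=lambda wc: wc[1])

# Toy data, invented for this example only.
codebook = [(0.0, 0.0), (1.0, 1.0)]
frames = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9)]
word_models = {
    "ab": [[0.0, 2.0], [2.0, 0.0]],
    "ba": [[2.0, 0.0], [0.0, 2.0]],
}

labels = feature_block(frames, codebook)
candidates = search_block(labels, word_models, beam=1.0)
```

Only the short label sequence crosses the boundary between the two functions, which mirrors the “minimal flow of data” argument for placing the cut there.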

Keywords

Lexical Access; Real Time Implementation; Continuous Speech; Beam Search; Phonetic Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© ECSC — EEC — EAEC, Brussels — Luxembourg 1990

Authors and Affiliations

  • Robert Breitschaedel (1)
  • Alberto Ciaramella (2)
  • Davide Clementino (2)
  • Roberto Pacifici (2)
  • Jean Pierre Riviere (3)
  • Giovanni Venuti (2)

  1. Daimler Benz, Germany
  2. CSELT, Italy
  3. Thomson-CSF, France
