Part of the book series: The International Series in Engineering and Computer Science (SECS, volume 563)

Summary

For both embedded and PC-based systems, the cost of speech recognition has to be negligible compared with the total system cost. This is much easier to achieve on PCs: the processing power of the average PC has increased to the point that software-only solutions requiring no additional hardware are now possible. In this chapter, we review techniques that are especially useful in embedded applications or that reduce computational complexity, such as floating-point to fixed-point conversion and fast Gaussian computation/Gaussian clustering, which reduces the time spent in likelihood computation. We then present case studies for both embedded and PC-based systems. Finally, we focus on some standard Application Programming Interfaces (APIs) that are helping the transition from research prototypes to commercial voice-enabled applications.
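
As a rough illustration of what floating-point to fixed-point conversion involves, the sketch below quantizes values to a 16-bit Q15 format and multiplies them with rounding and saturation. The Q15 format, the helper names (`to_q15`, `from_q15`, `q15_mul`), and the rounding scheme are assumptions chosen for illustration; they are not taken from the chapter and may differ from the methods it reviews.

```python
# Minimal Q15 fixed-point sketch (assumed format: 1 sign bit, 15 fractional bits).
Q = 15
SCALE = 1 << Q                      # 32768
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(v: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_q15(x: float) -> int:
    """Quantize a float in roughly [-1, 1) to a saturated Q15 integer."""
    return saturate(int(round(x * SCALE)))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float."""
    return v / SCALE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values: Q30 product, add half an LSB, shift back to Q15."""
    prod = a * b                              # Q30 intermediate
    prod = (prod + (1 << (Q - 1))) >> Q       # round and rescale to Q15
    return saturate(prod)


if __name__ == "__main__":
    half = to_q15(0.5)
    print(half, from_q15(q15_mul(half, half)))
```

Saturating instead of wrapping keeps an overflow from flipping the sign of a result, at the cost of a small clipping error near the ends of the range.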

Copyright information

© 2002 Kluwer Academic Publishers

About this chapter

Cite this chapter

(2002). From Cost Sensitive Embedded Applications to PC-based Systems. In: Robust Speech Recognition in Embedded Systems and PC Applications. The International Series in Engineering and Computer Science, vol 563. Springer, Boston, MA. https://doi.org/10.1007/0-306-47027-6_4

  • DOI: https://doi.org/10.1007/0-306-47027-6_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-7873-0

  • Online ISBN: 978-0-306-47027-1

  • eBook Packages: Springer Book Archive
