Summary
For both embedded and PC-based systems, the cost of speech recognition has to be negligible compared with the total system cost. This is much easier to achieve on PCs: the processing power of the average PC has increased to the point that software-only solutions requiring no additional hardware are now possible. In this chapter, we review techniques that are especially useful in embedded applications or that reduce computational complexity, such as floating-point to fixed-point conversion and fast Gaussian computation and Gaussian clustering, which reduce the time spent in likelihood computation. We then present case studies for both embedded and PC-based systems. Finally, we focus on some standard Application Programming Interfaces (APIs) that are helping the transition from research prototypes to commercial voice-enabled applications.
Copyright information
© 2002 Kluwer Academic Publishers
About this chapter
Cite this chapter
(2002). From Cost Sensitive Embedded Applications to PC-based Systems. In: Robust Speech Recognition in Embedded Systems and PC Applications. The International Series in Engineering and Computer Science, vol 563. Springer, Boston, MA. https://doi.org/10.1007/0-306-47027-6_4
DOI: https://doi.org/10.1007/0-306-47027-6_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7873-0
Online ISBN: 978-0-306-47027-1
eBook Packages: Springer Book Archive