
Intelligent Multi-modal Interfaces for Mobile Applications in Hostile Environment (IM-HOST)

Chapter in: Human Machine Interaction

Abstract

Multi-modal interfaces for mobile applications include tiny screens, keyboards, touch screens, earphones, microphones and software components for voice-based man-machine interaction. In a noisy environment, the voice recognition software and the microphone are of primary importance. Voice applications currently perform reasonably well in quiet environments. However, in many practical situations the surrounding noise severely degrades the quality of the speech signal, and as a consequence the recognition rate drops significantly. Noise management is a major focus in developing voice-enabled technologies. This project addresses voice recognition with the goal of reaching a high success rate (ideally above 99%) in a noisy and hostile outdoor environment: the user stands on the open deck of a motorboat and uses his/her voice, through a wireless microphone, to command applications running on a laptop. Besides noise, other constraints strongly limit the hardware options, and the user must perform several tasks simultaneously. The success of the solution must therefore rely on the efficiency and effectiveness of the voice recognition algorithm and on the choice of microphone. In addition, training of the recognizer should be kept to a minimum, and recognition should take no longer than 3 seconds. For these two reasons, only a limited set of voice commands was tested.
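To make the effect of surrounding noise measurable, one common approach (assumed here for illustration, not the procedure reported for IM-HOST) is to mix clean keyword recordings with recorded noise at a chosen signal-to-noise ratio (SNR) and re-run the recognizer at each level. The sketch below scales a noise signal so that the mixture reaches a target SNR in dB; the function names `mix_at_snr` and `measured_snr_db` are hypothetical, and the tone and Gaussian noise in the demo merely stand in for real speech and deck noise.

```python
import math
import random


def signal_power(x):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so the mixture has the requested SNR in dB.

    `speech` and `noise` are equal-length lists of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_noise = signal_power(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(P_speech / P_scaled_noise)  =>  P_scaled_noise = P_speech / 10**(SNR_dB / 10)
    target_noise_power = signal_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR of `noisy` with respect to the clean reference `speech`, in dB."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(signal_power(speech) / signal_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    rate = 8000  # samples per second (assumed)
    # A 1-second 440 Hz tone stands in for a clean recording; Gaussian noise stands in for deck noise.
    clean = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]
    for target in (20, 10, 0, -5):
        noisy = mix_at_snr(clean, noise, target)
        print(f"target {target:>3} dB -> measured {measured_snr_db(clean, noisy):.2f} dB")
```

Running a fixed recognizer on the same commands at decreasing SNRs gives a curve of recognition rate versus noise level, which is one way to quantify the degradation described above.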

A first demonstrator, based on digit keyword spotting trained on phone speech, performed poorly in very noisy conditions. A second demonstrator combining neural network and template matching techniques led to nearly acceptable results when the user recorded the keywords. Since the recognition rate was estimated at around 90%, no additional field test was undertaken. This R&D project shows that state-of-the-art voice recognition requires further investigation before spoken keywords can be recognized reliably in noisy environments. In addition to ongoing improvements, unconventional research approaches worth testing include deriving keywords adapted to specialized algorithms and having the user learn these keywords.
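The second demonstrator is described only as combining neural network and template matching techniques. The sketch below shows one common form of that combination, assumed here for illustration and not taken from the chapter: each frame is a vector of phone posterior probabilities (which a trained neural network would produce; here they are simply given as input), the keywords recorded by the user serve as templates, and dynamic time warping (DTW) with a Kullback-Leibler local distance finds the closest template. A rejection threshold lets the recognizer refuse input that matches no keyword well. All names (`dtw_cost`, `recognize`, `reject_threshold`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]  # one feature vector, here assumed to be phone posterior probabilities


def kl_divergence(p: Frame, q: Frame, eps: float = 1e-10) -> float:
    """Local distance D_KL(p || q) between a template frame p and a test frame q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)


def dtw_cost(template: List[Frame], test: List[Frame]) -> float:
    """Dynamic time warping cost with steps (1,0), (0,1), (1,1), normalised by n + m."""
    n, m = len(template), len(test)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = kl_divergence(template[i - 1], test[j - 1])
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def recognize(
    test: List[Frame],
    templates: Dict[str, List[List[Frame]]],
    reject_threshold: float,
) -> Optional[str]:
    """Return the keyword whose closest template is within the threshold, or None to reject."""
    best_word: Optional[str] = None
    best_cost = float("inf")
    for word, examples in templates.items():
        for example in examples:
            cost = dtw_cost(example, test)
            if cost < best_cost:
                best_word, best_cost = word, cost
    return best_word if best_cost <= reject_threshold else None


if __name__ == "__main__":
    # Toy 3-class posteriors; real ones would come from a trained neural network.
    templates = {
        "start": [[[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]],
        "stop": [[[0.05, 0.05, 0.9], [0.1, 0.8, 0.1]]],
    }
    utterance = [[0.85, 0.1, 0.05], [0.85, 0.1, 0.05], [0.1, 0.85, 0.05]]
    print(recognize(utterance, templates, reject_threshold=0.5))
```

Each DTW comparison costs on the order of N × M local distances for a template of N frames and an utterance of M frames, so recognition time grows linearly with the number of stored templates. Keeping the command set small is one simple way to stay within a fixed latency budget such as the 3 seconds stated above; the threshold value would have to be tuned on held-out recordings.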

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Stricker, C. et al. (2009). Intelligent Multi-modal Interfaces for Mobile Applications in Hostile Environment (IM-HOST). In: Lalanne, D., Kohlas, J. (eds) Human Machine Interaction. Lecture Notes in Computer Science, vol 5440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00437-7_4

  • DOI: https://doi.org/10.1007/978-3-642-00437-7_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00436-0

  • Online ISBN: 978-3-642-00437-7

  • eBook Packages: Computer Science, Computer Science (R0)