Design and Development of Speech Interaction: A Methodology

  • Nuno Almeida
  • Samuel Silva
  • António Teixeira
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8511)


Using speech in computer interaction is advantageous in many situations and more natural for the user. However, developing speech-enabled applications generally poses a significant challenge at design time, regarding both the implementation of speech modalities and the definition of what the speech recognizer should understand.

In this paper we present the context of our work, describe the major challenges involved in using speech modalities, summarize our approach to speech interaction design and share experiences regarding our applications, their architecture and gathered insights.

In our approach we use a multimodal framework, responsible for the communication between modalities, and a generic speech modality allowing developers to quickly implement new speech enabled applications.
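The decoupling described above can be illustrated with a minimal sketch in the spirit of the W3C MMI architecture the authors build on: modalities never communicate with each other directly, only through an interaction manager that routes events. All names below (`InteractionManager`, `Modality`, `on_event`) are illustrative, not the framework's actual API.

```python
# Minimal sketch of decoupled modalities communicating through an
# interaction manager. Names are hypothetical, for illustration only.

class InteractionManager:
    """Routes lifecycle-style events between registered modalities."""
    def __init__(self):
        self._modalities = []

    def register(self, modality):
        self._modalities.append(modality)
        modality.manager = self

    def dispatch(self, event, data, source):
        # Forward the event to every modality except its originator.
        for m in self._modalities:
            if m is not source:
                m.on_event(event, data)

class Modality:
    """A generic modality: emits events to, and receives them from,
    the interaction manager, never from other modalities directly."""
    def __init__(self, name):
        self.name = name
        self.manager = None
        self.received = []  # events delivered to this modality

    def emit(self, event, data):
        self.manager.dispatch(event, data, source=self)

    def on_event(self, event, data):
        self.received.append((event, data))

# A generic speech modality reports a recognition result; the GUI
# modality reacts without knowing which component produced it.
im = InteractionManager()
speech = Modality("speech")
gui = Modality("gui")
im.register(speech)
im.register(gui)
speech.emit("recognition_result", {"text": "open calendar"})
print(gui.received)
```

Because the speech modality only knows the interaction manager, it can be reused across applications unchanged, which is the property that lets developers quickly add speech to new applications.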

As part of our methodology, and in order to inform development, we consider two different applications, one targeting smartphones and the other targeting tablets and home computers. Both adopt a multimodal architecture and provide different scenarios for testing the proposed speech modality.


Keywords: Speech · Multimodal architecture · Decoupled modalities





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nuno Almeida (1, 2)
  • Samuel Silva (1)
  • António Teixeira (1, 2)
  1. Institute of Electronics and Telematics Engineering, University of Aveiro, Portugal
  2. Dep. of Electronics, Telecommunications and Informatics Engineering, University of Aveiro, Portugal
