The Role and Importance of Speech Standards

Multimodal Interaction with W3C Standards

Abstract

Within only a few years, the landscape of speech and DTMF applications changed from being based on proprietary languages to being based entirely on speech standards. The W3C Voice Browser Working Group (VBWG) played a primary role in that change. This chapter describes the change and its implications, highlights the standards created by the W3C VBWG, and discusses the benefits these standards can bring to many other application fields, including multimodal interfaces.

Notes

  1. All W3C Recommendations include a reference to an Implementation Report, a document that assesses the implementability of the proposed standard. For instance, the VoiceXML 2.0 Implementation Report [44] was very important in showing how to implement a procedure to automate most of the tests.

  2. Nine companies submitted an Implementation Report [44] for VoiceXML 2.0: Comverse, Genesys, Loquendo, Motorola, PublicVoiceXML Consortium, Tellme Networks, Vocalocity, VoiceGenie Technologies, and Voxpilot.

  3. The XML Schema of VoiceXML 2.0 includes references to the SRGS 1.0 and SSML 1.0 XML Schemas; see Appendix O of the VoiceXML 2.0 specification [7].

References

  1. W3C (1998). Voice Browsers, W3C Workshop, Cambridge, MA. https://www.w3.org/Voice/1998/Workshop/. Accessed 1 Mar 2016.

  2. W3C (2016). Voice Browser Working Group. https://www.w3.org/Voice/. Accessed 1 Mar 2016.

  3. W3C (2016). Multimodal Interaction Working Group. https://www.w3.org/2002/mmi/. Accessed 1 Mar 2016.

  4. VoiceXML Forum (2016). http://www.voicexml.org/. Accessed 1 Mar 2016.

  5. VoiceXML Forum (2000). Voice eXtensible Markup Language (VoiceXML) version 1.0. https://www.w3.org/TR/voicexml/. Accessed 1 Mar 2016.

  6. VoiceXML Forum (2016). e-zine. http://www.voicexml.org/voicexml-review-archive/. Accessed 15 Mar 2016.

  7. McGlashan, S., Burnett, D. C., Carter, J., Danielsen, P., Ferrans, J., Hunt, A., et al. (2004). Voice Extensible Markup Language (VoiceXML) version 2.0, W3C Recommendation. https://www.w3.org/TR/voicexml20/. Accessed 1 Mar 2016.

  8. Hunt, A., & McGlashan, S. (2004). Speech Recognition Grammar Specification Version 1.0, W3C Recommendation. https://www.w3.org/TR/speech-grammar/. Accessed 1 Mar 2016.

  9. Burnett, D. C., Walker, M. R., & Hunt, A. (2004). Speech Synthesis Markup Language (SSML) Version 1.0, W3C Recommendation. https://www.w3.org/TR/speech-synthesis/. Accessed 1 Mar 2016.

  10. Oshry, M., Auburn, R. J., Baggia, P., Bodell, M., Burke, D., Burnett, D. C., et al. (2007). Voice Extensible Markup Language (VoiceXML) 2.1, W3C Recommendation. https://www.w3.org/TR/voicexml21/. Accessed 1 Mar 2016.

  11. van Tichelen, L., & Burke, D. (2007). Semantic Interpretation for Speech Recognition (SISR) Version 1.0, W3C Recommendation. https://www.w3.org/TR/semantic-interpretation/. Accessed 1 Mar 2016.

  12. Burnett, D. C., & Shuang, Z. W. (2010). Speech Synthesis Markup Language (SSML) Version 1.1, W3C Recommendation. https://www.w3.org/TR/speech-synthesis11/. Accessed 1 Mar 2016.

  13. Baggia, P. (2008). Pronunciation Lexicon Specification (PLS) Version 1.0, W3C Recommendation. https://www.w3.org/TR/pronunciation-lexicon/. Accessed 1 Mar 2016.

  14. Auburn, R. J. (2011). Voice Browser Call Control: CCXML Version 1.0, W3C Recommendation. https://www.w3.org/TR/ccxml/. Accessed 1 Mar 2016.

  15. Larson, J. A. (2007). W3C speech interface language: VoiceXML. IEEE Signal Processing Magazine, 24(3), 126–130.

  16. Jokinen, K., & McTear, M. (2009). Spoken dialogue systems. Princeton, NJ: Morgan & Claypool.

  17. McGlashan, S., Burnett, D. C., Akolkar, R., Auburn, R. J., Baggia, P., Barnett, J., et al. (2010). Voice Extensible Markup Language (VoiceXML) Version 3.0, W3C Working Draft. https://www.w3.org/TR/voicexml30/. Accessed 1 Mar 2016.

  18. Barnett, J., Akolkar, R., Auburn, R. J., Bodell, M., Carter, J., McGlashan, S., et al. (2015). State Chart XML (SCXML): State Machine Notation for Control Abstraction, W3C Recommendation. https://www.w3.org/TR/scxml/. Accessed 1 Mar 2016.

  19. Harel, D. (1987). Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8(3), 231–274.

  20. Brown, M. K., Kellner, A., & Raggett, D. (2001). Stochastic Language Models (N-Gram) Specification, W3C Working Draft. https://www.w3.org/TR/ngram-spec/. Accessed 1 Mar 2016.

  21. Burnett, D. C. (2015). ALL: Thoughts and thanks as the VBWG comes to a close. W3C Mailing List Archive. https://lists.w3.org/Archives/Public/www-voice/2015JulSep/0029.html. Accessed 1 Mar 2016.

  22. VoiceXML Forum (2016). VoiceXML Platform Certification Program. http://www.voicexml.org/certification-programs/voicexml-platform-certification-program/. Accessed 1 Mar 2016.

  23. ECMA (2001). ECMAScript 3rd Edition Compact Profile. http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/Ecma-327.pdf. Accessed 1 Mar 2016.

  24. The Internet Engineering Task Force (IETF) (2016). https://www.ietf.org/. Accessed 1 Mar 2016.

  25. Burnett, D., & Shanmugham, S. (2012). Media Resource Control Protocol Version 2 (MRCPv2), RFC 6787—Internet Standard. http://www.rfc-base.org/txt/rfc-6787.txt. Accessed 1 Mar 2016.

  26. Burke, D. (2007). Speech processing for IP networks: Media resource control protocol (MRCP). New York, NY: Wiley.

  27. Johnston, M., Baggia, P., Burnett, D. C., Carter, J., Dahl, D. A., McCobb, G., et al. (2009). EMMA: Extensible MultiModal Annotation markup language, W3C Recommendation. https://www.w3.org/TR/emma/. Accessed 1 Mar 2016.

  28. Axelsson, J., Cross, C., Lie, H. W., McCobb, G., Raman, T. V., & Wilson, L. (2001). XHTML+Voice Profile 1.0, W3C Note. https://www.w3.org/TR/xhtml+voice/. Accessed 1 Mar 2016.

  29. Microsoft Corporation (2003). Speech Application Language Tags (SALT), Technical article. https://msdn.microsoft.com/en-us/library/ms994629.aspx. Accessed 1 Mar 2013.

  30. Shires, G., & Wennborg, H. (2012). Web Speech API Specification, W3C Community Group Final Report. https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html. Accessed 1 Mar 2016.

  31. Barnett, J., Bodell, M., Dahl, D., Kliche, I., Larson, J., Porter, B., et al. (2012). Multimodal Architecture and Interfaces, W3C Recommendation. https://www.w3.org/TR/mmi-arch/. Accessed 15 Mar 2016.

  32. Johnston, M., Dahl, D., Denney, T., & Kharidi, N. (2015). EMMA: Extensible MultiModal Annotation markup language Version 2.0, W3C Working Draft. https://www.w3.org/TR/emma20/. Accessed 15 Mar 2016.

  33. Kistner, G., & Neurenberger, C. (2014). Developing user interfaces using SCXML statecharts. In Proceedings of the 1st EICS Workshop on Engineering Interactive Computer Systems with SCXML, pp. 5–11. http://tuprints.ulb.tu-darmstadt.de/4053/.

  34. Almeida, N., Silva, S., & Teixeira, A. (2014). Multimodal multi-device application supported by an SCXML state chart machine. In Proceedings of the 1st EICS Workshop on Engineering Interactive Computer Systems with SCXML, pp. 12–17. http://tuprints.ulb.tu-darmstadt.de/4053/.

  35. Schnelle-Walka, D., Radomski, S., Lager, T., Barnett, J., Dahl, D., & Mühlhäuser, M. (Eds.) (2014). Proceedings of the 1st EICS Workshop on Engineering Interactive Computer Systems with SCXML. Darmstadt: TU Darmstadt.

  36. Burkhardt, F., Schröder, M., Baggia, P., Pelachaud, C., Peter, C., & Zovato, E. (2014). Emotion Markup Language (EmotionML) 1.0, W3C Recommendation. https://www.w3.org/TR/emotionml/. Accessed 15 Mar 2016.

  37. Schnelle-Walka, D., Radeck-Arneth, S., & Striebinger, J. (2015). Multimodal dialog management in a smart home context with SCXML. In Proceedings of the 2nd Workshop on Engineering Interactive Systems with SCXML, Duisburg, DE.

  38. López, G., Peláez, V., González, R., & Lobato, V. (2011). Voice control in smart homes using distant microphones: A VoiceXML-based approach. In Ambient Intelligence, Lecture Notes in Computer Science (Vol. 7040, pp. 172–181). Berlin/Heidelberg: Springer.

  39. Teixeira, A., Almeida, N., Pereira, C., & Oliveira, M. (2013). W3C MMI architecture as a basis for enhanced interaction for ambient assisted living. New York, NY: W3C Workshop on Rich Multimodal Application Development.

  40. Sigüenza, A., Blanco, J. L., Bernat, J., & Hernández, L. A. (2010). Using SCXML for semantic sensor networks. In Proceedings of the 3rd International Workshop on Semantic Sensor Networks (SSN10). Workshop at the 9th International Semantic Web Conference (ISWC2010) - ISWC 2010 Workshops Volume V, Shanghai, China, pp. 33–48. http://ceur-ws.org/Vol-668/.

  41. Radomski, S., & Schnelle-Walka, D. (2012). VoiceXML for pervasive environments. International Journal of Mobile Human Computer Interaction, 4(2), 18–36.

  42. Schnelle-Walka, D., Radomski, S., & Mühlhäuser, M. (2015). Modern standards for VoiceXML in pervasive multimodal applications. In J. Lumsden (Ed.), Emerging perspectives on the design, use, and evaluation of mobile and handheld devices. IGI Global. http://www.igi-global.com/book/emerging-perspectives-design-use-evaluation/125520

  43. Bühler, D., & Hamerich, S. W. (2005). Towards VoiceXML compilation for portable embedded applications in ubiquitous environments. In Proceedings of Interspeech 2005, Lisbon, PT, pp. 3397–3400. http://www.isca-speech.org/archive/interspeech_2005/i05_3397.html; http://www.isca-speech.org/archive/interspeech_2005/index.html.

  44. Oshry, M., Adeeb, R., Baggia, P., Blackman, A., Bodell, M., Burke, D., et al. (2004). VoiceXML 2.0 Implementation Report. https://www.w3.org/Voice/2004/vxml-ir/. Accessed 1 Mar 2016.

  45. Shanmugham, S., Monaco, P., & Eberman, B. (2006). A Media Resource Control Protocol (MRCP), RFC 4463—Informational. https://tools.ietf.org/html/rfc4463. Accessed 1 Mar 2016.

Author information

Corresponding author

Correspondence to Paolo Baggia.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Baggia, P., Burnett, D.C., Marchand, R., Matula, V. (2017). The Role and Importance of Speech Standards. In: Dahl, D. (ed) Multimodal Interaction with W3C Standards. Springer, Cham. https://doi.org/10.1007/978-3-319-42816-1_2

  • DOI: https://doi.org/10.1007/978-3-319-42816-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42814-7

  • Online ISBN: 978-3-319-42816-1

  • eBook Packages: Engineering, Engineering (R0)
