The Role and Importance of Speech Standards

Baggia, Paolo; Burnett, Daniel C.; Marchand, Rob; Matula, Val

doi:10.1007/978-3-319-42816-1_2

Abstract

Within only a few years, the landscape of speech and DTMF applications changed from being based on proprietary languages to being based entirely on speech standards. The W3C Voice Browser Working Group (VBWG) played a role of primary importance in that change. This chapter describes the change and its implications, and highlights the standards created by the W3C VBWG as well as the benefits these standards can bring to many other application fields, including multimodal interfaces.

Notes

1. Every W3C Recommendation references an Implementation Report that assesses the implementability of the proposed standard. For instance, the VoiceXML 2.0 Implementation Report [44] was very important in showing how a procedure for automating most of the tests could be implemented.
2. Nine companies submitted an Implementation Report [44] for VoiceXML 2.0: Comverse, Genesys, Loquendo, Motorola, PublicVoiceXML Consortium, Tellme Networks, Vocalocity, VoiceGenie Technologies, and Voxpilot.
3. The XML Schema of VoiceXML 2.0 includes references to the SRGS 1.0 and SSML 1.0 XML Schemas; see Appendix O of the VoiceXML 2.0 specification [7].

References

W3C (1998). Voice Browsers, W3C Workshop, Cambridge, MA. https://www.w3.org/Voice/1998/Workshop/. Accessed 1 Mar 2016.
W3C (2016). Voice Browser Working Group. https://www.w3.org/Voice/. Accessed 1 Mar 2016.
W3C (2016). Multimodal Interaction Working Group. https://www.w3.org/2002/mmi/. Accessed 1 Mar 2016.
VoiceXML Forum (2016). http://www.voicexml.org/. Accessed 1 Mar 2016.
VoiceXML Forum (2000). Voice eXtensible Markup Language (VoiceXML) version 1.0. https://www.w3.org/TR/voicexml/. Accessed 1 Mar 2016.
VoiceXML Forum (2016). e-zine. http://www.voicexml.org/voicexml-review-archive/. Accessed 15 Mar 2016.
McGlashan, S., Burnett, D. C., Carter, J., Danielsen, P., Ferrans, J., Hunt, A., et al. (2004). Voice Extensible Markup Language (VoiceXML) version 2.0, W3C Recommendation. https://www.w3.org/TR/voicexml20/. Accessed 1 Mar 2016.
Hunt, A., & McGlashan, S. (2004). Speech Recognition Grammar Specification Version 1.0, W3C Recommendation. https://www.w3.org/TR/speech-grammar/. Accessed 1 Mar 2016.
Burnett, D. C., Walker, M. R., & Hunt, A. (2004). Speech Synthesis Markup Language (SSML) Version 1.0, W3C Recommendation. https://www.w3.org/TR/speech-synthesis/. Accessed 1 Mar 2016.
Oshry, M., Auburn, R. J., Baggia, P., Bodell, M., Burke, D., Burnett, D. C., et al. (2007). Voice Extensible Markup Language (VoiceXML) 2.1, W3C Recommendation. https://www.w3.org/TR/voicexml21/. Accessed 1 Mar 2016.
van Tichelen, L., & Burke, D. (2007). Semantic Interpretation for Speech Recognition (SISR) Version 1.0, W3C Recommendation. https://www.w3.org/TR/semantic-interpretation/. Accessed 1 Mar 2016.
Burnett, D. C., & Shuang, Z. W. (2010). Speech Synthesis Markup Language (SSML) Version 1.1, W3C Recommendation. https://www.w3.org/TR/speech-synthesis11/. Accessed 1 Mar 2016.
Baggia, P. (2008). Pronunciation Lexicon Specification (PLS) Version 1.0, W3C Recommendation. https://www.w3.org/TR/pronunciation-lexicon/. Accessed 1 Mar 2016.
Auburn, R. J. (2011). Voice Browser Call Control: CCXML Version 1.0, W3C Recommendation. https://www.w3.org/TR/ccxml/. Accessed 1 Mar 2016.
Larson, J. A. (2007). W3C speech interface language: VoiceXML. IEEE Signal Processing Magazine, 4(3), 126–130.
Jokinen, K., & McTear, M. (2009). Spoken dialogue systems. Princeton, NJ: Morgan & Claypool.
McGlashan, S., Burnett, D. C., Akolkar, R., Auburn, R. J., Baggia, P., Barnett, J., et al. (2010). Voice Extensible Markup Language (VoiceXML) Version 3.0, W3C Working Draft. https://www.w3.org/TR/voicexml30/. Accessed 1 Mar 2016.
Barnett, J., Akolkar, R., Auburn, R. J., Bodell, M., Carter, J., McGlashan, S., et al. (2015). State Chart XML (SCXML): State Machine Notation for Control Abstraction, W3C Recommendation. https://www.w3.org/TR/scxml/. Accessed 1 Mar 2016.
Harel, D. (1987). StateCharts: A visual formalism for complex systems. Journal Science of Computer Programming, 8(3), 231–274.
Brown, M. K., Kellner, A., & Raggett, D. (2001). Stochastic Language Models (N-Gram) Specification, W3C Working Draft. https://www.w3.org/TR/ngram-spec/. Accessed 1 Mar 2016.
Burnett, D. C. (2015). ALL: Thoughts and thanks as the VBWG comes to a close. W3C Mailing List Archive. https://lists.w3.org/Archives/Public/www-voice/2015JulSep/0029.html. Accessed 1 Mar 2016.
VoiceXML Forum (2016). VoiceXML Platform Certification Program. http://www.voicexml.org/certification-programs/voicexml-platform-certification-program/. Accessed 1 Mar 2016.
ECMA (2001). ECMAScript 3rd Edition Compact Profile. http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/Ecma-327.pdf. Accessed 1 Mar 2016.
The Internet Engineering Task Force (IETF) (2016). https://www.ietf.org/. Accessed 1 Mar 2016.
Burnett, D., & Shanmugham, S. (2012). Media Resource Control Protocol Version 2 (MRCPv2), RFC 6787—Internet Standard. http://www.rfc-base.org/txt/rfc-6787.txt. Accessed 1 Mar 2016.
Burke, D. (2007). Speech processing for ip networks: Media resource control protocol (MRCP). New York, NY: Wiley.
Johnston, M., Baggia, P., Burnett, D. C., Carter, J., Dahl, D. A., McCobb, G., et al. (2009). EMMA: Extensible MultiModal Annotation markup language, W3C Recommendation. https://www.w3.org/TR/emma/. Accessed 1 Mar 2016.
Axelsson, J., Cross, C., Lie, H. W., McCobb, G., Raman, T. V., Wilson, L. (2001). XHTML+Voice Profile 1.0, W3C Note. https://www.w3.org/TR/xhtml+voice/. Accessed 1 Mar 2016.
Microsoft Corporation, Speech Application Language Tags (SALT) (2003). Technical article. https://msdn.microsoft.com/en-us/library/ms994629.aspx. Accessed 1 Mar 2013.
Shires, G., & Wennborg, H. (2012). Web Speech API Specification, W3C Community Group Final Report. https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html. Accessed 1 Mar 2016.
Barnett, J., Bodell, M., Dahl, D., Kliche, I., Larson, J., Porter, B., et al. (2012). Multimodal Architecture and Interfaces, W3C Recommendation. https://www.w3.org/TR/mmi-arch/. Accessed 15 Mar 2016.
Johnston, M., Dahl, D., Denney, T., & Kharidi, N. (2015). EMMA: Extensible MultiModal Annotation markup language Version 2.0, W3C Working Draft. https://www.w3.org/TR/emma20/. Accessed 15 Mar 2016.
Kistner, G., & Neurenberger, C. (2014). Developing user interfaces using SCXML statecharts. In Proceedings of the 1st EICS Workshop on Engineering Interactive Computer Systems with SCXML, pp. 5–11. http://tuprints.ulb.tu-darmstadt.de/4053/.
Almeida, N., Silva, S., & Teixeira, A. (2014). Multimodal multi-device application supported by an SCXML state chart machine. In Proceedings of the 1st EICS Workshop on Engineering Interactive Computer Systems with SCXML, pp. 12–17. http://tuprints.ulb.tu-darmstadt.de/4053/.
Schnelle-Walka, D., Radomski, S., Lager, T., Barnett, J., Dahl, D., Mühlhäuser, M. (Eds.) (2014). Proceedings of the 1st EICS Workshop on Engineering Interactive Computer Systems with SCXML. Darmstadt: TU Darmstadt.
Burkhardt, F., Schröder, M., Baggia, P., Pelachaud, C., Peter, C., & Zovato, E. (2014). Emotion Markup Language (EmotionML) 1.0, W3C Recommendation. https://www.w3.org/TR/emotionml/. Accessed 15 Mar 2016.
Schnelle-Walka, D., Radeck-Arneth, S., & Striebinger, J. (2015). Multimodal dialog management in a smart home context with SCXML. In Proceedings 2nd Workshop on Engineering Interactive Systems with SCXML, Duisburg, DE.
López, G., Peláez, V., González, R., & Lobato, V. (2011). Voice control in smart homes using distant microphones: A VoiceXML-based approach, in ambient intelligence. Lecture Notes in Computer Science (Vol. 7040) (pp. 172–181). Berlin/Heidelberg: Springer.
Teixeira, A., Almeida, N., Pereira, C., & Oliveira, M. (2013). W3C MMI architecture as a basis for enhanced interaction for ambient assisted living. New York, NY: W3C Workshop on Rich Multimodal Application Development.
Sigüenza, A., Blanco, J. L., Bernat, J., & Hernández, L. A. (2010). Using SCXML for semantic sensor networks. In Proceedings of the 3rd International Workshop on Semantic Sensor Networks (SSN10). Workshop at the 9th International Semantic Web Conference (ISWC2010) - ISWC 2010 Workshops Volume V, Shanghai, China, pp. 33–48. http://ceur-ws.org/Vol-668/.
Radomski, S., & Schnelle-Walka, D. (2012). VoiceXML for pervasive environments. International Journal of Mobile Human Computer Interaction, 4(2), 18–36.
Schnelle-Walka, D., Radomski, S., & Mühlhäuser, M. (2015). Modern standards for VoiceXML in pervasive multimodal applications. In J. Lumsden (Ed.), Emerging perspectives on the design, use, and evaluation of mobile and handheld devices. IGI Global: http://www.igi-global.com/book/emerging-perspectives-design-use-evaluation/125520
Bühler, D., & Hamerich, S. W. (2005). Towards VoiceXML compilation for portable embedded applications in ubiquitous environments. In Proceedings of Interspeech 2005, Lisbon, PT, pp. 3397–3400. http://www.isca-speech.org/archive/interspeech_2005/i05_3397.html; http://www.isca-speech.org/archive/interspeech_2005/index.html.
Oshry, M., Adeeb, R., Baggia, P., Blackman, A., Bodell, M., Burke, D., et al. (2004). VoiceXML 2.0 Implementation Report. https://www.w3.org/Voice/2004/vxml-ir/. Accessed 1 Mar 2016.
Shanmugham, S., Monaco, P., & Eberman, B. (2006). A Media Resource Control Protocol (MRCP), RFC 4463—Informational. https://tools.ietf.org/html/rfc4463. Accessed 1 Mar 2016.

Author information

Authors and Affiliations

Department of Enterprise, Nuance Communications, Inc., Torino, Italy
Paolo Baggia
StandardsPlay, Lilburn, GA, USA
Daniel C. Burnett
Genesys, Markham, ON, Canada
Rob Marchand
Avaya Inc., Santa Clara, CA, USA
Val Matula

Corresponding author

Correspondence to Paolo Baggia.

Editor information

Editors and Affiliations

Conversational Technologies, Plymouth Meeting, Pennsylvania, USA
Deborah A. Dahl

About this chapter

Cite this chapter

Baggia, P., Burnett, D.C., Marchand, R., Matula, V. (2017). The Role and Importance of Speech Standards. In: Dahl, D. (eds) Multimodal Interaction with W3C Standards. Springer, Cham. https://doi.org/10.1007/978-3-319-42816-1_2

DOI: https://doi.org/10.1007/978-3-319-42816-1_2
Published: 18 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42814-7
Online ISBN: 978-3-319-42816-1
eBook Packages: Engineering, Engineering (R0)