
Augmented Auditory Representation of e-Texts for Text-to-Speech Systems

  • Conference paper
Text, Speech and Dialogue (TSD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Abstract

Emerging electronic text formats include hierarchical structure and visualization-related information that current Text-to-Speech (TtS) systems ignore. In this paper we present a novel approach for composing a detailed auditory representation of e-texts using speech and audio. Furthermore, we provide a scripting language (CAD scripts) for defining specific customizations of the operation of a TtS system. CAD scripts can also be assigned to specific text meta-data to enable their discrete auditory representation. This approach can form a means for a detailed exchange of functionality across different TtS implementations. Moreover, it can be hosted by current TtS systems with minor (or major) modifications. Finally, we briefly present the implementation of DEMOSTHeNES Composer for augmented auditory generation of meta-text using the above methodology.
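The idea of attaching a rendering rule to a piece of text meta-data can be illustrated with a minimal sketch. It is not CAD script syntax, which the abstract does not show, and it is not the DEMOSTHeNES implementation. The tag names, the `Rendering` fields, the audio file names, and the `plan` function are all hypothetical, chosen only to show how structural tags might be mapped to a mix of speech settings and non-speech audio cues.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rendering:
    """Hypothetical auditory settings for one kind of text element."""
    pitch_scale: float = 1.0       # relative pitch of the synthetic voice
    rate_scale: float = 1.0        # relative speaking rate
    earcon: Optional[str] = None   # non-speech audio cue played before the text


DEFAULT = Rendering()

# Assumed rule table: meta-data tag -> auditory rendering.
RULES = {
    "title": Rendering(pitch_scale=1.2, rate_scale=0.9, earcon="chime.wav"),
    "emphasis": Rendering(pitch_scale=1.1),
    "list-item": Rendering(earcon="tick.wav"),
}


def plan(elements: List[Tuple[str, str]]) -> List[tuple]:
    """Turn (tag, text) pairs into an ordered list of audio/speech steps."""
    steps = []
    for tag, text in elements:
        rule = RULES.get(tag, DEFAULT)  # unknown tags fall back to plain speech
        if rule.earcon is not None:
            steps.append(("audio", rule.earcon))
        steps.append(("speak", text, rule.pitch_scale, rule.rate_scale))
    return steps


if __name__ == "__main__":
    for step in plan([("title", "Introduction"), ("para", "Plain body text.")]):
        print(step)
```

Keeping the rules in a table separate from the planning loop mirrors the abstract's point that customizations are defined outside the synthesizer itself, so the same rule set could in principle be carried to another TtS implementation.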

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xydas, G., Kouroupetroglou, G. (2001). Augmented Auditory Representation of e-Texts for Text-to-Speech Systems. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_17

  • DOI: https://doi.org/10.1007/3-540-44805-5_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42557-1

  • Online ISBN: 978-3-540-44805-1

  • eBook Packages: Springer Book Archive
