Augmented Auditory Representation of e-Texts for Text-to-Speech Systems

Xydas, Gerasimos; Kouroupetroglou, Georgios

doi:10.1007/3-540-44805-5_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Emerging electronic text formats include hierarchical structure and visualization-related information that current Text-to-Speech (TtS) systems ignore. In this paper we present a novel approach for composing a detailed auditory representation of e-texts using speech and audio. Furthermore, we provide a scripting language (CAD scripts) for defining specific customizations of the operation of a TtS system. CAD scripts can also be assigned to specific text meta-data to enable a distinct auditory representation of each. This approach can serve as a means for a detailed exchange of functionality across different TtS implementations. Moreover, it can be hosted in current TtS systems with minor (or major) modifications. Finally, we briefly present the implementation of the DEMOSTHeNES Composer for augmented auditory generation of meta-text using the above methodology.
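The abstract describes attaching scripts to text meta-data so that each structural element of an e-text receives its own auditory rendering through speech and audio. As a rough illustration only, the sketch below assumes a tagged document tree, a table mapping each tag to prosody multipliers (pitch, rate) and optional audio cues played before or after the element, and a walk that lets nested elements compose their parents' settings. The names `Script`, `Node`, `render`, the multiplier scheme and the cue file names are all hypothetical; this is not the CAD script syntax or the DEMOSTHeNES Composer interface, which the abstract does not describe.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Script:
    """Hypothetical auditory customization attached to one meta-data tag."""
    pitch: float = 1.0              # multiplier applied to the inherited pitch
    rate: float = 1.0               # multiplier applied to the inherited speaking rate
    before: Optional[str] = None    # hypothetical audio cue played before the element
    after: Optional[str] = None     # hypothetical audio cue played after the element


@dataclass
class Node:
    """One element of a hierarchical e-text; children are str or Node."""
    tag: str
    children: list = field(default_factory=list)


def render(node, scripts, pitch=1.0, rate=1.0):
    """Flatten a tagged tree into an ordered list of speech and audio events.

    Nested elements inherit and multiply their ancestors' prosody settings.
    Tags with no script fall back to a neutral Script().
    """
    script = scripts.get(node.tag, Script())
    pitch *= script.pitch
    rate *= script.rate
    events = []
    if script.before:
        events.append(("audio", script.before))
    for child in node.children:
        if isinstance(child, Node):
            events.extend(render(child, scripts, pitch, rate))
        else:
            # Round to avoid float noise from repeated multiplication.
            events.append(("speak", child, round(pitch, 3), round(rate, 3)))
    if script.after:
        events.append(("audio", script.after))
    return events


# Usage: a title read higher and slower and followed by a cue; emphasis read higher.
scripts = {
    "title": Script(pitch=1.2, rate=0.9, after="chime.wav"),
    "em": Script(pitch=1.1),
}
doc = Node("doc", [Node("title", ["Results"]), "Plain text and ",
                   Node("em", ["emphasis"]), "."])
for event in render(doc, scripts):
    print(event)
```

Keeping the mapping in a separate table, rather than hard-coding it in the renderer, mirrors the abstract's idea that the same customizations could be exchanged between different TtS back-ends.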

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, Division of Communication and Signal Processing, University of Athens, Panepistimiopolis, Ilisia, GR-15784, Athens, Greece
Gerasimos Xydas & Georgios Kouroupetroglou

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, University of West Bohemia in Plzeň, Faculty of Applied Sciences, Univerzitní 22, 306-14, Plzeň, Czech Republic
Václav Matoušek, Pavel Mautner, Roman Mouček & Karel Taušer

About this paper

Cite this paper

Xydas, G., Kouroupetroglou, G. (2001). Augmented Auditory Representation of e-Texts for Text-to-Speech Systems. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_17

DOI: https://doi.org/10.1007/3-540-44805-5_17
Published: 24 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive