Diction Based Prosody Modeling in Table-to-Speech Synthesis

Spiliotopoulos, Dimitris; Xydas, Gerasimos; Kouroupetroglou, Georgios

doi:10.1007/11551874_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

Transferring a structure from the visual modality to the aural one is a difficult challenge. In this work we experiment with prosody modeling for the synthesized speech representation of tabulated structures. The model is derived by analyzing naturally spoken descriptions of data tables and subsequent feedback from blind and sighted users. The derived placement and values of prosodic phrase accents and pause breaks are examined in terms of how successfully they convey semantically important visual information through prosody control in Table-to-Speech synthesis. Finally, the quality of information provision of synthesized tables that use the proposed prosody specification is compared against plain synthesis.
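As a rough illustration of what pause break placement can mean for a spoken table, the following minimal sketch linearizes a simple table row by row and inserts SSML `<break>` elements between cells and between rows. It is not the authors' method: the function name `table_to_ssml`, the header-value pairing, and the pause durations are hypothetical placeholders, not values reported in the paper, and phrase accents are not modeled at all.

```python
from xml.sax.saxutils import escape


def table_to_ssml(header, rows, cell_pause_ms=200, row_pause_ms=600):
    """Linearize a simple table row by row into an SSML string.

    The pause durations are illustrative placeholders (assumptions),
    not the values derived in the paper.
    """
    cell_break = f'<break time="{cell_pause_ms}ms"/>'
    row_break = f'<break time="{row_pause_ms}ms"/>'
    spoken_rows = []
    for row in rows:
        # Pair each data cell with its column header so the listener keeps context.
        cells = [f"{escape(str(h))}: {escape(str(v))}" for h, v in zip(header, row)]
        # A shorter pause separates cells within the same row.
        spoken_rows.append(cell_break.join(cells))
    # A longer pause marks the boundary between rows.
    return "<speak>" + row_break.join(spoken_rows) + "</speak>"


if __name__ == "__main__":
    print(table_to_ssml(["City", "Temp"], [["Athens", 31], ["Pilsen", 24]]))
```

Using a longer pause at row boundaries than at cell boundaries is one simple way to let timing carry structure that a visual reader would get from the grid layout; suitable durations would have to come from the kind of spoken-description analysis and listener feedback the abstract describes.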

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, University of Athens
Dimitris Spiliotopoulos, Gerasimos Xydas & Georgios Kouroupetroglou

Editor information

Editors and Affiliations

Department of Computer Science, University of West Bohemia in Pilsen, Univerzitni 8, 30614, Plzen, Czech Republic
Václav Matoušek, Pavel Mautner & Tomáš Pavelka

About this paper

Cite this paper

Spiliotopoulos, D., Xydas, G., Kouroupetroglou, G. (2005). Diction Based Prosody Modeling in Table-to-Speech Synthesis. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_38

DOI: https://doi.org/10.1007/11551874_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28789-6
Online ISBN: 978-3-540-31817-0
eBook Packages: Computer Science, Computer Science (R0)
