
Diction Based Prosody Modeling in Table-to-Speech Synthesis

  • Conference paper
Text, Speech and Dialogue (TSD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3658)

Abstract

Transferring a structure from the visual modality to the aural one is a difficult challenge. In this work we experiment with prosody modeling for the synthesized speech representation of tabulated structures. The model is derived by analyzing naturally spoken descriptions of data tables, followed by feedback from blind and sighted users. The derived placement and values of prosodic phrase accents and pause breaks are examined for how successfully they convey semantically important visual information through prosody control in Table-to-Speech synthesis. Finally, the quality of information provision of synthesized tables that use the proposed prosody specification is compared against plain synthesis.
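
As a rough illustration of what a pause break specification for a spoken table could look like, the following minimal sketch linearizes a simple table row by row and inserts SSML `<break>` elements between cells and rows. It is not the authors' method: the function name `table_to_ssml`, the "header, value" reading order, and the pause durations are all hypothetical, and the sketch covers only pause breaks, not phrase accents.

```python
from xml.sax.saxutils import escape

# Hypothetical pause durations in milliseconds. The paper derives its own
# values from recorded human readings; those values are not reproduced here.
CELL_BREAK_MS = 300
ROW_BREAK_MS = 700


def table_to_ssml(header, rows, cell_break_ms=CELL_BREAK_MS, row_break_ms=ROW_BREAK_MS):
    """Linearize a simple table row by row into an SSML string.

    Each data cell is spoken as "<header>, <value>". Cells within a row are
    separated by a short break, and every row is followed by a longer break.
    """
    parts = ["<speak>"]
    for row in rows:
        # Pair each value with its column header and escape XML metacharacters.
        cells = [f"{escape(h)}, {escape(v)}" for h, v in zip(header, row)]
        parts.append(f'<break time="{cell_break_ms}ms"/>'.join(cells))
        parts.append(f'<break time="{row_break_ms}ms"/>')
    parts.append("</speak>")
    return "".join(parts)


if __name__ == "__main__":
    print(table_to_ssml(["City", "Temperature"], [["Athens", "30"], ["Pilsen", "22"]]))
```

Keeping the two durations as parameters makes it easy to compare a "plain" rendering (both set to the same small value) with a rendering that marks row boundaries more strongly.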




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Spiliotopoulos, D., Xydas, G., Kouroupetroglou, G. (2005). Diction Based Prosody Modeling in Table-to-Speech Synthesis. In: Matoušek, V., Mautner, P., Pavelka, T. (eds) Text, Speech and Dialogue. TSD 2005. Lecture Notes in Computer Science, vol 3658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551874_38

  • DOI: https://doi.org/10.1007/11551874_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28789-6

  • Online ISBN: 978-3-540-31817-0

  • eBook Packages: Computer Science, Computer Science (R0)
