Optimisation of Artificial Neural Network Topology Applied in the Prosody Control in Text-to-Speech Synthesis

Šebesta, Václav; Tučková, Jana

doi:10.1007/3-540-44411-4_31


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1963)

Included in the following conference series:

International Conference on Current Trends in Theory and Practice of Computer Science


Abstract

Multilayer artificial neural networks (ANNs) are often used to solve classification problems or to forecast time series. A sufficient number of learning and testing patterns must be available for ANN training. Each training pattern consists of n input parameters and m output parameters. The number m is usually fixed by the problem formulation, but the n inputs can often be selected from a larger set of candidate input parameters. Selecting the inputs well is especially important when the number of usable input parameters is large and the analytical relations between the input and output parameters are not known. In general, the number of neurons in every ANN layer should be kept as small as possible to achieve good generalisation ability.
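To make the cost of extra inputs concrete, the following minimal sketch counts the trainable parameters of a fully connected feed-forward network. The function name and all layer sizes are illustrative assumptions; none of them are taken from the paper.

```python
def mlp_weight_count(n_inputs, hidden_sizes, m_outputs):
    """Count the weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_sizes, m_outputs]
    # Each layer adds (fan_in + 1) * fan_out parameters; the +1 is the bias.
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))


# Hypothetical sizes: 30 candidate inputs versus 8 selected inputs,
# one hidden layer of 20 neurons, m = 2 outputs.
full = mlp_weight_count(30, [20], 2)
reduced = mlp_weight_count(8, [20], 2)
print(full, reduced)
```

With these assumed sizes the count falls from 662 to 222 parameters, so the same number of training patterns constrains the smaller network far more tightly.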

In this paper we present a possible way of selecting the significant input parameters (the so-called “markers”), that is, those with the greatest influence on the output parameters. These parameters are later used for training the ANN. A statistical approach is usually used for this purpose [5]. After some experience with ANN applications, we recognised that an approach based on mathematical logic, i.e. the GUHA method (General Unary Hypotheses Automaton), is also suitable for determining the markers.
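As a rough illustration of logic-based marker selection, the sketch below screens binary input attributes with a founded-implication test on a four-fold contingency table, a quantifier commonly associated with GUHA. It is a simplification under stated assumptions and does not reproduce the authors' procedure: GUHA generates and tests many hypotheses with compound antecedents, whereas here each antecedent is a single attribute. The thresholds `p` and `base`, the attribute names, and the data are all hypothetical.

```python
def fourfold_table(antecedent, succedent):
    """Return (a, b, c, d): counts of (1,1), (1,0), (0,1), (0,0) cases."""
    a = b = c = d = 0
    for x, y in zip(antecedent, succedent):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def founded_implication(antecedent, succedent, p=0.9, base=5):
    """True if at least `base` cases support the rule with confidence >= p."""
    a, b, _, _ = fourfold_table(antecedent, succedent)
    return a + b > 0 and a >= base and a / (a + b) >= p


def select_markers(inputs, output, p=0.9, base=5):
    """Keep the input attributes whose rule 'attribute => output' is founded."""
    return [name for name, column in inputs.items()
            if founded_implication(column, output, p, base)]


# Hypothetical binary data: x1 follows the output closely, x2 does not.
output = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
inputs = {
    "x1": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    "x2": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
markers = select_markers(inputs, output)
print(markers)
```

Only the attributes returned by `select_markers` would then be fed to the network as inputs; real-valued parameters would first have to be discretised into such binary attributes.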

Besides minimising the number of elements in the input layer of the ANN, the number of neurons in the hidden layers must also be optimised. Standard pruning methods, described e.g. in [1], can be used for this purpose. We have used this method in the following applications:
- Optimisation of the intervals between major overhauls of aircraft engines by the analysis of tribodiagnostic data. Only selected types of chemical pollution in the oil are taken into account.
- Prediction of bleeding in patients with chronic lymphoblastic leukemia. Only some of the patient parameters are important from this point of view (see [2]).
- Optimisation of the quality and reliability prediction of artificial resin production in a chemical factory. Only some of the production parameters (times of production phases, temperatures, percentages of components, etc.) have a direct influence on the product.
- Optimisation of the prosody control in text-to-speech synthesis. This application is described in the paper.
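For the hidden-layer pruning mentioned above, the sketch below shows one generic heuristic: drop hidden neurons whose outgoing weights are negligibly small. This is a swapped-in magnitude criterion used only for illustration, not the statistical method of [1]; the weight layout, the threshold, and the example values are assumptions.

```python
def prune_hidden_neurons(w_in, w_out, threshold):
    """Remove hidden neurons with a small total outgoing weight magnitude.

    w_in[j]  : incoming weights of hidden neuron j
    w_out[j] : outgoing weights of hidden neuron j
    Returns the pruned weight lists and the indices of the kept neurons.
    """
    keep = [j for j, outgoing in enumerate(w_out)
            if sum(abs(w) for w in outgoing) >= threshold]
    return [w_in[j] for j in keep], [w_out[j] for j in keep], keep


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output.
w_in = [[0.5, -1.2], [0.1, 0.3], [2.0, 0.7]]
w_out = [[0.9], [0.01], [-1.5]]
pruned_in, pruned_out, kept = prune_hidden_neurons(w_in, w_out, threshold=0.05)
print(kept)
```

In practice the network would be retrained after each pruning step and its error on the testing patterns compared with that of the unpruned network.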

Supported by GA AS CR, grant No. A2030801

Supported by GA CR, grant No. 102/96/K087 and COST 258


References

[1] V. Šebesta. Pruning of neural networks by statistical optimization. In Proc. 6th School on Neural Networks, pages 209–214. Microcomputer, 1994.
[2] V. Šebesta and L. Straka. Determination of markers by GUHA method for neural network training. Neural Network World, 8(3):255–268, 1998.
[3] V. Šebesta and J. Tučková. Selection of important input parameters for a text-to-speech synthesis by neural networks. In Proc. International Joint Conference on Neural Networks IJCNN'99. Washington, DC, USA, 1999.
[4] P. Hájek, A. Sochorová, and J. Zvárová. GUHA for personal computers. Computational Statistics and Data Analysis, 19:149–153, 1995.
[5] A. K. Jain, R. P. W. Duin, and J. Mao. Statistical pattern recognition: A review. IEEE Trans. on PAMI, 22(1):4–37, 2000.
[6] P. Hájek et al. GUHA method - objectives and tools. In Proc. IXth SOFSEM. VUT UJEP, Brno, 1982. (In Czech.)
[7] M. P. Reidi. Controlling Segmental Duration in Speech Synthesis System. PhD thesis, ETH Zurich, Switzerland.
[8] T. J. Sejnowski and Ch. R. Rosenberg. NETtalk: a parallel network that learns to read aloud. Technical Report JHU/EECS-86/01, Johns Hopkins University.
[9] J. Terken. Variation of accent prominence within the phrase: Models and spontaneous speech data. Computing Prosody, pages 95–116, 1997.
[10] Ch. Traber. SVOX: The implementation of the Text-to-Speech System for German. PhD thesis, ETH Zurich, Switzerland, 1995.
[11] J. Tučková and P. Horák. Fundamental frequency control in Czech text-to-speech synthesis. In Proc. Third Workshop on ECMS'97. Toulouse, France, 1997.
[12] J. Tučková and R. Vích. Fundamental frequency modelling by neural nets in the Czech text-to-speech synthesis. In Proc. IASTED Int. Conference Signal and Image Processing SIP'97, pages 85–87. New Orleans, USA, 1997.
[13] R. Vích. Pitch synchronous linear predictive Czech and Slovak text-to-speech synthesis. In Proc. 15th Int. Congress on Acoustics. Trondheim, Norway, 1995.


Author information

Authors and Affiliations

Institute of Computer Science, Academy of Sciences of the Czech Republic, and Faculty of Transportation, Czech Technical University, Czech Republic
Václav Šebesta
Institute of Radioengineering and Electronics, Academy of Sciences of the Czech Republic, and Faculty of Electrical Engineering, Czech Technical University, Czech Republic
Jana Tučková


Editor information

Editors and Affiliations

Department of Cybernetics, Czech Technical University, Karlovo nám. 13, 121 35, Prague, Czech Republic
Václav Hlaváč
Information Technology Department, CLRC RAL, Chilton, Didcot, Oxfordshire, UK
Keith G. Jeffery
Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod vodárenskou věží 2, 182 07, Prague, Czech Republic
Jiří Wiedermann


About this paper

Cite this paper

Šebesta, V., Tučková, J. (2000). Optimisation of Artificial Neural Network Topology Applied in the Prosody Control in Text-to-Speech Synthesis. In: Hlaváč, V., Jeffery, K.G., Wiedermann, J. (eds) SOFSEM 2000: Theory and Practice of Informatics. SOFSEM 2000. Lecture Notes in Computer Science, vol 1963. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44411-4_31


DOI: https://doi.org/10.1007/3-540-44411-4_31
Published: 22 January 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41348-6
Online ISBN: 978-3-540-44411-4
eBook Packages: Springer Book Archive
