Very Low Bit Rate Speech Coding: Comparison of Data-Driven Units with Syllable Segments

Černocký, Jan; Kopeček, Ivan; Baudoin, Geneviève; Chollet, Gérard

doi:10.1007/3-540-48239-3_48


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1692)

Included in the following conference series:

International Workshop on Text, Speech and Dialogue


Abstract

Very low bit-rate (VLBR) coding of speech offers the opportunity to test methods of automatic generation of sub-word units. This paper describes two approaches to VLBR coding: the first is based on ALISP (Automatic Language Independent Speech Processing) techniques, the second on syllable segments. Experimental results are reported on a database of one Czech professional speaker. The rates obtained for unit encoding were approximately 135 bps for the former approach and 62 bps for the latter. Quality was evaluated by measuring the logarithmic spectral distortion (computed on LPC spectra) and in informal listening tests. Ways in which each technique could benefit the other are discussed.
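The two quantities named in the abstract, the unit-encoding rate and the logarithmic spectral distortion on LPC spectra, can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: the distortion is taken as the root-mean-square difference of two LPC log-power spectra in dB (a common definition, which may differ from the one used in the paper), and the rate assumes fixed-length indices into a unit dictionary. The function names and the example figures (10 units per second, 64 units) are hypothetical and are not taken from the paper.

```python
import cmath
import math


def lpc_log_spectrum_db(a, gain=1.0, n_points=256):
    """Log-power spectrum (dB) of the all-pole filter gain / A(z),
    with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...,
    sampled at n_points frequencies in [0, pi)."""
    spectrum = []
    for k in range(n_points):
        w = math.pi * k / n_points
        # Evaluate A(e^{jw}) on the unit circle.
        A = 1.0 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(a))
        spectrum.append(10.0 * math.log10(gain ** 2 / abs(A) ** 2))
    return spectrum


def log_spectral_distortion(a_ref, a_test, n_points=256):
    """RMS difference (dB) between two LPC log-power spectra.
    Assumed definition; the paper may use a different variant."""
    s_ref = lpc_log_spectrum_db(a_ref, n_points=n_points)
    s_test = lpc_log_spectrum_db(a_test, n_points=n_points)
    mse = sum((x - y) ** 2 for x, y in zip(s_ref, s_test)) / n_points
    return math.sqrt(mse)


def unit_bit_rate(units_per_second, dictionary_size):
    """Bits per second needed for unit indices, assuming every index
    is coded with log2(dictionary_size) bits (no entropy coding)."""
    return units_per_second * math.log2(dictionary_size)


# Hypothetical figures, for illustration only.
print(unit_bit_rate(10, 64))                    # 10 units/s * 6 bits = 60.0
print(log_spectral_distortion([-0.9], [-0.9]))  # identical filters give 0.0
print(log_spectral_distortion([-0.9], [-0.5]))  # differing filters give a positive value
```

Under these assumptions, a lower rate follows from either fewer units per second or a smaller dictionary, which is one way to read the gap between the two rates reported above; the abstract itself does not break the figures down this way.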

The research has been partially supported by the Ministry of Education, Youth and Sports of the Czech Republic, projects No. VS97060 and VS97028, and by the Grant Agency of the Czech Republic under Grant 201/99/1248.



Author information

Authors and Affiliations

Inst. of Radioelectronics, Brno Univ. of Technology, Brno
Jan Černocký
Faculty of Informatics, Masaryk University Brno, Brno
Ivan Kopeček
Dpt. Signal et Télécommunications, ESIEE Paris, Paris
Geneviève Baudoin
Dpt. Signal et Images, ENST Paris, Paris
Gérard Chollet


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Plzeň, Universitní 22, 306 14, Plzeň, Czech Republic
Václav Matousek, Pavel Mautner & Jana Ocelíková
Department of Programming Systems and Communication, Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka


About this paper

Cite this paper

Černocký, J., Kopeček, I., Baudoin, G., Chollet, G. (1999). Very Low Bit Rate Speech Coding: Comparison of Data-Driven Units with Syllable Segments. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_48


DOI: https://doi.org/10.1007/3-540-48239-3_48
Published: 01 October 1999
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive
