Improvements on Automatic Speech Segmentation at the Phonetic Level

Gómez, Jon Ander; Calvo, Marcos

doi:10.1007/978-3-642-25085-9_66


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 7042)

Included in the following conference series:

Iberoamerican Congress on Pattern Recognition


Abstract

In this paper, we present some recent improvements to our automatic speech segmentation system, which needs only the speech signal and the phonetic sequence of each sentence of a corpus to be trained. The system estimates a GMM using all the sentences of the training subcorpus, where each Gaussian distribution represents an acoustic class. The probability densities of these classes are combined with a set of conditional probabilities to estimate the probability densities of the states of each phonetic unit. The initial values of the conditional probabilities are obtained from a segmentation of each sentence that assigns the same number of frames to each phonetic unit. A DTW algorithm then fixes the phonetic boundaries using the known phonetic sequence. This DTW is one step in an iterative process that segments the corpus and re-estimates the conditional probabilities. The results presented here show that the system has a good capacity to learn how to identify the phonetic boundaries.
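The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made for illustration: each phonetic unit has a single state, the unit density is taken to be the mixture p(x|u) = Σ_c P(c|u) p(x|c), the per-frame class densities p(x|c) are supplied as input instead of being computed from a trained GMM, and all function names, the `floor` smoothing constant, and the toy data are hypothetical.

```python
import math

def uniform_segmentation(n_frames, n_units):
    """Initial segmentation: roughly the same number of frames per unit."""
    return [i * n_units // n_frames for i in range(n_frames)]

def estimate_conditionals(dens, sequence, labels, floor=1e-3):
    """Estimate P(class | unit) from the frames currently assigned to each unit.

    dens[t][c] is the (assumed given) density of frame t under acoustic class c.
    The floor is a hypothetical smoothing constant that avoids zero probabilities.
    """
    n_classes = len(dens[0])
    acc = {u: [floor] * n_classes for u in set(sequence)}
    for frame, j in zip(dens, labels):
        total = sum(frame)
        for c, d in enumerate(frame):
            acc[sequence[j]][c] += d / total  # class posterior, equal priors assumed
    return {u: [v / sum(row) for v in row] for u, row in acc.items()}

def align(dens, sequence, cond):
    """Monotonic alignment of frames to the known unit sequence (DTW-style DP).

    Each frame either stays in the current unit or advances to the next one.
    Requires at least as many frames as units.
    """
    n_frames, n_units = len(dens), len(sequence)
    neg_inf = float("-inf")

    def score(t, j):
        # Assumed mixture form: p(x_t | unit) = sum_c P(c | unit) * p(x_t | c)
        p = sum(d * w for d, w in zip(dens[t], cond[sequence[j]]))
        return math.log(p) if p > 0 else neg_inf

    best = [[neg_inf] * n_units for _ in range(n_frames)]
    back = [[0] * n_units for _ in range(n_frames)]
    best[0][0] = score(0, 0)
    for t in range(1, n_frames):
        for j in range(min(t + 1, n_units)):
            stay = best[t - 1][j]
            move = best[t - 1][j - 1] if j > 0 else neg_inf
            if move > stay:
                best[t][j], back[t][j] = move, j - 1
            else:
                best[t][j], back[t][j] = stay, j
            best[t][j] += score(t, j)
    # Backtrack from the last unit at the last frame.
    labels = [n_units - 1] * n_frames
    for t in range(n_frames - 1, 0, -1):
        labels[t - 1] = back[t][labels[t]]
    return labels

def segment(dens, sequence, n_iter=5):
    """Alternate re-estimation and alignment, starting from a uniform split."""
    labels = uniform_segmentation(len(dens), len(sequence))
    for _ in range(n_iter):
        cond = estimate_conditionals(dens, sequence, labels)
        labels = align(dens, sequence, cond)
    return labels

# Toy input: two acoustic classes, six frames, true boundary after frame 2.
toy_dens = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9],
            [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]]
print(segment(toy_dens, ["a", "b"]))  # prints [0, 0, 1, 1, 1, 1]
```

In this toy case the uniform split puts the boundary after the third frame, and the first re-estimation and alignment pass moves it to after the second, where the class densities change. A real system would pool the conditional probabilities over every sentence of the corpus before realigning, as the abstract describes.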



Author information

Authors and Affiliations

Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Spain
Jon Ander Gómez & Marcos Calvo


Editor information

Editors and Affiliations

Universidad de La Frontera, Avda. Francisco Salazar, 01145, Temuco, Chile
César San Martin
Myongji University, San 38-2, Namdong, 449-728, Cheoingu, Yongin, Republic of Korea
Sang-Woon Kim


Cite this paper

Gómez, J.A., Calvo, M. (2011). Improvements on Automatic Speech Segmentation at the Phonetic Level. In: San Martin, C., Kim, SW. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2011. Lecture Notes in Computer Science, vol 7042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25085-9_66


DOI: https://doi.org/10.1007/978-3-642-25085-9_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25084-2
Online ISBN: 978-3-642-25085-9
eBook Packages: Computer Science, Computer Science (R0)
