Unsupervised Speech Unit Discovery Using K-means and Neural Networks

Manenti, Céline; Pellegrini, Thomas; Pinquier, Julien

doi:10.1007/978-3-319-68456-7_14

Conference paper
First Online: 27 September 2017

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Abstract

Unsupervised discovery of sub-lexical units in speech is an active research problem. In this paper, we report experiments in which speech is first segmented into phone-like segments, and the segments are then clustered using k-means and a Convolutional Neural Network. This yields an annotation of the corpus in pseudo-phones, which in turn allows us to find pseudo-words. We compare the results for two segmentations: manual and automatic. To check the portability of our approach, we compare the results for three languages (English, French and Xitsonga). The originality of our work lies in using neural networks in an unsupervised way that differs from the common auto-encoder-based method for unsupervised speech unit discovery. On the Xitsonga corpus, for instance, we obtained phone-level purity scores of 46% and 42% with manual and automatic segmentation, respectively, using 30 pseudo-phones. Based on the inferred pseudo-phones, we discovered about 200 pseudo-words.
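The purity scores quoted in the abstract can be illustrated with a minimal sketch of the standard clustering purity measure: each cluster (pseudo-phone) is credited with its most frequent reference label, and the credited items are divided by the total. This is an assumption about the general form of the metric, not the authors' evaluation code; the abstract does not state whether purity is computed per segment or per frame. The function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, gold_labels):
    """Standard clustering purity.

    cluster_ids: one cluster (pseudo-phone) id per item.
    gold_labels: one reference phone label per item, in the same order.
    """
    if len(cluster_ids) != len(gold_labels):
        raise ValueError("cluster_ids and gold_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each reference label occurs inside each cluster.
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, gold_labels):
        per_cluster[cluster][label] += 1

    # Each cluster contributes the count of its majority label.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: 8 segments, 3 pseudo-phones, invented phone labels.
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
toy_phones = ["a", "a", "e", "t", "t", "s", "z", "s"]
toy_score = purity(toy_clusters, toy_phones)
```

On the toy data, each of the three clusters has two items carrying its majority label, so six of the eight items are credited. Purity rises trivially as the number of clusters grows, so scores are only comparable at a fixed cluster count, such as the 30 pseudo-phones reported above.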

Author information

Authors and Affiliations

IRIT, Université de Toulouse, UPS, Toulouse, France
Céline Manenti, Thomas Pellegrini & Julien Pinquier

Corresponding author

Correspondence to Céline Manenti.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide

About this paper

Cite this paper

Manenti, C., Pellegrini, T., Pinquier, J. (2017). Unsupervised Speech Unit Discovery Using K-means and Neural Networks. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_14

DOI: https://doi.org/10.1007/978-3-319-68456-7_14
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science (R0)
