Speech Phoneme Classification by Intelligent Decision-Level Fusion

Laleye, Fréjus A. A.; Ezin, Eugène C.; Motamed, Cina

doi:10.1007/978-3-319-31898-1_4

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 383)

Abstract

This paper explores decision fusion for phoneme recognition through an intelligent combination of Naive Bayes and Learning Vector Quantization (LVQ) classifiers, together with feature fusion of Mel-frequency Cepstral Coefficients (MFCC), Relative Spectral Transform-Perceptual Linear Prediction (Rasta-PLP) and Perceptual Linear Prediction (PLP). The work emphasizes optimal decision making from the decisions of classifiers trained on different features. The proposed architecture comprises three decision fusion approaches: weighted mean, deep belief networks (DBN) and fuzzy logic. We compare their performance on a phoneme dataset of Fongbe, an African language. In these experiments, the fuzzy logic approach gave the best overall decision fusion performance, with classification accuracies of 95.54 % for consonants and 83.97 % for vowels, although Deep Belief Networks had a lower execution time.
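Weighted mean is the simplest of the three fusion rules named in the abstract. The sketch below illustrates a generic weighted-mean fusion of per-class scores from several classifiers (for instance, a Naive Bayes and an LVQ classifier trained on different features). It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `weighted_mean_fusion`, the use of NumPy, the assumption that every classifier outputs per-class scores on a common scale (such as posterior probabilities; LVQ outputs would first need converting, e.g. from prototype distances), and the example scores and weights are all hypothetical.

```python
import numpy as np


def weighted_mean_fusion(scores, weights):
    """Fuse per-class scores from several classifiers with a weighted mean.

    scores:  array-like of shape (K, N, C): K classifiers, N samples, C classes.
             Scores are assumed to be on a common scale (e.g. posteriors).
    weights: array-like of shape (K,): non-negative weight per classifier
             (hypothetical here; e.g. derived from validation accuracy).
    Returns (fused, labels): fused scores of shape (N, C) and the arg-max
    class index for each sample.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if scores.ndim != 3 or weights.shape != (scores.shape[0],):
        raise ValueError("expected scores of shape (K, N, C) and weights of shape (K,)")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    w = weights / weights.sum()               # normalize weights to sum to 1
    fused = np.tensordot(w, scores, axes=1)   # sum_k w[k] * scores[k] -> (N, C)
    return fused, fused.argmax(axis=1)


if __name__ == "__main__":
    # Hypothetical posteriors for 2 samples over 3 phoneme classes.
    nb_scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]   # e.g. a Naive Bayes classifier
    lvq_scores = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]  # e.g. an LVQ classifier
    _, labels = weighted_mean_fusion([nb_scores, lvq_scores], [3.0, 1.0])
    print(labels.tolist())  # [0, 1]
    _, labels_equal = weighted_mean_fusion([nb_scores, lvq_scores], [1.0, 1.0])
    print(labels_equal.tolist())  # [1, 2]
```

With the hypothetical 3:1 weights favoring the first classifier, the two samples are assigned classes 0 and 1; with equal weights the same scores yield classes 1 and 2, which shows how the weights alone can change the fused decision. How the weights are chosen is a design choice this sketch leaves open.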

Author information

Authors and Affiliations

Unité de Recherche en Informatique et Sciences Appliquées, Institut de Mathématiques et de Sciences Physiques, Université d’Abomey-Calavi, BP 613 Porto-Novo, Abomey Calavi, Benin
Fréjus A. A. Laleye & Eugène C. Ezin
Laboratoire d’Informatique Signal et Image de la Côte d’Opale, Université du Littoral Côte d’Opale, 50 rue F. Buisson, BP 719, 62228, Calais Cedex, France
Fréjus A. A. Laleye & Cina Motamed

Corresponding author

Correspondence to Cina Motamed.

Editor information

Editors and Affiliations

Polytechnic Institute of Setúbal, Setubal, Portugal
Joaquim Filipe
Vitry sur Seine, France
Kurosh Madani
Ford Research & Adv. Engineering, Dearborn, Michigan, USA
Oleg Gusikhin
Mechanical and Aerospace Engineering, Carleton University, Ottawa, Ontario, Canada
Jurek Sasiadek

About this paper

Cite this paper

Laleye, F.A.A., Ezin, E.C., Motamed, C. (2016). Speech Phoneme Classification by Intelligent Decision-Level Fusion. In: Filipe, J., Madani, K., Gusikhin, O., Sasiadek, J. (eds) Informatics in Control, Automation and Robotics: 12th International Conference, ICINCO 2015, Colmar, France, July 21-23, 2015, Revised Selected Papers. Lecture Notes in Electrical Engineering, vol 383. Springer, Cham. https://doi.org/10.1007/978-3-319-31898-1_4

DOI: https://doi.org/10.1007/978-3-319-31898-1_4
Published: 15 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31896-7
Online ISBN: 978-3-319-31898-1
eBook Packages: Engineering, Engineering (R0)
