Experimental Evaluation of CNN Architecture for Speech Recognition

Haque, Md Amaan; Verma, Abhishek; Alex, John Sahaya Rani; Venkatesan, Nithya

doi:10.1007/978-981-15-0029-9_40

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1045)

Abstract

Deep learning has recently been widely adopted in signal and information processing. Among deep learning algorithms, the Convolutional Neural Network (CNN) is widely used for image recognition and classification because of its architecture, high accuracy, and efficiency. This paper proposes a method that applies a CNN to audio samples rather than to the image samples on which CNNs are usually trained. Each one-dimensional audio sample is converted into a two-dimensional matrix whose dimensions are the number of Mel-Frequency Cepstral Coefficients (MFCCs) extracted from the sample and the number of windows used in the extraction. The proposed CNN model is evaluated on the TIDIGITS corpus. The paper analyzes several convolution-layer architectures, varying the number of feature maps in each. Of the architectures discussed, the three-layer convolution architecture achieved the highest accuracy, 97.46%.
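The conversion from a one-dimensional audio sample to a two-dimensional MFCC matrix can be sketched as follows. This is a minimal illustration using NumPy only, not the authors' implementation: the abstract does not give the extraction settings, so the sample rate, frame length, hop size, FFT size, number of mel filters, and number of coefficients are all assumed values, and the function names `mel_filterbank` and `mfcc_matrix` are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank


def mfcc_matrix(signal, sample_rate=16000, frame_len=400, hop=160,
                n_fft=512, n_filters=26, n_mfcc=13):
    """Turn a 1-D signal into a 2-D array of shape (n_mfcc, n_windows).

    All default parameter values are assumptions for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one analysis window")
    n_windows = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping windows and apply a Hamming taper.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_windows)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Power spectrum of each window.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Log mel filterbank energies (small constant avoids log(0)).
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II over the filterbank axis; keep the first n_mfcc terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n + 1) / (2 * n_filters))
    return (energies @ basis.T).T


# One second of a synthetic 440 Hz tone stands in for a spoken digit.
tone = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
features = mfcc_matrix(tone)
print(features.shape)  # (13, 98): 13 coefficients by 98 windows
```

The resulting matrix can be treated like a single-channel image, which is what allows a CNN designed for image input to be trained on it. Utterances of different lengths yield different numbers of windows, so in practice some padding or truncation to a fixed width would be needed before batching; the abstract does not say how the authors handled this.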

Author information

Authors and Affiliations

Vellore Institute of Technology, Chennai, 600127, India
Md Amaan Haque, Abhishek Verma, John Sahaya Rani Alex & Nithya Venkatesan

Corresponding author

Correspondence to John Sahaya Rani Alex.

Editor information

Editors and Affiliations

Department of Electrical and Communication Engineering, The Papua New Guinea University of Technology, Lae, Papua New Guinea
Ashish Kumar Luhach
Neumann János University, Kecskemét, Bács-Kiskun, Hungary
Janos Arpad Kosa
Amity University, Jaipur, Rajasthan, India
Ramesh Chandra Poonia
School of Computing, University of Eastern Finland, Kuopio, Finland
Xiao-Zhi Gao
Department of Computer Science, Namibia University of Science and Technology, Windhoek, Namibia
Dharm Singh

About this paper

Cite this paper

Haque, M.A., Verma, A., Alex, J.S.R., Venkatesan, N. (2020). Experimental Evaluation of CNN Architecture for Speech Recognition. In: Luhach, A., Kosa, J., Poonia, R., Gao, XZ., Singh, D. (eds) First International Conference on Sustainable Technologies for Computational Intelligence. Advances in Intelligent Systems and Computing, vol 1045. Springer, Singapore. https://doi.org/10.1007/978-981-15-0029-9_40

DOI: https://doi.org/10.1007/978-981-15-0029-9_40
Published: 02 November 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0028-2
Online ISBN: 978-981-15-0029-9
eBook Packages: Intelligent Technologies and Robotics (R0)
