Class Confusability Reduction in Audio-Visual Speech Recognition Using Random Forests
This paper presents an audio-visual speech classification system based on Random Forests classifiers, aimed at reducing intra-class misclassification, a very common problem, especially in speech recognition tasks. A novel training procedure is proposed, introducing the concept of Complementary Random Forests (CRF) classifiers. Experimental results over three audio-visual databases show that the proposed system achieves good performance for each type of input information considered, viz., audio-only, video-only, and fused audio-video information. In addition, these results indicate that the proposed method performs satisfactorily on all three databases using the same configuration parameters.
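The abstract does not detail how the Complementary Random Forests are trained, so the following is only a hedged illustration of the confusability problem they target, not the authors' method. Assuming a classifier's predictions are available, one common way to find which classes it mixes up is to tally a confusion matrix and rank class pairs by how often they are mistaken for each other. The function names `confusion_matrix` and `most_confused_pairs` and the letter-like labels are hypothetical.

```python
from itertools import combinations


def confusion_matrix(y_true, y_pred, labels):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def most_confused_pairs(matrix, labels, top_k=3):
    """Rank unordered class pairs by off-diagonal confusions in both directions."""
    scores = []
    for i, j in combinations(range(len(labels)), 2):
        count = matrix[i][j] + matrix[j][i]
        if count > 0:
            scores.append(((labels[i], labels[j]), count))
    # Highest counts first; Python's sort is stable, so ties keep pair order.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    labels = ["b", "d", "p", "m"]
    y_true = ["b", "b", "b", "d", "d", "d", "p", "p", "m", "m"]
    y_pred = ["b", "d", "p", "d", "b", "d", "p", "b", "m", "m"]
    matrix = confusion_matrix(y_true, y_pred, labels)
    print(most_confused_pairs(matrix, labels))
```

On this toy data the pairs ("b", "d") and ("b", "p") each account for two confusions, which is the kind of pair a confusability-reduction scheme would try to separate.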
Keywords: Speech recognition, Audio-visual speech, Random forests