International Journal of Speech Technology, Volume 19, Issue 4, pp 731–742

Subspace filtering approach based on orthogonal projection for better analysis of stressed speech under clean and noisy environments

  • Bhanu Priya
  • S. Dandapat


This study explores a novel subspace projection-based approach for the analysis of stressed speech. Studies have shown that stress influences the speech production system, resulting in large acoustic variation between neutral and stressed speech. This variation degrades the discrimination capability of an automatic speech recognition system that is trained on neutral speech and tested on stressed speech. An effort is made to reduce the acoustic mismatch by explicitly normalizing the stress-specific attributes. The stress-specific divergences are normalized using a subspace filtering technique. To accomplish this, an orthogonal projection-based linear relationship between the speech and the stress information is explored to filter an effective speech subspace, which consists of speech information. The speech subspace is constructed from neutral speech data using K-means clustering followed by singular value decomposition. The speech and the stress information are separated by projecting the stressed speech orthogonally onto the effective speech subspace. Experimental results indicate that the bases of an effective subspace comprise the first few eigenvectors corresponding to the highest eigenvalues. To further improve system performance, both the neutral and the stressed speech are projected onto the lower-dimensional subspace. The projections derived using the neutral speech employ heteroscedastic linear discriminant analysis in a maximum likelihood linear transformation-based semi-tied adaptation framework. Consistent improvements are noted for the proposed technique in all the discussed cases.
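The subspace construction and projection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (kmeans, speech_subspace, project), the number of clusters k, the subspace dimension r, the use of uncentered centroids, and the random stand-in features are all assumptions made for illustration. Only the overall sequence (K-means on neutral data, SVD, orthogonal projection of stressed frames) comes from the abstract.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm (illustrative); returns a (k, d) centroid matrix."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid: (n, k)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids


def speech_subspace(neutral, k, r):
    """Orthonormal (d, r) basis from the SVD of K-means centroids of neutral frames.

    Assumption: the centroids are used without mean-centering.
    """
    C = kmeans(neutral, k)
    # Rows of Vt are right singular vectors, ordered by decreasing singular
    # value; they are eigenvectors of C.T @ C, with eigenvalues equal to the
    # squared singular values.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    return Vt[:r].T


def project(X, U):
    """Split frames into the part inside span(U) and the orthogonal residual."""
    inside = X @ U @ U.T
    return inside, X - inside


# Synthetic stand-in data (hypothetical): 13-dimensional frames, one per row.
rng = np.random.default_rng(0)
neutral = rng.normal(size=(300, 13))
stressed = rng.normal(size=(50, 13)) + 0.5

U = speech_subspace(neutral, k=16, r=5)
speech_part, residual = project(stressed, U)
print(U.shape, speech_part.shape, residual.shape)
```

Under this reading, speech_part plays the role of the speech information retained by the subspace, and residual is the component orthogonal to it. Keeping only the first r right singular vectors mirrors the abstract's observation that the effective bases are the eigenvectors with the highest eigenvalues.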


Stressed speech, Stress normalization, Orthogonal projection, K-means clustering, SVD, Decorrelation, HLDA, Semi-tied adaptive training



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, India
