Low bit-rate speech coding based on multicomponent AFM signal model

International Journal of Speech Technology

Abstract

In this paper, we propose a novel multicomponent amplitude and frequency modulated (AFM) signal model for parametric representation of speech phonemes. An efficient technique is developed for parameter estimation of the proposed model. The Fourier–Bessel series expansion is used to separate a multicomponent speech signal into a set of individual components. The discrete energy separation algorithm is used to extract the amplitude envelope (AE) and the instantaneous frequency (IF) of each component of the speech signal. Then, the parameter estimation of the proposed AFM signal model is carried out by analysing the AE and IF parts of the signal component. The developed model is found to be suitable for representation of an entire speech phoneme (voiced or unvoiced) irrespective of its time duration, and the model is shown to be applicable for low bit-rate speech coding. The symmetric Itakura–Saito and the root-mean-square log-spectral distance measures are used for comparison of the original and reconstructed speech signals.
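As an illustration of the energy separation step, the following is a minimal sketch of one common variant of the discrete energy separation algorithm, DESA-2 (Maragos, Kaiser and Quatieri), applied to a single-component signal. It is not taken from the paper: the choice of DESA-2 over the other DESA variants, the function names `teager_energy` and `desa2`, and the `eps` guard are assumptions made here for the example. The sketch presumes the component has already been isolated (for instance by the Fourier–Bessel separation described above) and that its frequency lies below one quarter of the sampling rate.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy operator on interior samples.

    psi[n] = x[n]^2 - x[n-1] * x[n+1], so the output is 2 samples shorter.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimate of amplitude envelope and instantaneous frequency.

    Hypothetical helper for illustration only. Returns (amp, freq), with
    freq in radians per sample; both are 4 samples shorter than x and
    correspond to samples n = 2 .. len(x) - 3.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_energy(x)      # energy of the signal, n = 1 .. N-2
    y = x[2:] - x[:-2]            # symmetric difference y[n] = x[n+1] - x[n-1]
    psi_y = teager_energy(y)      # energy of the difference, n = 2 .. N-3
    psi_x = psi_x[1:-1]           # align psi_x with psi_y

    # eps (an assumption of this sketch) guards against division by zero.
    ratio = psi_y / (2.0 * np.maximum(psi_x, eps))
    freq = 0.5 * np.arccos(np.clip(1.0 - ratio, -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, freq


if __name__ == "__main__":
    # Synthetic AM-FM test component with slowly varying AE and IF.
    n = np.arange(400)
    true_amp = 1.0 + 0.3 * np.cos(0.01 * n)
    phase = 0.4 * n + 2.0 * np.sin(0.02 * n)
    amp, freq = desa2(true_amp * np.cos(phase))
    print("mean estimated IF (rad/sample):", freq.mean())
    print("mean estimated AE:", amp.mean())
```

For a pure sinusoid the estimates are exact up to rounding; for modulated components they are approximations whose accuracy degrades as the modulation becomes faster relative to the carrier.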




Correspondence to Mohan Bansal.


Cite this article

Bansal, M., Sircar, P. Low bit-rate speech coding based on multicomponent AFM signal model. Int J Speech Technol 21, 783–795 (2018). https://doi.org/10.1007/s10772-018-9542-5


  • DOI: https://doi.org/10.1007/s10772-018-9542-5
