AM-FM: Modulation and Demodulation Techniques

Part of the SpringerBriefs in Electrical and Computer Engineering book series


Analysis of speech signals is usually carried out using the short-time Fourier transform (STFT). The most successful features currently used in both speech recognition and speaker recognition systems are cepstral features, which in one way or another are based on the source-filter model of speech production. The source-filter model assumes that the sound source for voiced speech is localized in the larynx and that the vocal tract acts as a convolution filter on the emitted sound. However, it is well known that a significant part of the acoustic information cannot be captured by this linear model; examples of phenomena not well captured include unstable airflow, turbulence, and nonlinearities arising from oscillators with time-varying masses.
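As a concrete illustration of the AM-FM alternative the chapter develops, the discrete energy separation algorithm (DESA-1) of Maragos, Kaiser and Quatieri can be sketched in a few lines of Python. This is a generic sketch, not code from the book; the function names and test signal are illustrative. It uses the Teager energy operator Ψ[x](n) = x(n)² − x(n−1)x(n+1) applied to the signal and its backward difference to recover the instantaneous frequency and amplitude envelope of a single AM-FM component:

```python
import numpy as np

def teager(x):
    """Discrete Teager energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1).

    Output element k corresponds to input sample k+1 (the two edge
    samples are dropped)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa1(x):
    """DESA-1 energy separation: estimate instantaneous frequency
    (rad/sample) and amplitude envelope from the Teager energies of
    the signal and of its backward difference y(n) = x(n) - x(n-1)."""
    psi_x = teager(x)
    psi_y = teager(np.diff(x))
    # Align Psi[x](n) with the pair Psi[y](n), Psi[y](n+1).
    psi_x = psi_x[1:len(psi_y)]
    cos_w = 1.0 - (psi_y[:-1] + psi_y[1:]) / (4.0 * psi_x)
    omega = np.arccos(np.clip(cos_w, -1.0, 1.0))     # instantaneous frequency
    amp = np.sqrt(psi_x / (1.0 - cos_w ** 2))        # amplitude envelope
    return omega, amp

# For a pure sinusoid the estimates are exact up to rounding error:
# x(n) = 1.5 * cos(0.2 n + 0.3) yields omega ~= 0.2 and amp ~= 1.5.
n = np.arange(200)
omega, amp = desa1(1.5 * np.cos(0.2 * n + 0.3))
```

In practice the speech signal is first passed through a bandpass filterbank so that each channel is approximately a single AM-FM component, and the demodulation is applied per channel; this is the multiband demodulation used in the formant-tracking and recognition work the chapter surveys.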


Keywords: Amplitude modulation, Speech signal, Instantaneous frequency, Vocal tract, Speaker identification



Copyright information

© The Author(s) 2012

Authors and Affiliations

  1. Department of Instrumentation, SGGS Institute of Engineering and Technology, Vishnupuri, Nanded, India
  2. Department of E & TC Engineering, SRES College of Engineering, Kopargaon, India
