Introduction

Chapter
Part of the SpringerBriefs in Electrical and Computer Engineering book series

Abstract

For several decades, the traditional approach to speech modeling has been the linear (source-filter) model, in which the true nonlinear physics of speech production is approximated via the standard assumptions of linear acoustics and one-dimensional plane-wave propagation of sound in the vocal tract. The linear model has been applied with limited success to applications such as speech coding, synthesis, and recognition. To build successful applications, however, deviations from the linear model are often modeled as second-order effects or error terms. There is strong theoretical and experimental evidence for the existence of important nonlinear aerodynamic phenomena during speech production that cannot be accounted for by the linear model. Thus, the linear model can be viewed only as a first-order approximation to the true speech acoustics, which also contain second-order and nonlinear structure.

Keywords

Speech Signal, Vocal Fold, Speech Production, Vocal Tract, Speaker Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© The Author(s) 2012

Authors and Affiliations

  1. Department of Instrumentation, SGGS Institute of Engineering and Technology, Vishnupuri, Nanded, India
  2. Department of E & TC Engineering, SRES College of Engineering, Kopargaon, India