Abstract
This chapter presents an approach for improving the recognition performance of consonant-vowel (CV) units under clean, coded, and noisy conditions. The proposed CV recognition method works in two stages: the first stage recognizes the vowel category of the CV unit, and the second stage recognizes the consonant category. At each stage, complementary evidence from support vector machines (SVMs) and hidden Markov models (HMMs) is combined to enhance recognition performance. The vowel onset point (VOP) serves as the anchor point for extracting features from the CV unit, so the VOP detection methods presented in the previous chapter are used in this work. Performance of the proposed method is demonstrated under coding and noisy conditions. Recognition studies are carried out using isolated CV units and CV units from Telugu broadcast news databases. Further, performance of the CV recognition system under background noise is improved by preprocessing that combines temporal and spectral processing.
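The two-stage decision with combined SVM and HMM evidence can be illustrated with a minimal sketch. The abstract does not specify the combination rule, so everything below is an assumption made for illustration: the softmax normalization, the weighted sum, the weight `w`, the choice to condition the consonant stage on the recognized vowel, and all function names and toy scores are hypothetical and are not the authors' implementation.

```python
from math import exp

def softmax(scores):
    """Map raw per-class scores to values that sum to 1 (assumed normalization)."""
    m = max(scores.values())
    exps = {k: exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def combine(svm_scores, hmm_scores, w=0.5):
    """Weighted sum of normalized evidence from the two classifiers.

    The weight w is a hypothetical tuning parameter, not a value from the chapter.
    """
    p_svm = softmax(svm_scores)
    p_hmm = softmax(hmm_scores)
    return {k: w * p_svm[k] + (1.0 - w) * p_hmm[k] for k in p_svm}

def recognize_cv(vowel_svm, vowel_hmm, cons_svm_by_vowel, cons_hmm_by_vowel, w=0.5):
    """Stage 1 picks the vowel; stage 2 picks the consonant.

    Conditioning stage 2 on the stage-1 vowel is an assumption of this sketch.
    """
    vowel_evidence = combine(vowel_svm, vowel_hmm, w)
    vowel = max(vowel_evidence, key=vowel_evidence.get)
    cons_evidence = combine(cons_svm_by_vowel[vowel], cons_hmm_by_vowel[vowel], w)
    consonant = max(cons_evidence, key=cons_evidence.get)
    return consonant, vowel

# Toy scores: SVM margins and HMM log-likelihoods (made-up numbers).
vowel_svm = {"a": 2.0, "i": 0.5, "u": 0.1}
vowel_hmm = {"a": -10.0, "i": -12.0, "u": -15.0}
cons_svm_by_vowel = {"a": {"k": 0.2, "t": 1.5}, "i": {"k": 1.0, "t": 0.3}}
cons_hmm_by_vowel = {"a": {"k": -20.0, "t": -18.0}, "i": {"k": -17.0, "t": -19.0}}

result = recognize_cv(vowel_svm, vowel_hmm, cons_svm_by_vowel, cons_hmm_by_vowel)
print(result)  # ('t', 'a')
```

In this sketch both classifiers are reduced to per-class score dictionaries, so any SVM or HMM toolkit that produces such scores could feed it. A real system would tune `w` on held-out data.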
Keywords
- Consonant Recognition
- Spectral Processing Methods
- Noisy Speech
- Final Weight Function
- Specific Speech Features
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Rao, K.S., Vuppala, A.K. (2014). Consonant–Vowel Recognition in the Presence of Coding and Background Noise. In: Speech Processing in Mobile Environments. SpringerBriefs in Electrical and Computer Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-03116-3_4
DOI: https://doi.org/10.1007/978-3-319-03116-3_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03115-6
Online ISBN: 978-3-319-03116-3
eBook Packages: Engineering, Engineering (R0)