Abstract
The chapter critically reviews several applications of fuzzy logic and fuzzy systems in speech technology, along the main directions of the field: speech synthesis, speech recognition, and speech analysis. A brief incursion into the use of mixed techniques, combining fuzzy logic, fuzzy classifiers, and nonlinear dynamics, is included. A rich list of references complements the chapter.
Notes
1. In fact, only frequency is required in this case, but we pursue the two-parameter feature vector for the sake of a more complete example.
2. A vowel-like sound is detected when there is a valid pitch, that is, when the vocal folds vibrate. Pure consonants are produced without a pitch.
3. Including [m], [n], [l], [r], etc.
4. This requires a procedure for determining the boundaries of the sound.
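Note 2 states that a vowel-like sound is detected when there is a valid pitch. In a fuzzy treatment, "valid pitch" can be graded rather than a yes/no decision. The following is a minimal Python sketch of that idea, not the chapter's own detector. It assumes normalized autocorrelation as the periodicity measure, samples scaled to [-1, 1], a 60 to 400 Hz pitch search range, and hand-picked ramp thresholds. The function names `ramp`, `periodicity`, and `voicing_degree` are hypothetical.

```python
import math
import random


def ramp(x, lo, hi):
    """Shoulder-shaped membership: 0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def periodicity(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Peak normalized autocorrelation over the pitch-lag range, plus the matching F0 in Hz."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:                      # silent frame: no pitch at all
        return 0.0, 0.0
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_r, best_lag = 0.0, 0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_r, best_lag = r, lag
    f0 = sample_rate / best_lag if best_lag else 0.0
    return best_r, f0


def voicing_degree(frame, sample_rate):
    """Degree (0..1) to which a frame is vowel-like: periodic AND loud enough."""
    r, f0 = periodicity(frame, sample_rate)
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    mu_periodic = ramp(r, 0.3, 0.6)        # hypothetical thresholds on periodicity
    mu_loud = ramp(rms, 0.01, 0.05)        # hypothetical thresholds; assumes samples in [-1, 1]
    return min(mu_periodic, mu_loud), f0   # min() is the standard (Zadeh) fuzzy AND


if __name__ == "__main__":
    fs = 8000  # sampling rate in Hz
    # 50 ms synthetic periodic frame: a 200 Hz sinusoid standing in for a voiced sound
    tone = [0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
    rng = random.Random(0)
    noise = [rng.uniform(-0.5, 0.5) for _ in range(400)]  # loud but aperiodic frame
    print(voicing_degree(tone, fs))   # (1.0, 200.0)
    print(voicing_degree(noise, fs))  # low degree: no valid pitch
```

Taking the minimum of the two memberships means that a loud but aperiodic frame (noise, or a consonant produced without a pitch) and a periodic but near-silent frame both receive a low degree. In practice the thresholds would be tuned on labeled speech.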
References
Amano, A., Aritsuka, T., Hataoka, N., Ichikawa, A.: On the use of neural networks and fuzzy logic in speech recognition. In: IJCNN, International Joint Conference on Neural Networks, 1989, pp. 301–305, vol. 1, 18–22 June 1989 Washington, DC, USA. doi:10.1109/IJCNN.1989.118595
Amano, A., Ichikawa, A., Hataoka, N. (inventors): US 5,040,215 (date of patent Aug. 13, 1991), Speech recognition apparatus using neural network and fuzzy logic, applied by Hitachi, Ltd
Avci, E., Akpolat, Z.H.: Speech recognition using a wavelet packet adaptive network based fuzzy inference system. Expert Syst. Appl. 31, 495–503 (2006)
Benesty, J., Sondhi, M.M., Huang, Y. (eds.): Springer Handbook of Speech Processing, 1176 pp. Springer Science & Business Media, Berlin (2008)
Beritelli, F., Casale, S., Cavallaro, A.: A robust voice activity detector for wireless communications using soft computing. IEEE J. Sel. Areas Commun. 16(9), 1818–1829 (1998)
Beritelli, F., Casale, S., Cavallaro, A.: A multi-channel speech/silence detector based on time delay estimation and fuzzy classification. In: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, 15–19 March 1999, vol. 1, pp. 93–96 Phoenix, AZ. doi:10.1109/ICASSP.1999.758070
Beritelli, F., Casale, S., Ruggeri, G., Serrano, S.: Performance evaluation and comparison of G.729/AMR/fuzzy voice activity detectors. IEEE Signal Process. Lett. 9(3), 85–88 (2002)
Brito, J.A.: A fuzzy-genetic approach for the computational modeling of speech articulatory processes. Saber (Venezuela) 21(3), 269–276 (2009)
Brito, J., Rodriguez, W.: Multipopulation genetic learning of midsagittal articulatory models for speech synthesis. GrC 2006, pp. 166–169
Burileanu, C., Popescu, V., Buzo, A., Petrea, C.S., Ghelmez-Haneş, D.: Spontaneous speech recognition for Romanian in spoken dialogue systems. Proc. Romanian Acad. Ser. A 11(1), 83–91 (2010)
Cheng, R.G., Chang, C.J.: Design of a fuzzy traffic controller for ATM networks. IEEE/ACM Trans. Netw. 4(3), 460–469 (1996)
Cavallaro, A., Beritelli, F., Casale, S.: A fuzzy logic-based speech detection algorithm for communications in noisy environments. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, vol. 1, pp. 565–568, 12–15 May 1998, Seattle, WA. doi:10.1109/ICASSP.1998.674493
Chatterjee, A., Pulasinghe, K., Watanabe, K., Izumi, K.: A particle-swarm-optimized fuzzy-neural network for voice-controlled robot systems. IEEE Trans. Ind. Electron. 52(6), 1478–1489 (2005)
Cheok, A.D., Chevalier, S., Kaynak, M., Sengupta, K.: Use of a novel generalized fuzzy hidden Markov model for speech recognition. In: The 10th IEEE International Conference on Fuzzy Systems, 2001, vol. 3, pp. 1207–1210, 02–05 Dec 2001 Melbourne, Victoria. doi:10.1109/FUZZ.2001.1008874
Chibelushi, C.C., Mason, J.S., Deravi, F.: Integration of acoustic and visual speech for speaker recognition. In: EUROSPEECH ’93, Third European Conference on Speech Communication and Technology, Berlin, Germany, 22–25 Sept 1993, pp. 157–160 (1993). http://www.isca-speech.org/archive/eurospeech_1993/e93_0157.html
Ciota, Z.: Improvement of speech processing using fuzzy logic approach. In: Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 2001, vol. 2, pp. 727–731, 25–28 July 2001, Vancouver, BC
Coorman, G., et al.: US Patent no. 7219060, Speech synthesis using concatenation of speech waveforms. 15 May 2007
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction. IEEE Signal Process. Mag. 18(1), 32–80 (2001)
Cucu, H., Buzo, A., Besacier, L., Burileanu, C.: SMT-based ASR domain adaptation methods for under-resourced languages: application to Romanian. Speech Commun. 56, 195–212 (2014)
Dutoit, T.: An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht (1997)
de Gelder, B., Vroomen, J.: The perception of emotions by ear and by eye. Cogn. Emot. 14(3), 289–311 (2000)
De Mori, R.: Use of fuzzy algorithms for phonetic and phonemic labeling of continuous speech. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-2(2), 136–148 (1980)
Erickson, D.: Expressive speech: Production, perception and application to speech synthesis. Tutorial. Acoust. Sci. Technol. 26(4), 317–325 (2005)
Fant, G., Liljencrants, J., Lin, Q.: A four parameter model of glottal flow (Research Report STL-QPSR 4, KTH), pp. 1–13. Stockholm, Royal Institute of Technology (1985)
Fant, G., Lin, Q.: Frequency domain interpretation and derivation of glottal flow parameters (Research Report STL-QPSR 2-3, KTH), pp. 1–21. Stockholm, Royal Institute of Technology (1988)
Feraru, S.M., Teodorescu, H.N., Zbancioc, M.D.: SRoL—Web-based resources for languages and language technology e-learning. Int. J. Comput. Commun. Control 5(3), 301–313 (2010)
Gierut, J.A., Pisoni, D.B.: Speech perception, Chapter 9 in Handbook of Speech-Language Pathology and Audiology, pp. 253–276. Decker Inc., New York (1988)
Gharavian, D., Sheikhan, M., Nazerieh, A., Garoucy, S.: Speech emotion recognition using FCBF feature selection method and GA-optimized fuzzy ARTMAP neural network. Neural Comput. Appl. 21(8), 2115–2126 (2012)
Grigoras, F., Teodorescu, H.N., Jain, L.C., Apopei, V.: Fuzzy and knowledge-based control for speech synthesis. In: ECC’99 CD-ROM Proceedings. Karlsruhe, Germany, VDI/VDE Gesellschaft (1999)
Grigoras, F., Apopei, V., Jitca, D., Teodorescu, H.N.: Conclusions from a research on soft-computing rule-based speech synthesis for Romanian language. In: ECIT2000 CD-ROM Proceedings. Iasi, Romania, Coda Press (2000)
Grimm, M., Kroschel, K., Narayanan, S.: Support vector regression for automatic recognition of spontaneous emotions in speech. In: ICASSP 2007, IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, vol. 4, 15–20 April 2007, pp. IV-1085–1088 ISSN 1520-6149, Honolulu, HI. doi:10.1109/ICASSP.2007.367262
Grimm, M., Kroschel, K., Mower, E., Narayanan, S.: Primitives-based evaluation and estimation of emotions in speech. Speech Commun. 49(10–11), 787–800 (2007). doi:10.1016/j.specom.2007.01.010
Halavati, R., Shouraki, S.B., Zadeh, S.H.: Recognition of human speech phonemes using a novel fuzzy approach. Appl. Soft Comput. 7(3), 828–839 (2007)
Ioannou, S.V., et al.: Emotion recognition through facial expression analysis based on a neurofuzzy network. Neural Netw. 18(4), 423–435 (2005)
Iles, J., Ing-Simmons, N.: Rsynth 2.0, Text-to-Speech software. ftp://svr-ftp.eng.cam.ac.uk comp.speech/sources (1994)
ITU-T G.729.1, G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729. Telecommunication Standardization Sector of ITU, Series G: Transmission Systems and Media, Digital Systems and Networks (05/2006)
Jitca, D., Teodorescu, H.N., Apopei, V., Grigoras, F.: Improved speech synthesis using fuzzy methods. Int. J. Speech Technol. 5(3), 227–235 (2002) (Kluwer Academic Publishers)
Juang, C.-F., Cheng, C.-N., Chen, T.M.: Speech detection in noisy environments by wavelet energy-based recurrent neural fuzzy network. Expert Syst. Appl. 36(1), 321–332 (2009)
Kandel, A., Teodorescu, H.-N., Arotaritei, D.: Analytic fuzzy RBF neural network. In: 1998 Conference of the North American Fuzzy Information Processing Society-NAFIPS, pp. 281–285, IEEE
Kasabov, N., Iliev, G.: A methodology and a system for adaptive recognition in a noisy environment based on adaptive noise cancellation and evolving fuzzy neural networks. Preliminary Patent, University of Otago, 21 Dec 1999, New Zealand
Kasabov, N., Iliev, G.: Hybrid system for robust recognition of noisy speech based on evolving fuzzy neural networks and adaptive filtering. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks, 2000 (IJCNN 2000), vol. 5, pp. 91–96, 24–27 July 2000, Como, Italy. doi:10.1109/IJCNN.2000.861440
Klatt, D.: Software for a cascade/parallel formant synthesizer. J. Acoust. Soc. Am. 67, 971–995 (1980)
Klatt, D.: Review of text-to-speech conversion for English. J. Acoust. Soc. Am. 82, 737–793 (1987)
Köhler, B.-U., Hennig, C., Orglmeister, R.: QRS detection using zero crossing counts. Prog. Biomed. Res. 8(3), 138–145 (2003)
Koo, J.M., Un, C.K.: Fuzzy smoothing of HMM parameters in speech recognition. Electron. Lett. 26(11), 743–744 (1990). doi:10.1049/el:19900485
Kosanovic, B.R., Chaparro, L.F., Sclabassi, R.J.: Signal analysis in fuzzy information space. Fuzzy Sets Syst. 77, 49–62 (1996)
Kosanovic, B.R., Chaparro, L.F., Sclabassi, R.J.: Modeling of quasi-stationary signals using temporal fuzzy sets and time-frequency distributions. In: Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, Philadelphia, PA, pp. 425–428, IEEE Press, New York (1994)
Kosanovic, B.R., Chaparro, L.F., Sclabassi, R.J.: Hidden process modeling. In: Proceedings ICASSP-95, vol. 5, pp. 2935–2938, Detroit, MI, 8–12 May 1995, IEEE
Kosanovic, B.R., Chaparro, L.F., Sun, M., Sclabassi, R.J.: Physical system modeling using temporal fuzzy sets. In: Proceedings of the International Joint Conference of NAFIPS/IFIS/NASA ’94, San Antonio, TX, pp. 429–433. IEEE Press, New York (1994)
Lee, C.M., Narayanan, S.S.: Toward detecting emotions in spoken dialogs. IEEE Trans. Speech Audio Process. 13(2), 293–303 (2005)
Lee, C.M., Narayanan, S.: Emotion recognition using a data-driven fuzzy inference system. In: EUROSPEECH 2003 (INTERSPEECH 2003), 8th European Conference on Speech Communication and Technology, pp. 157–160, Geneva, Switzerland, 1–4 Sept 2003
Lin, C.-T., Wu, R.-C., Chang, J.-Y., Liang, S.-F.: A novel prosodic-information synthesizer based on recurrent fuzzy neural network for the Chinese TTS system. IEEE Trans. Syst. Man Cybern. Part B Cybern. 34(1), 309–324 (2004)
Massaro, D.W., Cohen, M.M.: Fuzzy logical model of bimodal emotion perception: Comment on “The perception of emotions by ear and by eye” by de Gelder and Vroomen. Cogn. Emot. 14(3), 313–320 (2000)
Melin, P., Urias, J., Solano, D., Soto, M., Lopez, M., Castillo, O.: Voice recognition with neural networks, type-2 fuzzy logic and genetic algorithms. Eng. Lett. 13(2), EL_13_2_9 (Advance online publication: 4 Aug 2006)
Melin, P., Castillo, O.: Voice recognition with neural networks, fuzzy logic and genetic algorithms. In: Hybrid Intelligent Systems for Pattern Recognition Using Soft Computing. Studies in Fuzziness and Soft Computing, vol. 172, pp. 223–240, 26 Feb 2005
Mills, P., Bowles, J.: Fuzzy logic enhanced symmetric dynamic programming for speech recognition. In: Proc. Fifth IEEE International Conference on Fuzzy Systems, 1996, vol. 3, pp. 2013–2019, 8–11 Sept 1996, New Orleans, LA. doi:10.1109/FUZZY.1996.552747
Ndousse, T.D.: Fuzzy neural control of voice cells in ATM networks. IEEE J. Sel. Areas Commun. 12(9), 1488–1494 (1994)
Naphade, M., Smith, J.R., Tesic, J., Chang, S.-F., Hsu, W., Kennedy, L., Hauptmann, A., Curtis, J.: Large-scale concept ontology for multimedia. IEEE MultiMedia 13(3), 86–91 (2006)
Ndousse, T.D.: Fuzzy expert systems in ATM networks, pp. 229–284. In: Jain, L.C., Martin, N.M. (eds.) Fusion of Neural Networks, Fuzzy Systems and Genetic Algorithms: Industrial Applications. CRC Press, Boca Raton, USA (1998)
Oden, G.C., Massaro, D.W.: Integration of featural information in speech perception. Psychol. Rev. 85(3), 172–191 (1978)
Pal, S.K., Majumder, D.D.: Fuzzy sets and decisionmaking approaches in vowel and speaker recognition. IEEE Trans. Syst. Man Cybern. 7, 625–629 (1977)
Pal, S.K., Mitra, S.: Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural Netw. 3(5), 683–697 (1992)
Petrantonakis, P.C., Hadjileontiadis, L.J.: Emotion recognition from EEG using higher order crossings. IEEE Trans. Inf. Technol. Biomed. 14(2), 186–197 (2010)
Ramirez, J., Segura, J.C., Benitez, C., de la Torre, A., Rubio, A.: Efficient voice activity detection algorithms using long-term speech information. Speech Commun. 42, 271–287 (2004)
Raptis, S., Carayannis, G.: Fuzzy logic for rule-based formant speech synthesis. In: EUROSPEECH ’97 Proceedings, Rhodes, Greece, vol. 3, pp. 1599–1602 (1997)
Rodriguez, W., Teodorescu, H.N., Grigoras, F., Kandel, A., Bunke, H.: A fuzzy information space approach to speech signal non-linear analysis. Int. J. Intell. Syst. 15(4), 343–363 (2000)
Rodriguez, W., Kandel, A., Bunke, H.: 3D-curve similarity using fuzzy string matching. In: Proceedings of the Sixth IEEE Int. Conference on Fuzzy Systems, 1997, vol. 1, pp. 79–82, 1–5 July 1997
Rodriguez, W., Last, M., Kandel, A., Bunke, H.: 3-Dimensional curve similarity using string matching. Robot. Auton. Syst. 49(3–4), 165–172 (2004)
Scherer, S., Kane, J., Gobl, C., Schwenker, F.: Investigating fuzzy-input fuzzy-output support vector machines for robust voice quality classification. Comput. Speech Lang. 27(1), 263–287 (2013) (Special issue on Paralinguistics in Naturalistic Speech and Language)
Shikano, K., Nakamura, S., Abe, M.: Speaker adaptation and voice conversion by codebook mapping, In: IEEE International Symposium on Circuits and Systems, 1991, vol. 1, pp. 594–597, 11–14 June 1991
Stylios, C.D., Georgopoulos, V.C., Malandraki, G.A., Chouliara, S.: Fuzzy cognitive map architectures for medical decision support systems. Appl. Soft Comput. 8, 1243–1251 (2008)
Su, M.-C., Hsieh, C.-T., Chin, C.C.: A neuro-fuzzy approach to speech recognition without time alignment. Fuzzy Sets Syst. 98(1), 33–41 (1998)
Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J.: Detecting a targeted voice style in an audiobook using voice quality features. ICASSP 2012, 4593–4596 (2012)
Tanaka, K., Masanobu, A.: United States Patent 6,081,781, Method and apparatus for speech synthesis and program recorded medium. 27 June 2000
Temko, A., Macho, D., Nadeu, C.: Fuzzy integral based information fusion for classification of highly confusable non-speech sounds. Pattern Recognit. 41, 1814–1823 (2008)
Teodorescu, H.N., Chelaru, M., Sofron, E., Adascalitei, A.: Adaptive speech synthesis. ITG-Fachbericht 105, Digitale Sprachverarbeitung: Prinzipien und Anwendungen. VDE-Verlag GmbH, Berlin, pp. 183–188 (1988)
Teodorescu, H.N., Yamakawa, T.: Applications of chaotic systems: an emerging field. Int. J. Intell. Syst. 12(4), 251–253 (1997)
Teodorescu, H.N., Kandel, A., Schneider, M.: Fuzzy modeling and dynamics. Fuzzy Sets Syst. 106(1), 1–2 (1999)
Teodorescu, H.N.L., Kandel, A., Hall, L.O.: Report of research activities in fuzzy AI and medicine at USFCSE. Artif. Intell. Med. 21(1–3), 177–183 (2001)
Teodorescu, H.-N.L.: Interrelationships, communication, semiotics, and artificial consciousness. In: Kitamura, T. (ed.) What Should Be Computed to Understand and Model Brain Function? FLSI Book Series, vol. 3, pp. 115–147. World Scientific, Singapore. ISBN 981-02-4518-1 (2001)
Teodorescu, H.N., Stoica, A., Mlynek, D., et al.: Nonlinear dynamics sensitivity analysis in networks and applications to sensing. In: Filip, F.G., Dumitrache, I., Iliescu, S. (eds.) Large Scale Systems: Theory and Applications 2001 (LSS’01), IFAC Symposia Series, pp. 333–338, 2002. Proceedings of the 9th IFAC Symposium on Large Scale Systems, Bucharest, Romania, 18–20 July 2001
Teodorescu, H.N.: A proposed theory in prosody generation and perception: the multi-dimensional contextual integration principle of prosody. In: Burileanu, C. (ed.) Proc. SpeD 2005, pp. 109–118. Romanian Academy Publ., Bucharest (2005)
Teodorescu, H.-N., Feraru, S.M.: A study on speech with manifest emotions. In: Matousek, V., Mautner, P. (eds.) Text, Speech and Dialogue, Proceedings. Lecture Notes in Artificial Intelligence, vol. 4629, pp. 254–261 (2007)
Teodorescu, H.-N., Feraru, M.: Analyzing emotions in spoken Romanian. Proc. Romanian Acad. Ser. A-Math. Phys. Tech. Sci. Inf. Sci. 8(2), 161–168 (2007)
Teodorescu, H.N., Feraru, M.: Classification in Gnathophonics—Preliminary Results. In: Proceedings of the Second International Symposium on Electrical and Electronics Engineering—ISEEE-2008, Galati, Romania, 6 pp (2008)
Teodorescu, H.-N., Feraru, M., Zbancioc, M.: Assessing the quality of voice synthesizers. In: Proceedings of the 5th Conference on Speech Technology and Human-Computer Dialogue, 2009 SpeD’09, pp. 1–10 (2009)
Teodorescu, H.-N.: Characterization of nonlinear dynamic systems for engineering purposes—a partial review. Int. J. Gen. Syst. 41(8), 805–825 (2012)
Tian, Y., Wu, J., Wang, Z., Lu, D.: Fuzzy clustering and Bayesian information criterion based threshold estimation for robust voice activity detection. In: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), vol. 1, pp. I-444–447, 6–10 April 2003
Toledano, D.T., Rodríguez Crespo, M.A., Escalada Sardina, J.G.: Trying to mimic human segmentation of speech using HMM and fuzzy logic post-correction rules. In: The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis. Blue Mountains, NSW Australia, 26–29 Nov 1998 SSW3-1998, pp. 207–212. http://www.isca-speech.org/archive_open/archive_papers/ssw3/ssw3_207.pdf
Van Segbroeck, M., Van Hamme, H.: Robust speech recognition using missing data techniques in the prospect domain and fuzzy masks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, 4 April 2008, pp. 4393–4396, Las Vegas, NV. doi:10.1109/ICASSP.2008.4518629
Wahyudi, W.A., Syazilawati, M.: Intelligent voice-based door access control system using adaptive-network-based fuzzy inference systems (ANFIS) for building security. J. Comput. Sci. 3(5), 274–280 (2007)
Wang, X., Lou, X., Li, J., Li, J. (Inventors): Method, apparatus for synthesizing speech and acoustic model training method for speech synthesis. Patent application number 20120221339, 2012-08-30
Zbancioc, M., Feraru, M.: The analysis of the FCM and WKNN algorithms performance for the emotional Corpus SROL, Adv. Electr. Comput. Eng. 12(3), 33–38 (2012). doi:10.4316/AECE.2012.03005, http://www.aece.ro/abstractplus.php?year=2012&number=3&article=5
Zbancioc, M., Feraru, M.: Emotion recognition of the SROL Romanian database using fuzzy KNN algorithm. In: International Symposium on Electronics and Telecommunications IEEE—ISETC 2012 Tenth Edition, Timisoara, pp. 347–350, 15–16 Nov 2012. ISBN: 978-1-4673-1177-9, http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6408133 (2012)
Zhao, H., Wang, G., Xu, C., Yu, F.: Voice activity detection method based on multivalued coarse-graining Lempel-Ziv complexity. Comput. Sci. Inf. Syst. 8(3), 869–888 (2011)
Bonn, F.E., Adair, R.D., Peterson, R.N., Adair, D.D.: US Patent 7,999,857, Voice, Lip-Reading, Face and Emotion Stress Analysis, Fuzzy Logic Intelligent Camera System. 16 Aug 2011
Acknowledgments
The first author acknowledges the help of Prof. Corneliu Burileanu, Dr. Marius Zbancioc and Dr. Monica Feraru in reviewing a preliminary version of the chapter.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Teodorescu, HN. (2015). Fuzzy Logic in Speech Technology - Introductory and Overviewing Glimpses. In: Tamir, D., Rishe, N., Kandel, A. (eds) Fifty Years of Fuzzy Logic and its Applications. Studies in Fuzziness and Soft Computing, vol 326. Springer, Cham. https://doi.org/10.1007/978-3-319-19683-1_28
DOI: https://doi.org/10.1007/978-3-319-19683-1_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19682-4
Online ISBN: 978-3-319-19683-1
eBook Packages: Engineering, Engineering (R0)