Intelligent Classification Systems

  • Andrey V. Savchenko
Part of the SpringerBriefs in Optimization book series (BRIEFSOPTI)


The design of intelligent classification systems is a very broad research topic that covers a large number of individual tasks, e.g., data preprocessing, feature extraction, segmentation, and classifier learning. One of the most challenging problems is the recognition of audiovisual data, such as speech signals and complex images. A brief review of known classification methods is given, and these methods are systematized according to the number of available reference instances and the number of classes in the database. The comparison of classifiers is studied as a multi-criteria optimization problem: we take into account not only the classification accuracy but also the runtime complexity of the algorithm. Moreover, the need to explore the system's behavior in the presence of artificially generated noise is highlighted.
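One common way to make a two-criteria comparison of classifiers concrete is to keep only the Pareto-optimal candidates, i.e., those that no other candidate beats on both error rate and runtime at once. The sketch below is an illustration under that assumption, not necessarily the procedure used in the chapter; the names `Candidate`, `dominates`, and `pareto_front` are hypothetical, and the numbers are placeholders invented for the example rather than measured results.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    error_rate: float  # fraction of misclassified test instances (lower is better)
    runtime_ms: float  # average time to classify one instance (lower is better)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both criteria and strictly better on one."""
    no_worse = a.error_rate <= b.error_rate and a.runtime_ms <= b.runtime_ms
    strictly_better = a.error_rate < b.error_rate or a.runtime_ms < b.runtime_ms
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Placeholder values, invented for illustration only (not experimental data).
candidates = [
    Candidate("A", 0.10, 5.0),
    Candidate("B", 0.08, 20.0),
    Candidate("C", 0.12, 6.0),   # worse than A on both criteria
    Candidate("D", 0.08, 25.0),  # same error as B but slower
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Candidates left on the front represent genuine trade-offs between accuracy and speed; choosing among them requires an additional preference, such as a runtime budget.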


Support Vector Machine; Speech Signal; Gaussian Mixture Model; Near Neighbor; Automatic Speech Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© The Author(s) 2016

Authors and Affiliations

  • Andrey V. Savchenko (1)
  1. Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
