Support vector machine ensembles for discriminant analysis for ranking principal components

Abstract

The problem of ranking linear subspaces in principal component analysis (PCA) for multi-class classification tasks has been addressed by building support vector machine (SVM) ensembles with the AdaBoost.M2 technique. This methodology, named multi-class discriminant principal components analysis (Multi-Class.M2 DPCA), is motivated by the fact that the first PCA components do not necessarily represent important discriminant directions for separating sample groups. The Multi-Class.M2 DPCA proposal raises fundamental issues related to the weakening methodology, parametrization, the strategy for the SVM bias, and classification versus reconstruction performance. In addition, comparisons between Multi-Class.M2 DPCA and feature weighting techniques are lacking in the literature. Motivated by these facts, this paper first presents a unified formulation to generate weakened SVM approaches and to derive different strategies from the literature. These strategies are analyzed within the Multi-Class.M2 DPCA methodology, together with its parametrization, to identify the best configuration for ranking PCA features in face image analysis. Moreover, this work proposes variants that improve that Multi-Class.M2 DPCA configuration using strategies that incorporate the SVM bias and sensitivity analysis results. The resulting Multi-Class.M2 DPCA setups are applied in computational experiments on both classification and reconstruction problems. The results show that Multi-Class.M2 DPCA achieves higher recognition rates using fewer PCA features, as well as robust reconstruction and interpretation of the data.
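To illustrate the general idea behind discriminant ranking of PCA components (this is a hypothetical sketch, not the authors' exact Multi-Class.M2 DPCA algorithm), the snippet below trains AdaBoost-reweighted linear classifiers on a synthetic two-class problem in a PCA-like feature space and ranks the axes by the accumulated absolute hyperplane weights. The weak learner is a crude hinge-loss subgradient approximation of a linear SVM; all names, parameters, and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data already projected onto 5 "PCA" axes.
# Only axes 3 and 4 carry class structure in this toy example.
n = 200
X = rng.normal(size=(n, 5))
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
X[:, 3] += 2.0 * y   # strong discriminant direction
X[:, 4] += 1.0 * y   # weaker discriminant direction

def weak_svm(X, y, d, lam=0.01, epochs=20, lr=0.1):
    """Rough linear SVM: weighted hinge-loss full-batch subgradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = (margins < 1.0).astype(float)        # points inside the margin
        w -= lr * (lam * w - (d * y * active) @ X)
        b -= lr * (-np.sum(d * y * active))
    return w, b

# AdaBoost-style sample reweighting; accumulate |w_j| per component.
T = 10
d = np.full(n, 1.0 / n)
score = np.zeros(X.shape[1])
for _ in range(T):
    w, b = weak_svm(X, y, d)
    pred = np.sign(X @ w + b)
    err = np.clip(np.sum(d[pred != y]), 1e-10, 0.5 - 1e-10)
    alpha = 0.5 * np.log((1.0 - err) / err)
    score += alpha * np.abs(w)                        # hyperplane weight magnitudes
    d *= np.exp(-alpha * y * pred)                    # upweight misclassified samples
    d /= d.sum()

ranking = np.argsort(score)[::-1]
print(ranking.tolist())  # the discriminant axes (3 and 4 here) should lead the ranking
```

The key point, matching the motivation stated in the abstract, is that the ranking is driven by discriminant power (the ensemble-weighted hyperplane coefficients) rather than by explained variance, so an axis that PCA would order last can still rank first.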


Author information

Corresponding author

Correspondence to Tiene A. Filisbino.

About this article

Cite this article

Filisbino, T.A., Giraldi, G.A. & Thomaz, C.E. Support vector machine ensembles for discriminant analysis for ranking principal components. Multimed Tools Appl (2020). https://doi.org/10.1007/s11042-020-09187-9

Keywords

  • PCA
  • Ranking PCA components
  • Separating hyperplanes
  • Ensemble methods
  • AdaBoost
  • Face image analysis