Semi-supervised Learning Algorithm Based on Linear Lie Group for Imbalanced Multi-class Classification

Abstract

In practical applications, data are rarely balanced; skewed class distributions are the common case. This poses a severe challenge to classification algorithms. At present, imbalanced-data classification methods are designed mainly for binary problems, and it is difficult to extend them to multiple classes. In this study, we introduce Lie group machine learning and propose a semi-supervised learning algorithm based on the linear Lie group. First, the sample set is represented by a matrix, a linear Lie group GL(n) that is isomorphic (or homomorphic) to the corresponding learning system is found, and the labeled data are used to represent the object to be learned in terms of this linear Lie group. Then, the samples are labeled by the group method according to the algebraic structure of the linear Lie group. We performed experiments on 18 benchmark multi-class imbalanced datasets, comparing the proposed method with four state-of-the-art learning algorithms on three measures (mean accuracy, mean F-measure, and mean area under the curve). The experimental results demonstrate that the proposed method is effective and improves classification performance.
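
The group-theoretic setting named in the abstract can be made concrete with a small sketch. GL(n) is the group of invertible n×n real matrices, so membership reduces to a nonzero determinant, and the product of two members is again a member. The Python code below illustrates only these definitional properties. It is not the authors' algorithm, and the matrices A, B, S and the helper names matmul, determinant, and in_gl_n are hypothetical, chosen for illustration.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def determinant(m: Matrix) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy, leave the input untouched
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # no usable pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def in_gl_n(m: Matrix, tol: float = 1e-9) -> bool:
    """A square matrix belongs to GL(n) iff its determinant is nonzero."""
    is_square = all(len(row) == len(m) for row in m)
    return is_square and abs(determinant(m)) > tol


# Hypothetical 2x2 matrices (illustrative values, not data from the paper).
A = [[2.0, 1.0], [1.0, 1.0]]  # invertible
B = [[1.0, 3.0], [0.0, 2.0]]  # invertible
S = [[1.0, 2.0], [2.0, 4.0]]  # rows are proportional, so singular

AB = matmul(A, B)  # closure: the product of two GL(2) elements stays in GL(2)
```

Under these assumptions, in_gl_n(A), in_gl_n(B), and in_gl_n(AB) all hold, while S is rejected because its rows are linearly dependent. A sample matrix of this singular kind could not be treated as a GL(n) element without further preprocessing.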

Acknowledgements

The authors thank the two anonymous reviewers for carefully reviewing this paper and for their valuable comments, which helped to improve it.

Author information

Corresponding author

Correspondence to Chengjun Xu.

About this article

Cite this article

Xu, C., Zhu, G. Semi-supervised Learning Algorithm Based on Linear Lie Group for Imbalanced Multi-class Classification. Neural Process Lett (2020). https://doi.org/10.1007/s11063-020-10287-8

Keywords

  • Lie group
  • Lie group machine learning
  • Semi-supervised learning
  • Imbalanced data
  • Multi-class classification