Applied Intelligence, Volume 49, Issue 7, pp 2780–2792

Hierarchical feature extraction based on discriminant analysis

  • Xinxin Liu
  • Hong Zhao


Feature extraction is a crucial data preprocessing technique for classification tasks such as protein classification and image classification. Datasets with tree class hierarchies have become extremely common in practical classification tasks. However, existing flat feature extraction algorithms tend to assume that classes are independent and ignore the hierarchical information in the class structure of a dataset. In this paper, we propose a hierarchical feature extraction algorithm based on discriminant analysis (HFEDA). First, HFEDA decomposes the highly complex feature extraction problem into smaller problems by creating a sub-dataset for each non-leaf node according to the tree class hierarchy of the dataset. Second, unlike flat algorithms, HFEDA takes the hierarchical class structure into account during dimensionality reduction and calculates a projection matrix for each non-leaf node in the tree class hierarchy. In this way, HFEDA only needs to discriminate among the categories that share the same parent node. Finally, HFEDA does not need to determine the optimal feature subset size, which is challenging for most feature selection algorithms. Extensive experiments on datasets of different types with typical classifiers demonstrate the effectiveness and efficiency of the proposed algorithm.
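The per-node decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy, the function names, the regularization term, and the use of plain Fisher LDA as the discriminant step are all assumptions made for illustration. For each non-leaf node, the sketch builds a sub-dataset from the samples under that node, relabels them by the child they fall under, and fits one projection matrix.

```python
import numpy as np

# Hypothetical toy hierarchy (parent -> children); leaves are the class labels.
HIERARCHY = {
    "root": ["animal", "vehicle"],
    "animal": ["cat", "dog"],
    "vehicle": ["car", "bus"],
}


def leaves_under(node, hierarchy):
    """Return all leaf classes in the subtree rooted at `node`."""
    if node not in hierarchy:
        return [node]
    return [leaf for child in hierarchy[node]
            for leaf in leaves_under(child, hierarchy)]


def lda_projection(X, y, reg=1e-6):
    """Plain Fisher LDA (an assumed stand-in for the paper's discriminant step).

    Returns a (n_features, n_classes - 1) projection matrix.
    """
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Small ridge term keeps Sw invertible (an assumption of this sketch).
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(d), Sb))
    order = np.argsort(-evals.real)[: len(classes) - 1]
    return evecs[:, order].real


def fit_node_projections(X, y, hierarchy):
    """Fit one projection matrix per non-leaf node of the class hierarchy."""
    projections = {}
    for node, children in hierarchy.items():
        # Map each leaf under this node to the child subtree it belongs to.
        child_of = {leaf: child for child in children
                    for leaf in leaves_under(child, hierarchy)}
        mask = np.array([label in child_of for label in y])
        y_node = np.array([child_of[label] for label in y[mask]])
        projections[node] = lda_projection(X[mask], y_node)
    return projections


# Synthetic data: four leaf classes, 20 samples each, 5 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=i, size=(20, 5)) for i in range(4)])
y = np.repeat(["cat", "dog", "car", "bus"], 20)
projections = fit_node_projections(X, y, HIERARCHY)
```

In this sketch the output dimension at each node is fixed at the number of children minus one, the usual upper bound for LDA, so no subset size has to be tuned. Whether HFEDA uses exactly this rule is not stated in the abstract.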


Class hierarchy, Discriminant analysis, Feature extraction, Hierarchical classification



This work was supported by the National Natural Science Foundation of China under Grant No. 61703196 and the Natural Science Foundation of Fujian Province under Grant No. 2018J01549.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Fujian Key Laboratory of Granular Computing and Application, Minnan Normal University, Zhangzhou, China
  2. School of Computer Science, Minnan Normal University, Zhangzhou, China
