Sparse Feature Learning Using Ensemble Model for Highly-Correlated High-Dimensional Data

Braytee, Ali; Anaissi, Ali; Kennedy, Paul J.

doi:10.1007/978-3-030-04182-3_37


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)

Included in the following conference series:

International Conference on Neural Information Processing


Abstract

High-dimensional, highly correlated data exist in several domains, such as genomics. Many feature selection techniques treat correlated features as redundant and therefore remove them. Several studies have investigated how to interpret correlated features in domains such as genomics; however, the classification capability of groups of correlated features is also of interest in many domains. In this paper, a novel method is proposed that integrates ensemble feature ranking and co-expression networks to identify the optimal features for classification. The main advantage of the proposed method is that it does not treat correlated features as redundant; instead, it shows that the selected correlated features are important for improving classification performance. A series of experiments on five high-dimensional, highly correlated datasets with different class-imbalance ratios shows that the proposed method outperforms state-of-the-art methods.
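The chapter's full procedure is not reproduced on this page, so the following is only a minimal, hypothetical Python/NumPy sketch of the two ideas the abstract names: grouping correlated features into modules of a correlation network, and ranking features with an ensemble. Assumptions (none taken from the paper): modules are the connected components of the graph with edges where |corr| >= tau, a simplification of co-expression network analysis; the base learner is a ridge-regularized linear classifier; the ensemble averages feature ranks over stratified bootstrap samples; and the names `correlation_modules`, `ridge_scores`, `ensemble_rank`, `select_features`, `tau` and `k` are illustrative. It is not the authors' method.

```python
import numpy as np


def correlation_modules(X, tau=0.8):
    """Label features by connected components of the graph |corr| >= tau.

    Hypothetical, simplified stand-in for co-expression network modules.
    """
    n_features = X.shape[1]
    adj = np.abs(np.corrcoef(X, rowvar=False)) >= tau
    labels = np.full(n_features, -1)
    current = 0
    for start in range(n_features):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] < 0:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels


def ridge_scores(X, y, alpha=1.0):
    """Absolute weights of a ridge linear classifier on standardised features."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / (Xc.std(axis=0) + 1e-12)
    yc = y - y.mean()
    n_samples = X.shape[0]
    # Dual form of ridge regression: cheaper when features outnumber samples
    w = Xc.T @ np.linalg.solve(Xc @ Xc.T + alpha * np.eye(n_samples), yc)
    return np.abs(w)


def ensemble_rank(X, y, n_boot=50, alpha=1.0, seed=0):
    """Mean rank of each feature over stratified bootstrap replicates (0 = best)."""
    rng = np.random.default_rng(seed)
    rank_sum = np.zeros(X.shape[1])
    for _ in range(n_boot):
        # Resample within each class so every replicate contains both classes
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=np.sum(y == c), replace=True)
            for c in np.unique(y)
        ])
        scores = ridge_scores(X[idx], y[idx], alpha)
        rank_sum += np.argsort(np.argsort(-scores))
    return rank_sum / n_boot


def select_features(X, y, k=3, tau=0.8, **rank_kwargs):
    """Top-k ensemble-ranked features plus every feature in their modules."""
    mean_rank = ensemble_rank(X, y, **rank_kwargs)
    modules = correlation_modules(X, tau)
    top = np.argsort(mean_rank)[:k]
    # Correlated partners of a top feature are kept, not dropped as redundant
    keep = np.isin(modules, modules[top])
    return np.flatnonzero(keep), modules, mean_rank


# Synthetic example: features 0-2 are noisy copies of the latent signal z
rng = np.random.default_rng(42)
n, p = 200, 20
z = rng.normal(size=n)
y = np.where(z > 0, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[:, :3] = z[:, None] + 0.3 * rng.normal(size=(n, 3))

selected, modules, mean_rank = select_features(X, y)
print("selected features:", selected)
print("module labels of features 0-2:", modules[:3])
```

In this synthetic example the three correlated copies of the signal should fall into a single module, so they are selected together instead of being pruned down to one representative, which mirrors the abstract's point that correlated features need not be discarded as redundant.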



Author information

Authors and Affiliations

School of Biomedical Engineering, University of Technology Sydney, Ultimo, Australia
Ali Braytee
School of Software, University of Technology Sydney, Ultimo, Australia
Paul J. Kennedy
The University of Sydney, Sydney, NSW, 2006, Australia
Ali Anaissi


Corresponding author

Correspondence to Ali Braytee.

Editor information

Editors and Affiliations

The Chinese Academy of Sciences, Beijing, China
Long Cheng
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi Sing Leung
Kobe University, Kobe, Japan
Seiichi Ozawa


About this paper

Cite this paper

Braytee, A., Anaissi, A., Kennedy, P.J. (2018). Sparse Feature Learning Using Ensemble Model for Highly-Correlated High-Dimensional Data. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_37


DOI: https://doi.org/10.1007/978-3-030-04182-3_37
Published: 18 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3
eBook Packages: Computer Science, Computer Science (R0)
