Identifying Characteristic Genes and Clustering via an L_p-Norm Robust Feature Selection Method for Integrated Data

Wu, Sha-Sha; Hou, Mi-Xiao; Liu, Jin-Xing; Wang, Juan; Yuan, Sha-Sha

doi:10.1007/978-3-319-95933-7_51

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10955)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

In bioinformatics, feature selection is a widely used method for dimensionality reduction. However, the traditional feature selection model Joint Embedding Learning and Sparse Regression (JELSR) uses a squared error term, which makes the algorithm extremely sensitive to noise and outliers and degrades its performance. To address this problem, we propose a new robust feature selection model, named RJELSR, that imposes an L_p-norm constraint on the error term and thereby improves the robustness of the algorithm. We also give an effective optimization strategy based on the augmented Lagrange multiplier method to obtain the optimal results. In the experimental section, we first preprocess different cancer data to obtain the integrated data, and then apply our algorithm to it for feature selection and sample clustering. Experiments on the integrated data demonstrate that our method outperforms the compared methods and that the selected characteristic genes are more biologically meaningful.
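
The robustness argument in the abstract can be illustrated with a small numerical sketch. This is not the RJELSR model itself: the abstract does not give the exact form of the L_p-norm term, so the element-wise penalty sum |e_ij|^p, the choices p = 1 and p = 0.5, the toy residual matrix, and the helper names squared_loss, lp_loss and outlier_share are all assumptions made for illustration. The sketch only compares how much of the total error a single outlying entry accounts for under a squared error term and under an L_p-type term.

```python
import numpy as np


def squared_loss(E):
    """Sum of squared residual entries (squared Frobenius norm)."""
    return float(np.sum(E ** 2))


def lp_loss(E, p):
    """Element-wise L_p-type penalty: sum of |e_ij|^p (assumed form)."""
    return float(np.sum(np.abs(E) ** p))


def outlier_share(E, idx, loss, **kwargs):
    """Fraction of the total loss contributed by the single entry at idx."""
    single = np.zeros_like(E)
    single[idx] = E[idx]
    return loss(single, **kwargs) / loss(E, **kwargs)


# Toy residual matrix: small errors everywhere, one large outlier.
E = np.full((10, 10), 0.1)
E[0, 0] = 10.0

share_sq = outlier_share(E, (0, 0), squared_loss)
share_l1 = outlier_share(E, (0, 0), lp_loss, p=1.0)
share_l05 = outlier_share(E, (0, 0), lp_loss, p=0.5)

print(f"squared error: {share_sq:.3f}")
print(f"L_p, p = 1.0:  {share_l1:.3f}")
print(f"L_p, p = 0.5:  {share_l05:.3f}")
```

In this toy case the outlier accounts for almost all of the squared error but a much smaller share of the L_p-type penalty, and the share shrinks further as p decreases. That is the intuition behind replacing the squared error term; the actual objective and its augmented Lagrange multiplier solver are described in the full paper.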

Acknowledgement

This work was supported in part by the NSFC under grant Nos. 61572284, 61502272, and 61701279.

Author information

Authors and Affiliations

School of Information Science and Engineering, Qufu Normal University, Rizhao, 276826, China
Sha-Sha Wu, Mi-Xiao Hou, Jin-Xing Liu, Juan Wang & Sha-Sha Yuan

Corresponding author

Correspondence to Jin-Xing Liu.

Editor information

Editors and Affiliations

Tongji University, Shanghai, China
De-Shuang Huang
University of Ulsan, Ulsan, Korea (Republic of)
Kang-Hyun Jo
Wuhan University of Science and Technology, Wuhan City, China
Xiao-Long Zhang

About this paper

Cite this paper

Wu, SS., Hou, MX., Liu, JX., Wang, J., Yuan, SS. (2018). Identifying Characteristic Genes and Clustering via an L_p-Norm Robust Feature Selection Method for Integrated Data. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_51

DOI: https://doi.org/10.1007/978-3-319-95933-7_51
Published: 06 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95932-0
Online ISBN: 978-3-319-95933-7
eBook Packages: Computer Science, Computer Science (R0)
