Abstract
In bioinformatics, feature selection is a widely used method for dimensionality reduction. However, in the traditional feature selection method Joint Embedding Learning and Sparse Regression (JELSR), the error term takes a squared form, which makes the algorithm extremely sensitive to noise and outliers and degrades its performance. To address this problem, we propose a new robust feature selection model, named RJELSR, that imposes an Lp-norm constraint on the error term to improve the robustness of the algorithm. We also give an effective optimization strategy based on the augmented Lagrange multiplier method to obtain the optimal results. In the experimental section, we first preprocess different cancer data to obtain the integrated data, and then apply our algorithm to it for feature selection and sample clustering. Experiments on the integrated data demonstrate that our method outperforms the compared methods and that the selected characteristic genes are more biologically meaningful.
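The sensitivity of a squared error term to outliers, and the way a smaller exponent dampens it, can be illustrated with a small numerical sketch. This is not the RJELSR objective: the residual values are made up, the helper names `lp_loss` and `outlier_share` are hypothetical, and the choice p = 0.5 is an assumption for illustration only, since the abstract does not state which p is used.

```python
def lp_loss(residuals, p):
    """Sum of |r|^p over the residuals (the p-th power of the Lp norm)."""
    return sum(abs(r) ** p for r in residuals)


def outlier_share(residuals, p):
    """Fraction of the total loss contributed by the largest-magnitude residual."""
    total = lp_loss(residuals, p)
    worst = max(abs(r) for r in residuals)
    return (worst ** p) / total


# Hypothetical residuals: nine small errors and one gross outlier.
residuals = [0.1] * 9 + [10.0]

share_squared = outlier_share(residuals, p=2.0)  # squared error term
share_lp = outlier_share(residuals, p=0.5)       # assumed Lp term with p < 1

print(f"outlier share of loss, p=2.0: {share_squared:.3f}")
print(f"outlier share of loss, p=0.5: {share_lp:.3f}")
```

With the squared term, the single outlier accounts for almost the entire loss, so a fit driven by that loss is pulled toward the outlier. With the smaller exponent, the outlier's share drops to roughly half, which is the intuition behind replacing the squared error with an Lp-norm term.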
Acknowledgement
This work was supported in part by the NSFC under grant Nos. 61572284, 61502272, and 61701279.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Wu, SS., Hou, MX., Liu, JX., Wang, J., Yuan, SS. (2018). Identifying Characteristic Genes and Clustering via an Lp-Norm Robust Feature Selection Method for Integrated Data. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_51
DOI: https://doi.org/10.1007/978-3-319-95933-7_51
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-95932-0
Online ISBN: 978-3-319-95933-7
eBook Packages: Computer Science, Computer Science (R0)