Identifying Characteristic Genes and Clustering via an Lp-Norm Robust Feature Selection Method for Integrated Data

  • Sha-Sha Wu
  • Mi-Xiao Hou
  • Jin-Xing Liu (Email author)
  • Juan Wang
  • Sha-Sha Yuan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10955)


In bioinformatics, feature selection is a widely used method for dimensionality reduction. However, in the traditional feature selection method Joint Embedding Learning and Sparse Regression (JELSR), the error term takes a squared form, which makes the algorithm extremely sensitive to noise and outliers and degrades its performance. To address this problem, we propose a new robust feature selection model, named RJELSR, which imposes an Lp-norm constraint on the error term to improve the robustness of the algorithm. We also give an effective optimization strategy based on the augmented Lagrange multiplier method to obtain the optimal results. In the experimental section, we first preprocess different cancer data to obtain the integrated data, and then apply our algorithm to it for feature selection and sample clustering. Experiments on the integrated data demonstrate that our method outperforms the compared methods and that the selected characteristic genes are more biologically meaningful.
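The abstract does not state the objective function, so the following is only an illustrative sketch of the general idea behind it: a squared error term lets a single gross outlier dominate the loss, while an Lp-norm term with a small p dampens it. The residual matrix, the choice p = 0.5, and the function names are assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np

def squared_error(residual):
    """Sum of squared entries (squared Frobenius norm of the residual)."""
    return float(np.sum(residual ** 2))

def lp_error(residual, p=0.5):
    """Sum of |entry|**p, i.e. the p-th power of the element-wise Lp norm."""
    return float(np.sum(np.abs(residual) ** p))

# Hypothetical residual matrix: small noise everywhere, plus one gross outlier.
corrupted = np.full((4, 5), 0.1)
corrupted[0, 0] = 100.0

# Fraction of each loss that is contributed by the single outlier entry.
sq_share = corrupted[0, 0] ** 2 / squared_error(corrupted)
lp_share = abs(corrupted[0, 0]) ** 0.5 / lp_error(corrupted, p=0.5)

print(f"outlier share of squared error:    {sq_share:.4f}")
print(f"outlier share of Lp error (p=0.5): {lp_share:.4f}")
```

In this toy setting the outlier accounts for nearly all of the squared error but a noticeably smaller share of the Lp error, which is the kind of robustness the abstract attributes to the Lp-norm constraint.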


Keywords: Lp-norm constraint; Integrated data; Feature selection; Clustering



This work was supported in part by the NSFC under grant Nos. 61572284, 61502272, and 61701279.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Sha-Sha Wu (1)
  • Mi-Xiao Hou (1)
  • Jin-Xing Liu (1), Email author
  • Juan Wang (1)
  • Sha-Sha Yuan (1)
  1. School of Information Science and Engineering, Qufu Normal University, Rizhao, China
