Identifying Characteristic Genes and Clustering via an Lp-Norm Robust Feature Selection Method for Integrated Data

  • Conference paper
Intelligent Computing Theories and Application (ICIC 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10955)

Abstract

In bioinformatics, feature selection is a widely used method for dimensionality reduction. However, in the traditional feature selection model Joint Embedding Learning and Sparse Regression (JELSR), the error term takes a squared form, which makes the algorithm extremely sensitive to noise and outliers and degrades its performance. To address this problem, we propose a new robust feature selection model, named RJELSR, that imposes an Lp-norm constraint on the error term and thereby improves the robustness of the algorithm. We also give an effective optimization strategy based on the augmented Lagrange multiplier method to obtain the optimal results. In the experimental section, we first preprocess different cancer data to obtain the integrated data, and then apply our algorithm to it for feature selection and sample clustering. Experiments on the integrated data demonstrate that our method outperforms the compared methods and that the selected characteristic genes are more biologically meaningful.
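
The sensitivity of a squared error term to outliers can be illustrated with a small numerical sketch. The code below is not the RJELSR objective or its optimization; it only compares how much of the total loss a single gross outlier contributes when residuals are penalized by |r|^2 versus |r|^p. The residual values, the function names `lp_loss` and `outlier_share`, and the choice p = 0.5 are all illustrative assumptions, since the abstract does not state which p the method uses.

```python
import numpy as np


def lp_loss(residuals, p):
    """Sum of |r|^p over all entries (p-th power of the entrywise Lp-norm)."""
    return float(np.sum(np.abs(residuals) ** p))


def outlier_share(residuals, outlier_index, p):
    """Fraction of the total loss contributed by a single residual."""
    return float(abs(residuals[outlier_index]) ** p / lp_loss(residuals, p))


# Hypothetical residuals: nine small errors and one gross outlier (assumed data).
residuals = np.array([0.1] * 9 + [10.0])

share_squared = outlier_share(residuals, 9, p=2.0)  # squared error term
share_lp = outlier_share(residuals, 9, p=0.5)       # assumed p < 2

print(f"outlier share of loss, p=2.0: {share_squared:.4f}")
print(f"outlier share of loss, p=0.5: {share_lp:.4f}")
```

With the squared penalty, the single outlier accounts for almost the entire loss, so any fit that minimizes this loss is driven mainly by that one sample. With a smaller p its share falls considerably, which is the intuition behind replacing the squared error term with an Lp-norm term.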

Acknowledgement

This work was supported in part by the NSFC under grant Nos. 61572284, 61502272, and 61701279.

Author information

Corresponding author

Correspondence to Jin-Xing Liu.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Wu, SS., Hou, MX., Liu, JX., Wang, J., Yuan, SS. (2018). Identifying Characteristic Genes and Clustering via an Lp-Norm Robust Feature Selection Method for Integrated Data. In: Huang, DS., Jo, KH., Zhang, XL. (eds) Intelligent Computing Theories and Application. ICIC 2018. Lecture Notes in Computer Science, vol 10955. Springer, Cham. https://doi.org/10.1007/978-3-319-95933-7_51

  • DOI: https://doi.org/10.1007/978-3-319-95933-7_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-95932-0

  • Online ISBN: 978-3-319-95933-7

  • eBook Packages: Computer Science, Computer Science (R0)
