Variable Selection and Outlier Detection in Regularized Survival Models: Application to Melanoma Gene Expression Data

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2018)

Abstract

Gene expression data analysis has become widely accepted in recent years as important for oncological diagnosis and treatment. A central challenge is developing mathematical and statistical methods that improve prognosis and guide treatment decisions, and a major difficulty with gene expression datasets is their high dimensionality. The goal of this work is to reduce the dimensionality of gene expression data using regularization techniques such as Lasso and elastic net, complemented with DegreeCox, a recently proposed network-based regularization method for survival analysis. In addition, identifying long- or short-term survivors (outliers) may lead to the detection of new prognostic factors, and the Rank Product test is used to identify such observations. An example based on The Cancer Genome Atlas (TCGA) Melanoma dataset is presented, in which the covariates are the patients’ gene expression. Applying these data reduction techniques to the Melanoma dataset enabled the selection of relevant genes over the range of parameters evaluated, with 5 genes in common between elastic net regularization and DegreeCox for one of the two models evaluated further. Moreover, the Rank Product test flagged a long-term survivor as an outlier: this observation was systematically highly ranked by the martingale residuals of the models evaluated.
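The outlier-detection idea in the abstract (ranking each observation by its martingale residual in several models, then combining the ranks) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the residual matrix is made-up toy data, the function names `rank_product` and `permutation_pvalues` are hypothetical, rank 1 is assigned to the most negative residual (a patient who survived longer than the model predicted), and the p-values come from a simple Monte Carlo permutation null rather than the approximation of Heskes et al.

```python
import numpy as np


def rank_product(residuals):
    """Product of each observation's ranks across models.

    residuals: array of shape (n_observations, n_models); column j is
    assumed to hold the martingale residuals from model j.
    """
    residuals = np.asarray(residuals, dtype=float)
    # Rank within each column; rank 1 = most negative residual.
    # Double argsort yields 0-based ranks (ties broken by position).
    ranks = residuals.argsort(axis=0).argsort(axis=0) + 1
    return ranks.prod(axis=1)


def permutation_pvalues(residuals, n_perm=2000, seed=0):
    """Monte Carlo p-values for small rank products (illustrative only)."""
    residuals = np.asarray(residuals, dtype=float)
    n, k = residuals.shape
    observed = rank_product(residuals)
    rng = np.random.default_rng(seed)
    null = np.empty((n_perm, n), dtype=np.int64)
    for b in range(n_perm):
        # Null hypothesis: ranks are independent random permutations per model.
        perm = np.column_stack([rng.permutation(n) + 1 for _ in range(k)])
        null[b] = perm.prod(axis=1)
    pooled = null.ravel()
    # Share of null rank products at least as small as the observed one.
    hits = (pooled[None, :] <= observed[:, None]).sum(axis=1)
    return (1 + hits) / (1 + pooled.size)


# Toy residuals for 6 patients under 3 models (made-up numbers).
resid = np.array([
    [-2.9, -3.1, -2.7],   # consistently far below zero: candidate outlier
    [0.4, 0.1, 0.6],
    [-0.3, 0.5, -0.1],
    [0.8, -0.2, 0.3],
    [0.1, 0.7, -0.4],
    [0.6, 0.3, 0.9],
])
rp = rank_product(resid)
pvals = permutation_pvalues(resid)
print(rp, pvals.round(4))
```

In this toy input the first patient is ranked first by every model, so its rank product is the smallest possible and its permutation p-value is the lowest. With real data, the p-values would normally also be corrected for multiple testing before any observation is declared an outlier.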

The authors thank the European Union Horizon 2020 under grant agreement No. 633974 (SOUND project) and the Portuguese Foundation for Science & Technology (FCT) under projects UID/CEC/50021/2013, UID/EMS/50022/2013, PTDC/EMS-SIS/0642/2014, IF/00653/2012, SFRH/BD/97415/2013.

Author information

Corresponding author

Correspondence to Eunice Carrasquinha.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Carrasquinha, E., Veríssimo, A., Lopes, M.B., Vinga, S. (2019). Variable Selection and Outlier Detection in Regularized Survival Models: Application to Melanoma Gene Expression Data. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2018. Lecture Notes in Computer Science, vol 11331. Springer, Cham. https://doi.org/10.1007/978-3-030-13709-0_36

  • DOI: https://doi.org/10.1007/978-3-030-13709-0_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-13708-3

  • Online ISBN: 978-3-030-13709-0

  • eBook Packages: Computer Science, Computer Science (R0)
