Abstract
The importance of gene expression data analysis for oncological diagnosis and treatment has become widely accepted in recent years. One of the main associated challenges is the development of mathematical and statistical methods for data analysis that improve prognosis and guide treatment decisions. A key difficulty that researchers face when dealing with gene expression datasets is their high dimensionality. In this context, the goal of this work is to reduce the dimensionality of gene expression data using regularization techniques such as Lasso and Elastic net, complemented with DegreeCox, a recently proposed network-based regularization method for survival analysis. In addition, the identification of long- or short-term survivors (outliers) may lead to the detection of new prognostic factors, and the Rank Product test is used to identify those observations. An example based on The Cancer Genome Atlas (TCGA) Melanoma dataset is presented, where the covariates are the patients’ gene expression levels. The application of data reduction techniques to the Melanoma dataset enabled the selection of relevant genes over the range of parameters evaluated, with 5 genes in common between Elastic net regularization and DegreeCox for one of the two models further evaluated. Moreover, a long-term survivor was detected as an outlier by the Rank Product test, being systematically highly ranked according to the martingale residuals of the models evaluated.
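The outlier-detection step summarized above (ranking observations by their martingale residuals across several fitted survival models and combining the rankings with a Rank Product) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `martingale_residuals` and `rank_product` are hypothetical, the linear predictors are assumed to come from already fitted (e.g. Lasso, Elastic net or DegreeCox) Cox models, observations are ranked by the magnitude of the residual (an assumption), and p-values are estimated by a simple permutation scheme in the spirit of Breitling et al. rather than the exact bounds of Heskes et al.

```python
import numpy as np


def martingale_residuals(time, event, eta):
    """Martingale residuals M_i = delta_i - H0(t_i) * exp(eta_i).

    H0 is the Breslow estimate of the baseline cumulative hazard and eta is
    the linear predictor x_i' beta of an (assumed already fitted) Cox model.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.exp(np.asarray(eta, dtype=float))
    event_times = np.unique(time[event == 1])
    # Breslow increments: d_j / sum of exp(eta_k) over the risk set {k: t_k >= t_j}
    increments = np.array([
        np.sum((time == t) & (event == 1)) / np.sum(risk[time >= t])
        for t in event_times
    ])
    # Baseline cumulative hazard evaluated at each subject's own time
    H0 = np.array([increments[event_times <= t].sum() for t in time])
    return event - H0 * risk


def rank_product(residuals, n_perm=1000, seed=0):
    """Rank Product of n observations over m models.

    residuals: (n, m) array with one column of residuals per model.
    Assumption: within each model, rank 1 = largest |residual|.
    RP_i is the geometric mean of observation i's ranks; a small RP means the
    observation is consistently extreme. p-values are estimated by permuting
    the ranks independently within each model.
    """
    R = np.asarray(residuals, dtype=float)
    n, m = R.shape
    # Double argsort turns the descending |residual| order into 1-based ranks
    ranks = np.argsort(np.argsort(-np.abs(R), axis=0), axis=0) + 1
    rp = np.exp(np.log(ranks).mean(axis=1))

    rng = np.random.default_rng(seed)
    count = np.zeros(n)
    for _ in range(n_perm):
        perm = np.column_stack([rng.permutation(n) + 1 for _ in range(m)])
        rp_null = np.sort(np.exp(np.log(perm).mean(axis=1)))
        # Number of null RP values <= each observed RP
        count += np.searchsorted(rp_null, rp, side="right")
    pvals = count / (n_perm * n)
    return rp, pvals


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 50
    time = rng.exponential(10.0, n)
    event = rng.integers(0, 2, n)
    # Synthetic linear predictors standing in for three fitted models
    etas = [rng.normal(0.0, 0.5, n) for _ in range(3)]
    res = np.column_stack([martingale_residuals(time, event, e) for e in etas])
    rp, p = rank_product(res, n_perm=200)
    top = np.argsort(rp)[:5]
    print("Most consistently extreme observations:", top, p[top])
```

Ranking by absolute residual treats unexpectedly early deaths (residuals close to 1) and long-term survivors (large negative residuals) alike; ranking by the signed residual instead would target only one of the two groups.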
The authors thank the European Union Horizon 2020 under grant agreement No. 633974 (SOUND project) and the Portuguese Foundation for Science & Technology (FCT) under projects UID/CEC/50021/2013, UID/EMS/50022/2013, PTDC/EMS-SIS/0642/2014, IF/00653/2012, SFRH/BD/97415/2013.
References
Braun-Falco, O., Plewig, G., Wolff, H.H., Burgdorf, W.H.C.: Melanocytic lesions. Dermatology. Springer, Berlin (2000). https://doi.org/10.1007/978-3-642-97931-6
Breitling, R., Armengaud, P., Amtmann, A., Herzyk, P.: Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 573, 83–92 (2004)
Breslow, N.: Discussion on professor Cox’s paper. J. Roy. Stat. Soc.: Ser. B 34, 216–217 (1972)
Caldas, J., Vinga, S.: Global meta-analysis of transcriptomics studies. PLoS One 9(2) (2014). https://doi.org/10.1371/journal.pone.0089318
Carrasquinha, E., Veríssimo, A., Lopes, M., Vinga, S.: Identification of influential observations in high-dimensional cancer survival data through the rank product test. BioData Min. 11(1) (2018). https://doi.org/10.1186/s13040-018-0162-z
Cox, D.R.: Regression models and life-tables. J. Roy. Stat. Soc.: Ser. B (Methodol.) 34(2), 187–220 (1972). http://www.jstor.org/stable/2985181
Heskes, T., Eisinga, R., Breitling, R.: A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments. BMC Bioinformatics 15, 367 (2014). https://doi.org/10.1186/s12859-014-0367-1
Lopes, M., Veríssimo, A., Carrasquinha, E., Casimiro, S., Beerenwinkel, N., Vinga, S.: Ensemble outlier detection and gene selection in triple-negative breast cancer data. BMC Bioinformatics (2018). https://doi.org/10.1186/s12859-018-2149-7
Nardi, A., Schemper, M.: New residuals for Cox regression and their application to outlier screening. Biometrics 55(2), 523–529 (1999). http://www.jstor.org/stable/2533801
Peto, R., Peto, J.: Asymptotically efficient rank invariant test procedures. J. Roy. Stat. Soc.: Ser. A (Gen.) 135(2), 185–207 (1972). http://www.jstor.org/stable/2344317
Storey, J.D.: A direct approach to false discovery rates. J. Roy. Stat. Soc.: Ser. B 64(3), 479–498 (2002)
Therneau, T., Grambsch, P.M., Fleming, T.R.: Martingale-based residuals for survival models. Biometrika 77(1), 147–160 (1990). http://www.jstor.org/stable/2336057
Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc.: Ser. B 58(1), 267–288 (1996)
Veríssimo, A., Oliveira, A.L., Sagot, M.F., Vinga, S.: DegreeCox - a network-based regularization method for survival analysis. BMC Bioinformatics 17(16), 449 (2016). https://doi.org/10.1186/s12859-016-1310-4
Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. J. Roy. Stat. Soc.: Ser. B 67(2), 301–320 (2005)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Carrasquinha, E., Veríssimo, A., Lopes, M.B., Vinga, S. (2019). Variable Selection and Outlier Detection in Regularized Survival Models: Application to Melanoma Gene Expression Data. In: Nicosia, G., Pardalos, P., Giuffrida, G., Umeton, R., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2018. Lecture Notes in Computer Science, vol 11331. Springer, Cham. https://doi.org/10.1007/978-3-030-13709-0_36
DOI: https://doi.org/10.1007/978-3-030-13709-0_36
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13708-3
Online ISBN: 978-3-030-13709-0
eBook Packages: Computer Science, Computer Science (R0)