Abstract
In this paper, we introduce the holdout sampler to find the defective pathways in highly underdetermined phenotype prediction problems. This sampling algorithm is inspired by the bootstrapping procedure used in regression analysis to establish confidence bounds. We show that working with partial information (data bags) serves to sample the linear uncertainty region in a simple regression problem, mainly along the axis of greatest uncertainty, which corresponds to the smallest singular value of the system matrix. Applied to a phenotype prediction problem, considered as a generalized prediction problem between the set of genetic signatures and the set of classes into which the phenotype is divided, this procedure serves to unravel the ensemble of altered pathways in the transcriptome that are involved in disease development. The algorithm looks for the minimum-scale genetic signature in each random holdout, and the likelihood (predictive accuracy) is established on the validation dataset via a nearest-neighbor classifier. The posterior analysis serves to identify the header genes that appear most frequently across the different holdouts and are therefore robust to a partial lack of samples. These genes are used to establish the genetic pathways and the biological processes involved in disease progression. This algorithm is much faster, more robust, and simpler than Bayesian networks. We show its application to a microarray dataset concerning triple-negative breast cancer (TNBC), a type of breast cancer with poor prognosis.
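The holdout loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how genes are ranked within a holdout, so a per-gene Fisher's ratio is assumed here, and the function names, the 75/25 split, the cap of 10 genes, and the synthetic two-class data are all hypothetical choices made for illustration.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-gene Fisher's ratio for two classes (ASSUMED ranking criterion)."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return num / den

def nn_accuracy(X_train, y_train, X_val, y_val):
    """Validation accuracy of a 1-nearest-neighbor classifier (Euclidean)."""
    d = np.linalg.norm(X_val[:, None, :] - X_train[None, :, :], axis=2)
    pred = y_train[d.argmin(axis=1)]
    return float((pred == y_val).mean())

def holdout_sampler(X, y, n_holdouts=50, train_frac=0.75, max_genes=10, seed=0):
    """Count how often each gene enters the shortest best signature."""
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    n_train = int(train_frac * n_samples)
    counts = np.zeros(n_genes, dtype=int)
    accuracies = []
    for _ in range(n_holdouts):
        perm = rng.permutation(n_samples)
        tr, va = perm[:n_train], perm[n_train:]
        if len(np.unique(y[tr])) < 2:
            continue  # skip splits whose training bag lacks one class
        # Rank genes using the training bag only.
        ranking = np.argsort(fisher_ratio(X[tr], y[tr]))[::-1]
        best_acc, best_k = -1.0, 1
        for k in range(1, max_genes + 1):
            genes = ranking[:k]
            acc = nn_accuracy(X[tr][:, genes], y[tr], X[va][:, genes], y[va])
            if acc > best_acc:  # strict '>' keeps the shortest signature on ties
                best_acc, best_k = acc, k
        counts[ranking[:best_k]] += 1
        accuracies.append(best_acc)
    return counts, np.array(accuracies)

# Synthetic data (illustrative only): 40 samples, 200 genes,
# of which the first 3 are shifted in class 1.
data_rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = data_rng.normal(size=(40, 200))
X[y == 1, :3] += 3.0

counts, acc = holdout_sampler(X, y)
top_genes = np.argsort(counts)[::-1][:5]
print("most frequently sampled genes:", top_genes)
print("mean holdout accuracy:", acc.mean())
```

Genes with high counts play the role of the frequently appearing header genes in the posterior analysis. In a real study the frequency table would feed a pathway analysis tool, a step this sketch does not attempt.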
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fernández-Martínez, J.L. et al. (2018). Sampling Defective Pathways in Phenotype Prediction Problems via the Holdout Sampler. In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol 10814. Springer, Cham. https://doi.org/10.1007/978-3-319-78759-6_3
DOI: https://doi.org/10.1007/978-3-319-78759-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78758-9
Online ISBN: 978-3-319-78759-6
eBook Packages: Computer Science, Computer Science (R0)