Addition of Pathway-Based Information to Improve Predictions in Transcriptomics

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2019)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11466)

Abstract

The diagnosis and prognosis of cancer are among the most critical challenges facing modern medicine. In this context, personalized medicine aims to use data from heterogeneous sources to estimate how the disease will evolve in each specific patient, so that the most appropriate treatment can be chosen. In recent years, DNA sequencing data have boosted cancer prediction and treatment by supplying genetic information used to design genetic signatures, or biomarkers, that have led to a better classification of cancer subtypes and to better estimates of disease evolution and of the response to different treatments. Several machine learning models have been proposed in the literature for cancer prediction. However, the efficacy of these models can be seriously affected by the imbalance between the high dimensionality of gene expression feature sets and the small number of samples available, a problem known as the curse of dimensionality. Although linear predictive models may perform worse than more sophisticated non-linear models, they have the main advantage of being interpretable. Moreover, the use of domain-specific information has proved useful for boosting the performance of multivariate linear predictors in high-dimensional settings. In this work, we design a set of linear predictive models that incorporate domain-specific information from genetic pathways for effective feature selection. By combining these linear models with other classical machine learning models, we obtain state-of-the-art performance rates in the prediction of vital status on a public cancer dataset.
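As an illustration of how pathway membership can drive feature selection in a linear model, the following is a minimal sketch of a group-lasso-penalized logistic regression fitted by proximal gradient descent. The abstract does not state which formulation the authors use, so the choice of group lasso is an assumption, as are the function names, the `lam` and `n_iter` parameters, and the simplification that groups (pathways) do not overlap; real pathways typically share genes.

```python
import numpy as np

def group_soft_threshold(w, groups, thresh):
    """Block soft-thresholding: shrink each group's coefficients jointly.

    `groups` is a list of index lists, one per (hypothetical) pathway.
    Groups are assumed NOT to overlap in this sketch.
    """
    out = w.copy()
    for idx in groups:
        norm = np.linalg.norm(w[idx])
        # Weight the threshold by sqrt(group size), a common convention.
        t = thresh * np.sqrt(len(idx))
        if norm <= t:
            out[idx] = 0.0          # the whole group is dropped
        else:
            out[idx] = (1.0 - t / norm) * w[idx]
    return out

def fit_group_lasso_logistic(X, y, groups, lam=0.1, n_iter=500):
    """Proximal gradient descent for logistic loss + group-lasso penalty.

    X: (n_samples, n_genes) expression matrix, y: 0/1 labels.
    Returns the coefficient vector and the unpenalized intercept.
    """
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    # Step size 1/L, where L bounds the curvature of the mean logistic
    # loss with respect to (w, b): 0.25 * sigma_max([X, 1])^2 / n.
    X_aug = np.hstack([X, np.ones((n, 1))])
    L = 0.25 * np.linalg.norm(X_aug, 2) ** 2 / n
    step = 1.0 / L
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad_w = X.T @ (prob - y) / n
        grad_b = np.mean(prob - y)
        # Gradient step on the smooth loss, then the group-wise prox.
        w = group_soft_threshold(w - step * grad_w, groups, step * lam)
        b -= step * grad_b
    return w, b
```

Because the penalty acts on whole groups, a pathway is either kept or discarded as a unit, which is what makes the selected features interpretable at the pathway level. Larger values of `lam` discard more pathways; in practice it would be tuned by cross-validation.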

Notes

  1. https://www.genome.jp/kegg/

  2. https://cancergenome.nih.gov/

Acknowledgments

This work is part of the coordinated research projects TIN2014-58516-C2-1-R, TIN2014-58516-C2-2-R and TIN2017-88728-C2 from MINECO-SPAIN which include FEDER funds.

Author information

Corresponding author

Correspondence to Daniel Urda.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Urda, D., Veredas, F.J., Turias, I., Franco, L. (2019). Addition of Pathway-Based Information to Improve Predictions in Transcriptomics. In: Rojas, I., Valenzuela, O., Rojas, F., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2019. Lecture Notes in Computer Science, vol 11466. Springer, Cham. https://doi.org/10.1007/978-3-030-17935-9_19

  • DOI: https://doi.org/10.1007/978-3-030-17935-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17934-2

  • Online ISBN: 978-3-030-17935-9

  • eBook Packages: Computer Science, Computer Science (R0)