Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11826)

Abstract

Machine learning (ML) methods are still rarely used for gene expression/mutation-based prediction of individual tumor responses to anticancer chemotherapy, because clinical case histories supplemented with high-throughput molecular data are relatively rare. This makes most ML methods highly vulnerable to overtraining. Recently, we proposed a novel hybrid global-local approach to ML termed FLOating Window Projective Separator (FloWPS) that avoids extrapolation in the feature space and may improve robustness of classifiers even for datasets with a limited number of preceding cases. FloWPS has been validated for the support vector machines (SVM) method, where it significantly improved the quality of classifiers. The core property of FloWPS is data trimming, i.e. sample-specific removal of features. Irrelevant features of a sample, those that do not have a significant number of neighboring hits in the training dataset, are removed from further analyses. In addition, for each point of a validation dataset, only the proximal points of the training dataset are taken into account. Thus, for every point of a validation dataset, the training dataset is adjusted to form a floating window. Here, we applied this approach to seven popular ML methods: SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). We performed computational experiments on 21 high-throughput clinically annotated gene expression datasets, including in total 1778 cancer patients who either responded or did not respond to chemotherapy treatments. The biggest dataset contained samples for 235 individual cases and the smallest for 41. For global ML methods, such as SVM, RF, BNB, ADA and MLP, FloWPS substantially improved the classifier quality. Namely, the area under the receiver operating characteristic curve (ROC AUC) for the responder vs non-responder classifier increased from a typical range of 0.65–0.85 to 0.80–0.95. On the other hand, FloWPS was shown to be useless for purely local ML techniques such as the kNN method or RR. However, both these local methods exhibited low sensitivity or specificity in cases when false positive or false negative errors, respectively, should be avoided. According to the sensitivity-specificity criterion, for all the datasets tested, the best performance in combination with FloWPS data trimming was shown by the binomial naïve Bayesian method, which can be valuable for further development of predictors in personalized oncology.
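The data trimming idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the exact selection rule, so everything below is an assumption made for illustration: the function names `trim_features` and `floating_window`, the threshold `m` (minimum number of training values required on each side of the sample along a feature axis), the window size `k`, and the use of Euclidean distance are all hypothetical and are not taken from the paper.

```python
from math import sqrt

def trim_features(train_X, x, m):
    """Return indices of features along which x is not extrapolated.

    Assumed rule (illustrative only): keep feature j when at least m
    training values lie at or below x[j] and at least m lie at or above it.
    """
    kept = []
    for j, value in enumerate(x):
        below = sum(1 for row in train_X if row[j] <= value)
        above = sum(1 for row in train_X if row[j] >= value)
        if below >= m and above >= m:
            kept.append(j)
    return kept

def floating_window(train_X, train_y, x, m, k):
    """Build a sample-specific training set for one validation point x.

    First trims features, then keeps the k training points closest to x
    in the trimmed feature space (Euclidean distance, an assumption).
    """
    kept = trim_features(train_X, x, m)
    if not kept:
        return kept, [], []

    def dist(row):
        return sqrt(sum((row[j] - x[j]) ** 2 for j in kept))

    # sorted() is stable, so ties keep their original training order.
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i]))[:k]
    X_win = [[train_X[i][j] for j in kept] for i in order]
    y_win = [train_y[i] for i in order]
    return kept, X_win, y_win

# Toy data: the second feature of x lies outside the training range,
# so it is trimmed for this sample.
train_X = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]]
train_y = [0, 0, 1, 1]
x = [2.5, 20.0]
kept, X_win, y_win = floating_window(train_X, train_y, x, m=1, k=2)
print(kept, X_win, y_win)
```

Under these assumptions, any of the classifiers named in the abstract would then be fitted on `X_win` and `y_win` and applied to the trimmed version of `x`, with the window rebuilt for every validation point.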

Acknowledgements

The study was supported by the Russian Foundation for Basic Research, Grant 19-29-01108.

Author information

Corresponding author

Correspondence to Nicolas Borisov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tkachev, V., Buzdin, A., Borisov, N. (2019). Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology. In: Bebis, G., Benos, T., Chen, K., Jahn, K., Lima, E. (eds) Mathematical and Computational Oncology. ISMCO 2019. Lecture Notes in Computer Science, vol 11826. Springer, Cham. https://doi.org/10.1007/978-3-030-35210-3_5

  • DOI: https://doi.org/10.1007/978-3-030-35210-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35209-7

  • Online ISBN: 978-3-030-35210-3

  • eBook Packages: Computer Science (R0)
