Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology

Tkachev, Victor; Buzdin, Anton; Borisov, Nicolas

doi:10.1007/978-3-030-35210-3_5

Conference paper
First Online: 12 November 2019

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11826)

Abstract

Machine learning (ML) methods are still rarely used for gene expression/mutation-based prediction of individual tumor responses to anticancer chemotherapy, because clinical case histories supplemented with high-throughput molecular data are relatively rare. This scarcity makes most ML methods highly vulnerable to overtraining. Recently, we proposed a novel hybrid global-local approach to ML termed FLOating Window Projective Separator (FloWPS) that avoids extrapolation in the feature space and may improve the robustness of classifiers even for datasets with a limited number of preceding cases. FloWPS has been validated for the support vector machine (SVM) method, where it significantly improved the quality of classifiers. The core property of FloWPS is data trimming, i.e. sample-specific removal of features. Irrelevant features of a sample, namely those that do not have a significant number of neighboring hits in the training dataset, are removed from further analysis. In addition, for each point of a validation dataset, only the proximal points of the training dataset are taken into account. Thus, for every point of a validation dataset, the training dataset is adjusted to form a floating window. Here, we applied this approach to seven popular ML methods: SVM, k nearest neighbors (kNN), random forest (RF), Tikhonov (ridge) regression (RR), binomial naïve Bayes (BNB), adaptive boosting (ADA) and multi-layer perceptron (MLP). We performed computational experiments on 21 high-throughput, clinically annotated gene expression datasets comprising a total of 1778 cancer patients who either responded or did not respond to chemotherapy treatment. The largest dataset contained 235 individual cases and the smallest 41. For global ML methods, such as SVM, RF, BNB, ADA and MLP, FloWPS substantially improved the classifier quality. Namely, the area under the receiver operating characteristic curve (ROC AUC) for the responder vs non-responder classifier increased from a typical range of 0.65–0.85 to 0.80–0.95. On the other hand, FloWPS provided no benefit for purely local ML techniques such as kNN or RR. However, both of these local methods exhibited low sensitivity or specificity in cases when false positive or false negative errors, respectively, should be avoided. According to the sensitivity-specificity criterion, for all the datasets tested, the best performance in combination with FloWPS data trimming was shown by the binomial naïve Bayesian method, which can be valuable for further development of predictors in personalized oncology.
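The two trimming steps described in the abstract (sample-specific feature removal, then restriction to proximal training points) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `trim_for_sample`, the parameters `m` and `k`, and the concrete rule used here for "neighboring hits" (at least `m` training values on each side of the sample's value, so that the sample is not extrapolated along that feature) are assumptions made for illustration, as is the use of Euclidean distance to define proximity.

```python
import numpy as np

def trim_for_sample(X_train, x, m, k):
    """Return (feature_idx, row_idx) describing a floating window for sample x.

    Assumed rule (hypothetical, for illustration): a feature is kept only if
    at least m training values lie below AND at least m lie above x's value.
    """
    X_train = np.asarray(X_train, dtype=float)
    x = np.asarray(x, dtype=float)

    # Step 1: sample-specific feature removal.
    below = (X_train < x).sum(axis=0)   # per-feature count of smaller training values
    above = (X_train > x).sum(axis=0)   # per-feature count of larger training values
    feature_idx = np.flatnonzero((below >= m) & (above >= m))

    # Step 2: keep only the k training points closest to x
    # (Euclidean distance in the reduced feature space, an assumption).
    diffs = X_train[:, feature_idx] - x[feature_idx]
    dist = np.sqrt((diffs ** 2).sum(axis=1))
    row_idx = np.argsort(dist, kind="stable")[:k]
    return feature_idx, row_idx

# Toy data: 4 training samples, 3 features.
X_train = np.array([[0.0, 0.0, 5.0],
                    [1.0, 2.0, 6.0],
                    [2.0, 4.0, 7.0],
                    [3.0, 6.0, 8.0]])
x = np.array([1.5, 3.0, 9.0])  # third feature lies outside the training range

features, rows = trim_for_sample(X_train, x, m=1, k=2)
print(features.tolist())  # [0, 1]  (third feature is trimmed)
print(rows.tolist())      # [1, 2]  (the two nearest training samples)
```

Under these assumptions, any of the seven classifiers would then be fitted on `X_train[rows][:, features]` and applied to `x[features]`, so that each validation point gets its own trimmed training set.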

Acknowledgements

The study was supported by the Russian Foundation for Basic Research, Grant 19-29-01108.

Author information

Authors and Affiliations

Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, USA
Victor Tkachev, Anton Buzdin & Nicolas Borisov
I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, 119991, Russia
Anton Buzdin & Nicolas Borisov

Authors

Victor Tkachev
Anton Buzdin
Nicolas Borisov

Corresponding author

Correspondence to Nicolas Borisov.

Editor information

Editors and Affiliations

University of Nevada, Reno, NV, USA
George Bebis
University of Pittsburgh, Pittsburgh, PA, USA
Takis Benos
The University of Texas MD Anderson Cancer Center, Houston, TX, USA
Ken Chen
ETH Zurich, Basel, Switzerland
Katharina Jahn
The University of Texas, Austin, TX, USA
Ernesto Lima

About this paper

Cite this paper

Tkachev, V., Buzdin, A., Borisov, N. (2019). Flexible Data Trimming for Different Machine Learning Methods in Omics-Based Personalized Oncology. In: Bebis, G., Benos, T., Chen, K., Jahn, K., Lima, E. (eds) Mathematical and Computational Oncology. ISMCO 2019. Lecture Notes in Computer Science, vol 11826. Springer, Cham. https://doi.org/10.1007/978-3-030-35210-3_5

DOI: https://doi.org/10.1007/978-3-030-35210-3_5
Published: 12 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-35209-7
Online ISBN: 978-3-030-35210-3
eBook Packages: Computer Science, Computer Science (R0)
