Abstract
This paper presents a novel approach to individualized medical treatment in oncology, based on machine learning that transfers gene expression data obtained from cell lines to individual cancer patients in order to predict drug efficiency. We give a detailed analysis of how to build drug response classifiers, using three experimental data pairs of the form “type of cancer/drug chosen for treatment” as examples. The main difficulty of the problem is the meager size of the patient training data: it is many hundreds of times smaller than the dimensionality of the original feature space.
The core feature of our transfer technique is that it avoids extrapolation in the feature space when predicting the clinical outcome of treatment for a patient from gene expression data for cell lines. We ensure that there is no extrapolation by a special selection of the dimensions of the feature space: we keep those dimensions that provide a sufficient number, say M, of cell line points both below and above any point that corresponds to a patient. Additionally, in a manner somewhat similar to the k nearest neighbors (kNN) method, after selecting the feature subspace we take into account only the K cell line points that are closest to the patient’s point in the selected subspace. Varying the feasible values of K and M, we show that the predictor’s accuracy, measured by AUC, is equal to or higher than 0.7 for all three cases of cancer-like diseases.
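The two selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `select_features` and `nearest_cell_lines`, the toy data, the use of strict inequalities, and the Euclidean distance are all assumptions made for illustration, since the abstract does not specify them.

```python
from math import sqrt


def select_features(cell_lines, patient, m):
    """Return indices of features with at least m cell-line values strictly
    below and at least m strictly above the patient's value, so that the
    patient is never extrapolated beyond the cell-line range in that feature."""
    selected = []
    for j, p in enumerate(patient):
        below = sum(1 for row in cell_lines if row[j] < p)
        above = sum(1 for row in cell_lines if row[j] > p)
        if below >= m and above >= m:
            selected.append(j)
    return selected


def nearest_cell_lines(cell_lines, patient, features, k):
    """Return indices of the k cell lines closest to the patient in the
    selected subspace (Euclidean distance is an assumption of this sketch)."""
    def dist(row):
        return sqrt(sum((row[j] - patient[j]) ** 2 for j in features))

    order = sorted(range(len(cell_lines)), key=lambda i: dist(cell_lines[i]))
    return order[:k]


# Toy data (hypothetical): 4 cell lines, 3 expression features.
cell_lines = [
    [1.0, 5.0, 0.2],
    [2.0, 6.0, 0.4],
    [3.0, 7.0, 0.6],
    [4.0, 8.0, 0.8],
]
patient = [2.4, 9.0, 0.5]

features = select_features(cell_lines, patient, m=2)
neighbors = nearest_cell_lines(cell_lines, patient, features, k=2)
print(features)   # [0, 2]: feature 1 is dropped, no cell line lies above 9.0
print(neighbors)  # [1, 2]: the two cell lines nearest to the patient
```

In this sketch the selected features and the K retained cell lines would then form the training set of a patient-specific classifier; the choice of M trades off the number of usable features against the strictness of the no-extrapolation condition.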
Notes
1. In this respect, one can draw an analogy between this situation and the currently very popular setting called “domain adaptation” [5].
References
Vapnik, V., Izmailov, R.: Learning using privileged information: similarity control and knowledge transfer. J. Mach. Learn. Res. 16, 2023–2049 (2015)
Lopez-Paz, D., Bottou, L., Schölkopf, B., Vapnik, V.: Unifying distillation and privileged information. In: ICLR 2016, San Juan, Puerto Rico (2016)
Xu, X., Zhou, J.T., Tsang, I., Qin, Z., Goh, R.S.M., Liu, Y.: Simple and efficient learning using privileged information (2016)
Celik, Z.B., Izmailov, R., McDaniel, P.: Proof and implementation of algorithmic realization of learning using privileged information (LUPI) paradigm: SVM+. Institute of Networking and Security Research (INSR) (2015)
Csurka, G.: Domain Adaptation in Computer Vision Applications. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-58347-1
Artemov, A., et al.: A method for predicting target drug efficiency in cancer based on the analysis of signaling pathway activation. Oncotarget 6, 29347–29356 (2015)
Minsky, M.L., Papert, S.A.: Perceptrons - Expanded Edition: An Introduction to Computational Geometry. MIT Press, Boston (1987)
Blumenschein, G.R., et al.: Comprehensive biomarker analysis and final efficacy results of sorafenib in the BATTLE trial. Clin. Cancer Res. 19, 6967–6975 (2013)
Crossman, L.C., et al.: In chronic myeloid leukemia white cells from cytogenetic responders and non-responders to imatinib have very similar gene expression signatures. Haematologica 90, 459–464 (2005)
Mulligan, G., et al.: Gene expression profiling and correlation with outcome in clinical trials of the proteasome inhibitor bortezomib. Blood 109, 3177–3188 (2007)
Yang, W., et al.: Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 41, D955–D961 (2013)
Robin, X., Turck, N., Hainard, A., Lisacek, F., Sanchez, J.-C., Müller, M.: Bioinformatics for protein biomarker panel classification: what is needed to bring biomarker panels into in vitro diagnostics? Expert Rev. Proteomics 6, 675–689 (2009)
Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines, pp. 276–85. IEEE (1997). http://ieeexplore.ieee.org/document/622408/. Accessed 23 May 2017
Bartlett, P., Shawe-Taylor, J.: Generalization performance of support vector machines and other pattern classifiers. In: Advances in Kernel Methods: Support Vector Learning, pp. 43–54 (1999)
Toloşi, L., Lengauer, T.: Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics 27, 1986–1994 (2011)
Buzdin, A.A., et al.: Oncofinder, a new method for the analysis of intracellular signaling pathway activation using transcriptomic data. Front. Genet. 5, 55 (2014)
Buzdin, A.A., Prassolov, V., Zhavoronkov, A.A., Borisov, N.M.: Bioinformatics meets biomedicine: oncofinder, a quantitative approach for interrogating molecular pathways using gene expression data. Methods Mol. Biol. 1613, 53–83 (2017)
Aliper, A.M., et al.: Mathematical justification of expression-based pathway activation scoring (PAS). Methods Mol. Biol. 1613, 31–51 (2017)
Borisov, N., et al.: Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data. Cell Cycle 16(19), 1810–1823 (2017)
Kuzmina, N.B., Borisov, N.M.: Handling complex rule-based models of mitogenic cell signaling (On the example of ERK activation upon EGF stimulation). Int. Proc. Chem. Biol. Env. Eng. 5, 76–82 (2011)
Karlsson, J., et al.: Clear cell sarcoma of the kidney demonstrates an embryonic signature indicative of a primitive nephrogenic origin. Genes Chromosomes Cancer 53, 381–391 (2014)
Kabbout, M., et al.: ETS2 mediated tumor suppressive function and MET oncogene inhibition in human non-small cell lung cancer. Clin. Cancer Res. 19, 3383–3395 (2013)
Yagi, T., et al.: Identification of a gene expression signature associated with pediatric AML prognosis. Blood 102, 1849–1856 (2003)
Hodgson, J.G., et al.: Comparative analyses of gene copy number and mRNA expression in glioblastoma multiforme tumors and xenografts. Neuro-Oncology 11, 477–487 (2009)
Bhasin, M., Yuan, L., Keskin, D.B., Otu, H.H., Libermann, T.A., Oettgen, P.: Bioinformatic identification and characterization of human endothelial cell-restricted genes. BMC Genom. 11, 342 (2010)
Cheng, Y., Prusoff, W.H.: Relationship between the inhibition constant (K1) and the concentration of inhibitor which causes 50 per cent inhibition (I50) of an enzymatic reaction. Biochem. Pharmacol. 22, 3099–3108 (1973)
Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46, 175–185 (1992)
Shabalin, A.A., Tjelmeland, H., Fan, C., Perou, C.M., Nobel, A.B.: Merging two gene-expression studies via cross-platform normalization. Bioinformatics 24, 1154–1160 (2008)
Rudy, J., Valafar, F.: Empirical comparison of cross-platform normalization methods for gene expression data. BMC Bioinform. 12, 467 (2011)
Wang, Q., Liu, X.: Screening of feature genes in distinguishing different types of breast cancer using support vector machine. OncoTargets Ther. 8, 2311–2317 (2015)
Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011)
Acknowledgements
This work was supported by the Russian Science Foundation grant 18-15-00061.
Ethics declarations
The authors declare no conflicts of interest.
Appendices: Materials and Methods
Transcriptome Profiling for Renal Cancer Samples
Details of the experimental procedure on the Illumina HumanHT-12v4 and CustomArray ECD 4X2K/12K platforms were reported previously [19]. Raw expression data were deposited in the GEO database (http://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE52519 and GSE65635.
Harmonization of Illumina and Custom Array Expression Profiles for Renal Cancer
To cross-harmonize the results of the Illumina and CustomArray gene expression profiling, all expression profiles were transformed with the XPN method [28] using the R package CONOR [29].
SVM, Binary Tree and Random Forest Machine Learning Procedures
All SVM calculations were performed using the R package ‘e1071’ [30], which employs the C++ library ‘libsvm’ [31]. Calculations with the binary tree [14] and random forest [15] methods were done with the R packages ‘rpart’ and ‘randomForest’, respectively.
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Borisov, N., Tkachev, V., Buzdin, A., Muchnik, I. (2018). Prediction of Drug Efficiency by Transferring Gene Expression Data from Cell Lines to Cancer Patients. In: Rozonoer, L., Mirkin, B., Muchnik, I. (eds) Braverman Readings in Machine Learning. Key Ideas from Inception to Current State. Lecture Notes in Computer Science, vol 11100. Springer, Cham. https://doi.org/10.1007/978-3-319-99492-5_9
DOI: https://doi.org/10.1007/978-3-319-99492-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99491-8
Online ISBN: 978-3-319-99492-5
eBook Packages: Computer Science, Computer Science (R0)