Abstract
Identifying drug–target interactions is a crucial step in drug discovery. In this chapter, we present protocols for recent machine learning methods that predict drug–target interactions from heterogeneous biological data in a chemogenomic framework, in which prediction is based on the chemical structure data of drug candidate compounds and the translated genomic sequence data of target candidate proteins. Most existing methods are based on either linear modeling or kernel modeling. To illustrate linear modeling, we introduce sparsity-induced binary classifiers and sparse canonical correlation analysis. To illustrate kernel modeling, we introduce pairwise kernel-based support vector machines and kernel-based distance learning. We present workflows for using these techniques, discuss the characteristics of each method, and suggest some directions for future research.
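As a rough illustration of the pairwise kernel idea named above, the following sketch scores two compound–protein pairs by multiplying a compound similarity by a protein similarity. It is not the chapter's protocol: the Tanimoto and k-mer spectrum kernels are stand-ins chosen for brevity, and the substructure keys, sequences, and function names are all hypothetical toy values.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two sets of substructure keys."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def spectrum(seq, k=2):
    """Count the k-mers occurring in an amino-acid sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Cosine-normalized inner product of the two k-mer count vectors."""
    cs, ct = spectrum(s, k), spectrum(t, k)
    dot = sum(cs[m] * ct[m] for m in cs)
    norm = (sum(v * v for v in cs.values())
            * sum(v * v for v in ct.values())) ** 0.5
    return dot / norm if norm else 0.0


def pairwise_kernel(pair1, pair2):
    """Product kernel between two (compound, protein) pairs."""
    (d1, t1), (d2, t2) = pair1, pair2
    return tanimoto(d1, d2) * spectrum_kernel(t1, t2)


# Hypothetical toy data: compounds as sets of substructure keys,
# proteins as short amino-acid strings.
drug_a = {"C=O", "c1ccccc1", "N"}
drug_b = {"C=O", "N", "S"}
prot_x = "MKTAYIAK"
prot_y = "MKTAYLAK"

pairs = [(drug_a, prot_x), (drug_a, prot_y),
         (drug_b, prot_x), (drug_b, prot_y)]

# Gram matrix over all compound-protein pairs.
gram = [[pairwise_kernel(p, q) for q in pairs] for p in pairs]
```

A Gram matrix of this kind could then be passed to any support vector machine implementation that accepts a precomputed kernel, with known interacting pairs as positive examples; how negatives are chosen is a separate modeling decision that this sketch does not address.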
Acknowledgments
This work is supported by JST PRESTO Grant Number JPMJPR15D8.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Yamanishi, Y. (2018). Linear and Kernel Model Construction Methods for Predicting Drug–Target Interactions in a Chemogenomic Framework. In: Brown, J. (eds) Computational Chemogenomics. Methods in Molecular Biology, vol 1825. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8639-2_12
DOI: https://doi.org/10.1007/978-1-4939-8639-2_12
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8638-5
Online ISBN: 978-1-4939-8639-2
eBook Packages: Springer Protocols