Linear and Kernel Model Construction Methods for Predicting Drug–Target Interactions in a Chemogenomic Framework

  • Yoshihiro Yamanishi
Part of the Methods in Molecular Biology book series (MIMB, volume 1825)


Identification of drug–target interactions is a crucial process in drug discovery. In this chapter, we present protocols for recent advancements in machine learning methods for predicting drug–target interactions from heterogeneous biological data in a chemogenomic framework, in which prediction is based on the chemical structure data of drug candidate compounds and translated genomic sequence data of target candidate proteins. Most existing methods are based on either linear modeling or kernel modeling. To illustrate linear modeling, we introduce sparsity-induced binary classifiers and sparse canonical correlation analysis. To illustrate kernel modeling, we introduce pairwise kernel-based support vector machines and kernel-based distance learning. Workflows for using these techniques are presented. We also discuss the characteristics of each method and suggest some directions for future research.
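As a rough illustration of how the linear and kernel views relate, the sketch below represents a drug–target pair in two ways: by the tensor (Kronecker) product of a drug feature vector and a target feature vector, and by the product of a drug kernel and a target kernel. The product form is one common choice of pairwise kernel and is assumed here; it is not necessarily the exact formulation used in the chapter. The toy fingerprints, the linear base kernels, and the helper names pair_features and pairwise_kernel are all hypothetical.

```python
import numpy as np


def pair_features(drug_vec, target_vec):
    """Explicit feature vector for a drug-target pair (tensor product).

    Hypothetical helper: every drug feature is crossed with every
    target feature, which is the representation a linear model would use.
    """
    return np.kron(drug_vec, target_vec)


def pairwise_kernel(K_drug, K_target, pairs_a, pairs_b):
    """Product pairwise kernel between two lists of (drug, target) index pairs.

    K((d, t), (d', t')) = K_drug[d, d'] * K_target[t, t'].
    This product form is an assumption made for illustration.
    """
    da, ta = np.asarray(pairs_a).T
    db, tb = np.asarray(pairs_b).T
    return K_drug[np.ix_(da, db)] * K_target[np.ix_(ta, tb)]


# Toy binary descriptors (made up): 3 drugs x 4 substructure bits,
# 2 targets x 3 domain bits.
drugs = np.array([[1, 0, 1, 1],
                  [0, 1, 1, 0],
                  [1, 1, 0, 0]], dtype=float)
targets = np.array([[1, 0, 1],
                    [0, 1, 1]], dtype=float)

# Drug-target pairs given as (drug index, target index).
pairs = [(0, 0), (1, 1), (2, 0)]

# Linear base kernels on each side (an assumption; any valid kernels work).
K_drug = drugs @ drugs.T
K_target = targets @ targets.T

# Kernel view: Gram matrix over pairs, built without explicit pair features.
K_pair = pairwise_kernel(K_drug, K_target, pairs, pairs)

# Linear view: explicit tensor-product features, then inner products.
Phi = np.array([pair_features(drugs[d], targets[t]) for d, t in pairs])

# With linear base kernels, both views give the same Gram matrix.
same = np.allclose(K_pair, Phi @ Phi.T)
print(same)
```

The Gram matrix K_pair could be passed to any kernel classifier that accepts a precomputed kernel, while Phi could be passed to a sparse linear classifier. The kernel route avoids materializing the tensor-product features, whose dimension is the product of the drug and target feature dimensions.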

Key words

Drug–target interactions; Machine learning; Classification; Linear modeling; Sparse modeling; Kernel methods; Chemogenomics



This work is supported by JST PRESTO Grant Number JPMJPR15D8.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan
  2. PRESTO, Japan Science and Technology Agency, Kawaguchi, Japan
