Statistical Machine Learning for Agriculture and Human Health Care Based on Biomedical Big Data

  • Yoshihiro Yamanishi (corresponding author)
  • Yasuo Tabei
  • Masaaki Kotera
Conference paper
Part of the Mathematics for Industry book series (MFI, volume 28)


The availability of biomedical big data provides an opportunity to develop data-driven approaches in agriculture and human healthcare research. In this study, we investigate statistical machine learning approaches to metabolic pathway reconstruction and the prediction of drug–target interactions, using heterogeneous biomedical big data. We present an \(L_1\)-regularized pairwise support vector machine to predict unknown enzymatic reactions among metabolome-scale compounds, based on chemical transformation patterns of compounds. We also present supervised bipartite graph inference with kernel methods to predict unknown interactions between drugs and target proteins, based on the chemical structures of drugs and the amino acid sequences of proteins. Our experiments show that these methods can be applied to rational compound synthesis and efficient drug discovery for a range of human diseases. Such methods are expected to increase the productivity of research in the food and pharmaceutical industries.
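To make the idea of an \(L_1\)-regularized pairwise classifier concrete, the following is a minimal sketch and not the authors' implementation. It assumes each compound is a binary substructure fingerprint, encodes a compound pair as shared bits followed by changed bits (an illustrative stand-in for the "chemical transformation patterns" mentioned above), and fits a linear hinge-loss classifier with an \(L_1\) penalty by proximal subgradient descent. The function names, the pair encoding, the toy data, and the hyperparameters are all hypothetical.

```python
def pair_features(x, y):
    """Encode a compound pair from two binary fingerprints.

    Shared bits (AND) followed by differing bits (XOR); the encoding is
    symmetric in x and y. This choice of encoding is an assumption.
    """
    common = [a & b for a, b in zip(x, y)]
    changed = [a ^ b for a, b in zip(x, y)]
    return common + changed


def soft_threshold(value, t):
    """Proximal operator of t * |value|: shrink toward zero by t."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0


def score(w, b, x):
    """Linear decision value w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_l1_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise mean hinge loss + lam * ||w||_1 by proximal subgradient steps."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1.0:  # example violates the margin
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        # Subgradient step on the hinge loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (predicted reaction pair) or -1 (no reaction)."""
    return 1 if score(w, b, x) >= 0 else -1


# Toy fingerprints (hypothetical): label +1 when bit 0 changes, else -1.
pairs = [
    ([1, 0, 1], [0, 0, 1], +1),
    ([0, 1, 0], [1, 1, 0], +1),
    ([1, 1, 0], [1, 0, 0], -1),
    ([0, 0, 1], [0, 1, 1], -1),
]
X = [pair_features(a, c) for a, c, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_l1_svm(X, y)
predictions = [predict(w, b, xi) for xi in X]
```

The soft-thresholding step is what drives uninformative weights to exactly zero, so the surviving nonzero weights can be read as the pair features the classifier relies on. At metabolome scale a dedicated sparse linear solver would be a more practical choice than this loop.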


Keywords: Metabolic pathways; Drug targets; Machine learning; Classification; Feature extraction; Graph inference



This work is supported by JST PRESTO Grant Number JPMJPR15D8, JSPS KAKENHI Grant Numbers 25700029 and 15K14980, and the Program to Disseminate Tenure Tracking System, MEXT, Japan and Kyushu University Interdisciplinary Programs in Education and Projects in Research Development.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Yoshihiro Yamanishi (1, 2, 3), corresponding author
  • Yasuo Tabei (4)
  • Masaaki Kotera (5)

  1. Division of System Cohort, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
  2. Institute for Advanced Study, Kyushu University, Fukuoka, Japan
  3. PRESTO, Japan Science and Technology Agency, Kawaguchi, Japan
  4. RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  5. School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
