Fundamental Bioinformatic and Chemoinformatic Data Processing

  • J. B. Brown
Part of the Methods in Molecular Biology book series (MIMB, volume 1825)


To execute more advanced computational chemogenomic workflows, it is essential to understand the basic data formats and the options for processing them. This chapter explains de facto standards for compound and protein representation and gives procedures for processing them. A walkthrough demonstrates the step-by-step process of downloading a ligand–target database, parsing the bioactivity data in the database, automatically retrieving its chemical structures and protein sequences from a command line, and finally converting the structures and sequences into representative machine-ready formats. A basic protocol for visualizing the parsed database and looking for patterns is also given.
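As a rough illustration of the parsing and conversion steps summarized above, the following sketch uses only the Python standard library. It is not the chapter's own protocol: the column names (`compound_id`, `target_id`, `smiles`, `pki`), the function names, and the choice of amino acid composition as a "machine-ready" protein representation are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict


def parse_bioactivities(tsv_text):
    """Group (compound_id, smiles, activity) tuples by target_id.

    The column names are hypothetical; adjust them to the actual
    database export being processed.
    """
    by_target = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            value = float(row["pki"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric activity
        by_target[row["target_id"]].append(
            (row["compound_id"], row["smiles"], value)
        )
    return dict(by_target)


def parse_fasta(fasta_text):
    """Return {header: sequence} from FASTA-formatted text."""
    records = {}
    header = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard residues in the sequence.

    This is one simple fixed-length vector; other descriptors may be
    more appropriate for a given workflow.
    """
    alphabet = "ACDEFGHIKLMNPQRSTVWY"
    length = len(sequence) or 1  # avoid division by zero on empty input
    return [sequence.count(aa) / length for aa in alphabet]
```

With a toy table and a toy FASTA record (made-up values, not real bioactivity data), `parse_bioactivities` yields a per-target list of compounds and `amino_acid_composition(parse_fasta(text)[header])` yields a 20-element vector suitable as input to downstream modeling code. Compound structures would normally be converted with a dedicated chemoinformatics toolkit, which is outside the scope of this sketch.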

Key words

Chemical data structure; Protein data structure; Molecular data processing tools; Database retrieval; Compound–protein visualization



The author would like to thank Dr. Christin Rakers of Nagoya University for critical reading and suggestions for improvement of the manuscript.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
