
Parsing Compound–Protein Bioactivity Tables

  • J. B. Brown
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1825)

Abstract

With the availability of a multitude of databases that contain information on the bioactivity between compounds and proteins, several fundamental tasks arise. These include parsing of the original data in order to filter out unusable data, merging of multiple databases, identification of the sets of unique molecules, and selection of subsets of parsed data.

In this chapter, we address each of these tasks in turn. Solutions are presented using standardized, freely available data processing tools, as well as computer program code.
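As a rough illustration of the four tasks named above (filtering unusable records, merging databases, identifying unique molecules, and selecting subsets), the following is a minimal Python sketch using only the standard library. It is not the chapter's own code: the column names (`compound`, `protein`, `activity_nm`), the tab-separated layout, the example records, and all function names are assumptions made for illustration. Compounds are compared as raw strings here, whereas real data would likely need structure standardization before uniqueness can be judged.

```python
import csv
import io


def parse_table(text, source):
    """Parse tab-separated text; drop rows lacking a compound, protein, or positive numeric activity."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        try:
            value = float(record["activity_nm"])
        except (KeyError, TypeError, ValueError):
            continue  # unusable: activity missing or not numeric
        compound = (record.get("compound") or "").strip()
        protein = (record.get("protein") or "").strip()
        if not compound or not protein or value <= 0:
            continue  # unusable: incomplete pair or non-physical value
        rows.append({"compound": compound, "protein": protein,
                     "activity_nm": value, "source": source})
    return rows


def merge_tables(*tables):
    """Merge parsed tables, keeping the first record seen per compound-protein pair."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault((row["compound"], row["protein"]), row)
    return list(merged.values())


def unique_compounds(rows):
    """Return the sorted set of compound strings (naive string identity)."""
    return sorted({row["compound"] for row in rows})


def select_subset(rows, protein=None, max_activity_nm=None):
    """Select rows for one protein and/or at or below an activity threshold."""
    selected = []
    for row in rows:
        if protein is not None and row["protein"] != protein:
            continue
        if max_activity_nm is not None and row["activity_nm"] > max_activity_nm:
            continue
        selected.append(row)
    return selected


# Hypothetical example data standing in for two database exports.
DB_A = "compound\tprotein\tactivity_nm\nCCO\tP1\t50\nc1ccccc1\tP1\tn/a\nCCN\tP2\t2000\n"
DB_B = "compound\tprotein\tactivity_nm\nCCO\tP1\t75\nCCC\tP2\t10\n\tP2\t5\n"

merged = merge_tables(parse_table(DB_A, "A"), parse_table(DB_B, "B"))
compounds = unique_compounds(merged)
potent_p2 = select_subset(merged, protein="P2", max_activity_nm=100)
```

Keeping the first record for a duplicated compound-protein pair is only one possible merge policy; averaging the values or retaining all measurements with their source labels may be more appropriate, depending on the downstream analysis.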

Key words

Bioactivity database; Data management; Chemogenomic data; Visualization; Compound–protein dataset

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Life Science Informatics Research Unit, Laboratory of Molecular Biosciences, Kyoto University Graduate School of Medicine, Kyoto, Japan
