SAR Matrix Method for Large-Scale Analysis of Compound Structure–Activity Relationships and Exploration of Multitarget Activity Spaces

  • Ye Hu
  • Jürgen Bajorath
Part of the Methods in Molecular Biology book series (MIMB, volume 1825)


As the number of compounds and the volume of bioactivity data grow rapidly, advanced computational methods are required to study structure–activity relationships (SARs) on a large scale. This chapter describes the SAR matrix (SARM) methodology, which was designed to systematically extract structural relationships between bioactive compounds from large databases, explore SARs, and navigate multitarget activity spaces, one of the core tasks in chemogenomics. In addition, the SARM approach was designed to visualize structural and structure–activity relationships, which is often critical for presenting this information in an intuitive form for practical applications.

Key words

Bioactive compounds; Structure–activity relationships (SARs); Multitarget activities; Large-scale SAR analysis; SAR visualization; ChEMBL; SAR matrix data structure



We thank OpenEye Scientific Software, Inc. for a free academic license of the OpenEye Toolkits.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
