Abstract
Determining the net charge and protonation states populated by a small molecule in an environment of interest, or the cost of altering those protonation states upon transfer to another environment, is a prerequisite for predicting its physicochemical and pharmaceutical properties. The environment of interest can be aqueous, an organic solvent, a protein binding site, or a lipid bilayer. Predicting the protonation state of a small molecule is essential to predicting its interactions with biological macromolecules using computational models. Incorrectly modeling the dominant protonation state, shifts in dominant protonation state, or the population of significant mixtures of protonation states can lead to large modeling errors that degrade the accuracy of physical modeling. Low accuracy hinders the use of physical modeling approaches for molecular design. For small molecules, the acid dissociation constant (pKa) is the primary quantity needed to determine the ionic states populated by a molecule in an aqueous solution at a given pH. As part of the SAMPL6 community challenge, we organized a blind pKa prediction component to assess the accuracy with which contemporary pKa prediction methods can predict this quantity, with the ultimate aim of assessing the expected impact on modeling errors this would induce. While a multitude of approaches for predicting pKa values currently exist, predicting the pKas of drug-like molecules can be difficult due to challenging properties such as multiple titratable sites, heterocycles, and tautomerization. For this challenge, we focused on a set of 24 small molecules selected to resemble selective kinase inhibitors, an important class of therapeutics replete with titratable moieties.
Using a Sirius T3 instrument that performs automated acid–base titrations, we collected UV absorbance-based pKa measurements to construct a high-quality experimental reference dataset of macroscopic pKas, which was used in the SAMPL6 pKa challenge to evaluate computational pKa prediction methodologies. For several compounds in which the microscopic protonation states associated with macroscopic pKas were ambiguous, we performed follow-up NMR experiments to disambiguate the microstates involved in the transition. This dataset provides a useful standard benchmark for the evaluation of pKa prediction methodologies on kinase inhibitor-like compounds.
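To illustrate how macroscopic pKas determine the ionic states populated at a given pH, the following is a minimal sketch, not part of the original study. It assumes ideal dilute aqueous behavior and stepwise macroscopic dissociation constants; the function name `state_populations` and the example pKa values are hypothetical and are not taken from the SAMPL6 dataset.

```python
def state_populations(pkas, ph):
    """Fractional populations of macroscopic protonation states at a given pH.

    pkas: macroscopic pKa values (hypothetical inputs for illustration).
    Returns a list ordered from the most protonated state to the least
    protonated state, with len(pkas) + 1 entries that sum to 1.
    """
    pkas = sorted(pkas)
    # log10 of each state's weight relative to the fully protonated state:
    # losing the k-th proton multiplies the weight by 10**(pH - pKa_k).
    log_weights = [0.0]
    for pka in pkas:
        log_weights.append(log_weights[-1] + (ph - pka))
    # Shift by the largest exponent so that 10**x cannot overflow.
    largest = max(log_weights)
    weights = [10.0 ** (w - largest) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical diprotic molecule with macroscopic pKas of 3.0 and 9.0.
populations = state_populations([3.0, 9.0], ph=6.0)
for lost, fraction in enumerate(populations):
    print(f"state with {lost} proton(s) dissociated: {fraction:.4f}")
```

At a pH equal to a single pKa the two adjacent states are equally populated, which is why an error of one or two pKa units in a prediction can change which state a model treats as dominant.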
Abbreviations
- SAMPL: Statistical Assessment of the Modeling of Proteins and Ligands
- pKa: \(-{\log _{10}}\) acid dissociation equilibrium constant
- psKa: \(-{\log _{10}}\) apparent acid dissociation equilibrium constant in the presence of cosolvent
- DMSO: Dimethyl sulfoxide
- ISA: Ionic-strength adjusted
- SEM: Standard error of the mean
- TFA: Target factor analysis
- LC–MS: Liquid chromatography–mass spectrometry
- NMR: Nuclear magnetic resonance spectroscopy
- HMBC: Heteronuclear multiple-bond correlation
- TFA-d: Deutero-trifluoroacetic acid
Acknowledgements
MI, ASR, and JDC acknowledge support from the Sloan Kettering Institute. JDC acknowledges support from NIH grant P30 CA008748. MI, JDC, ASR, and DLM gratefully acknowledge support from NIH grant R01GM124270 supporting SAMPL blind challenges. MI acknowledges support from a Doris J. Hutchinson Fellowship. DLM appreciates financial support from the National Institutes of Health (1R01GM108889-01) and the National Science Foundation (CHE 1352608). IEN acknowledges support from the MRL Postdoctoral Research Program. The authors are extremely grateful for the assistance and support from the MRL Preformulations and NMR Structure Elucidation groups for materials, expertise, and instrument time, without which this SAMPL challenge would not have been possible. MI and DL are grateful to Pion/Sirius Analytical for their technical support in the planning and execution of this study. We are especially thankful to Karl Box (Sirius Analytical) for guidance on the optimization and interpretation of pKa measurements with the Sirius T3, as well as feedback on the manuscript. We thank Brad Sherborne (MRL; ORCID: 0000-0002-0037-3427) for his valuable insights at the conception of the pKa challenge and for connecting us with TR and DL, who were able to provide resources for experimental measurements. We acknowledge Paul Czodrowski (Merck KGaA; ORCID: 0000-0002-7390-8795), who provided feedback on multiple stages of this work: challenge construction, purchasable compound selection, and the manuscript. We acknowledge contributions from Caitlin Bannan, who provided feedback on experimental data collection and the structure of the pKa challenge from a computational chemist's perspective. We are also grateful to Marilyn Gunner (CCNY) for her feedback on this manuscript. We thank the anonymous reviewers for their input and constructive comments that improved this manuscript. MI, ASR, and JDC are grateful to OpenEye Scientific for providing a free academic software license for use in this work.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author information
Contributions
Conceptualization, MI, JDC, TR, ASR, DLM; Methodology, MI, DL, IEN; Software, MI, ASR; Formal Analysis, MI; Investigation, MI, DL, IEN, HW, XW, MR; Resources, TR, DL; Data Curation, MI; Writing - Original Draft, MI, JDC, IEN; Writing - Review and Editing, MI, DL, ASR, IEN, HW, XW, MR, GEM, DLM, TR, JDC; Visualization, MI, IEN; Supervision, JDC, TR, DLM, GEM, AAM; Project Administration, MI; Funding Acquisition, JDC, DLM, TR, MI.
Ethics declarations
Conflict of interest
JDC was a member of the Scientific Advisory Board for Schrödinger, LLC during part of this study. JDC and DLM are current members of the Scientific Advisory Board of OpenEye Scientific Software. The Chodera laboratory receives or has received funding from multiple sources, including the National Institutes of Health, the National Science Foundation, the Parker Institute for Cancer Immunotherapy, Relay Therapeutics, Entasis Therapeutics, Silicon Therapeutics, EMD Serono (Merck KGaA), AstraZeneca, the Molecular Sciences Software Institute, the Starr Cancer Consortium, Cycle for Survival, a Louis V. Gerstner Young Investigator Award, and the Sloan Kettering Institute. A complete list of funding can be found at http://choderalab.org/funding.
About this article
Cite this article
Işık, M., Levorse, D., Rustenburg, A.S. et al. pKa measurements for the SAMPL6 prediction challenge for a set of kinase inhibitor-like fragments. J Comput Aided Mol Des 32, 1117–1138 (2018). https://doi.org/10.1007/s10822-018-0168-0
DOI: https://doi.org/10.1007/s10822-018-0168-0