Abstract
We used the freely available Chemistry Development Kit (CDK) fingerprint to classify 5235 representative molecules taken from ten banned classes in the 2005 World Anti-Doping Agency’s (WADA) prohibited list, including molecules taken from the corresponding activity classes in the MDL Drug Data Report (MDDR). We used both Random Forest and k-Nearest Neighbours (kNN) algorithms to generate classifiers. The kNN classifiers with k = 1 gave a very slightly better Matthews Correlation Coefficient than the Random Forest classifiers; the latter, however, predicted fewer false positives. The performance of kNN classifiers tended to decline with increasing k. The performance of the CDK fingerprint is essentially equivalent to that of Unity 2D. Our results suggest that it will be possible to use freely available chemoinformatics tools to aid the fight against drugs in sport, while minimising the risk of wrongfully penalising innocent athletes.
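As a rough illustration of the two ideas named in the abstract, the sketch below shows a k-nearest-neighbour vote over binary fingerprints and the Matthews Correlation Coefficient computed from confusion-matrix counts. It is not the authors' implementation: the Tanimoto similarity measure, the set-of-on-bits fingerprint representation, the function names, the class labels "A" and "B", and the toy data are all assumptions made for illustration, and the fingerprints here are not generated by the CDK.

```python
from collections import Counter
from math import sqrt


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices.

    Assumption: the abstract does not state which similarity measure was used.
    """
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: frozenset, training: list, k: int = 1) -> str:
    """Return the majority label among the k training items most similar to query."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy fingerprints (sets of on-bit positions) with placeholder labels.
training = [
    (frozenset({1, 2, 3, 4}), "A"),
    (frozenset({1, 2, 3, 5}), "A"),
    (frozenset({7, 8, 9}), "B"),
    (frozenset({7, 8, 10}), "B"),
]
query = frozenset({1, 2, 4, 6})

if __name__ == "__main__":
    print("1-NN prediction:", knn_predict(query, training, k=1))
    print("MCC:", round(matthews_cc(tp=90, tn=85, fp=5, fn=10), 3))
```

With k = 1 the prediction is simply the label of the single most similar training molecule; larger k pools votes from progressively less similar neighbours. The MCC ranges from -1 to 1 and accounts for all four confusion-matrix cells, so it remains informative when one class is much larger than the other.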
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cannon, E.O., Mitchell, J.B.O. (2006). Classifying the World Anti-Doping Agency’s 2005 Prohibited List Using the Chemistry Development Kit Fingerprint. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_17
DOI: https://doi.org/10.1007/11875741_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45767-1
Online ISBN: 978-3-540-45768-8
eBook Packages: Computer Science, Computer Science (R0)