Accessing the High-Throughput Screening Data Landscape
Advances in high-throughput screening (HTS) techniques are changing the chemical data landscape by generating massive amounts of biological data for tested compounds. Public data repositories (e.g., PubChem) receive HTS data from various institutes, and this data pool is updated daily. The goal of these data-sharing efforts is to let users quickly obtain biological data for their target compounds. Because no universal chemical identifier exists, the repositories provide various methods to query and retrieve chemical properties and biological data using several different chemical identifiers (e.g., SMILES, InChIKey, and IUPAC name). The major challenge for most users, especially computational modelers, is obtaining biological data for a large dataset of compounds (e.g., thousands of drug molecules) rather than for a single compound. This chapter introduces the steps for accessing public data repositories for target compounds, with specific emphasis on automated data downloading for large datasets.
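To make the batch-retrieval idea concrete, the following is a minimal sketch of how a list of identifiers might be resolved to PubChem compound IDs (CIDs) and their bioassay summaries downloaded through PubChem's PUG REST interface. It is not the chapter's own protocol: the function names (`cid_lookup_url`, `assay_summary_url`, `fetch_cids`, `download_assay_summaries`), the choice of the first returned CID, and the 0.25 s delay are illustrative assumptions.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(identifier, namespace="inchikey"):
    """Build a PUG REST URL that maps one identifier to PubChem CIDs."""
    if namespace not in {"name", "smiles", "inchikey"}:
        raise ValueError(f"unsupported namespace: {namespace}")
    if namespace == "smiles":
        # SMILES can contain '/', '#', etc., so pass it as a query argument.
        query = urllib.parse.urlencode({"smiles": identifier})
        return f"{PUG_REST}/compound/smiles/cids/JSON?{query}"
    quoted = urllib.parse.quote(identifier, safe="")
    return f"{PUG_REST}/compound/{namespace}/{quoted}/cids/JSON"


def assay_summary_url(cid):
    """Build a PUG REST URL for the bioassay summary of one CID (CSV)."""
    return f"{PUG_REST}/compound/cid/{int(cid)}/assaysummary/CSV"


def fetch_cids(identifier, namespace="inchikey", timeout=30):
    """Return the list of CIDs matching an identifier (network call)."""
    url = cid_lookup_url(identifier, namespace)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("IdentifierList", {}).get("CID", [])


def download_assay_summaries(identifiers, namespace="inchikey", delay=0.25):
    """Map each identifier to its assay-summary CSV text, or None on failure.

    Assumption: only the first matching CID is used, and `delay` keeps the
    request rate below PubChem's limit of about 5 requests per second.
    """
    summaries = {}
    for identifier in identifiers:
        try:
            cids = fetch_cids(identifier, namespace)
            time.sleep(delay)
            if not cids:
                summaries[identifier] = None
                continue
            url = assay_summary_url(cids[0])
            with urllib.request.urlopen(url, timeout=30) as resp:
                summaries[identifier] = resp.read().decode("utf-8")
        except urllib.error.HTTPError:
            # PUG REST answers with an HTTP error when nothing is found.
            summaries[identifier] = None
        time.sleep(delay)
    return summaries
```

A call such as `download_assay_summaries(["BSYNRYMUTXBXSQ-UHFFFAOYSA-N"])` would then request the assay summary for aspirin by InChIKey. For datasets of thousands of compounds, a loop of this kind would likely need retry logic and on-disk caching, which are omitted here for brevity.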
Key words: Compounds, Chemical identifier, Biological data, PubChem