A Similarity-Based Grouping Method for Molecular Docking in Distributed System

  • Ruisheng Zhang
  • Guangcai Liu
  • Rongjing Hu
  • Jiaxuan Wei
  • Juan Li
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8346)


Molecular docking is one of the main techniques in virtual screening. Docking time varies widely from molecule to molecule because their chemical structures differ. This variation can overload certain nodes, reducing the data processing ability of the whole distributed molecular docking system, so a reasonable and efficient data grouping strategy is essential. In this paper, we study molecular structural similarity in depth and propose a similarity-based data grouping method. Building on the work in Database Management System for Virtual Screening, the method uses the computational chemistry software Chemistry Development Kit and cluster analysis methods to process the chemical molecule data. Finally, we implement and deploy the data grouping method on the Hadoop distributed platform. The experimental results show that this data grouping method can improve the efficiency of molecular docking.
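The abstract does not spell out the grouping algorithm, so the following is only a minimal sketch of the general idea of grouping molecules by structural similarity. It assumes each molecule is already represented as a binary fingerprint (here, a set of on-bit positions), uses the Tanimoto coefficient as the similarity measure, and substitutes a simple greedy leader clustering for whatever cluster analysis the paper actually applies. The function names, the toy fingerprints, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def group_by_similarity(
    fingerprints: Dict[str, FrozenSet[int]], threshold: float
) -> List[List[str]]:
    """Greedy leader clustering (an assumption, not the paper's method).

    Each molecule joins the first group whose leader (its first member)
    is at least `threshold` similar; otherwise it starts a new group.
    """
    leaders: List[FrozenSet[int]] = []
    groups: List[List[str]] = []
    for name, fp in fingerprints.items():
        for i, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                groups[i].append(name)
                break
        else:
            leaders.append(fp)
            groups.append([name])
    return groups


# Toy fingerprints with made-up bit positions; these are not real molecules.
toy = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({10, 11, 12}),
    "mol_d": frozenset({10, 11, 13}),
}
groups = group_by_similarity(toy, threshold=0.5)
print(groups)
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and the resulting groups would still need to be mapped onto docking tasks for the distributed nodes; neither step is shown here.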


Molecular Docking, Virtual Screening, Distributed System, Hadoop Platform





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ruisheng Zhang (1)
  • Guangcai Liu (1)
  • Rongjing Hu (1)
  • Jiaxuan Wei (1)
  • Juan Li (1)

  1. School of Information Science and Engineering, Lanzhou University, Lanzhou, China
