Optimized cloud-based scheduling for protein secondary structure analysis

  • Marco Ferretti
  • Luigi Santangelo
  • Mirto Musci


In proteomics, an in-depth analysis of the 3D structure of a protein is of paramount importance for many biological studies and applications. At the secondary level, protein structure can be described in terms of motifs, that is, recurrent patterns of smaller biological structures called secondary structure elements. This paper focuses on the identification of geometrical motifs in different proteins using the Cross Motif Search (CMS) algorithm. Because CMS has a much higher computational cost than traditional alignment algorithms, this task is very demanding, and parallel processing is therefore mandatory. In previous papers, the parallelization of CMS has already been studied from the HPC standpoint. Since cloud computing is emerging as an alternative to on-premise HPC systems, it is worthwhile to examine the feasibility of migrating to a cloud implementation and its possible advantages in terms of both performance and cost. This paper extends a preliminary work on the cloud parallelization of CMS and makes two main contributions. First, it describes an analytic model of the communication pattern of CMS, in order to gain insight into the performance of the application when executed on a cloud infrastructure. Second, it introduces an optimized “location-aware” scheduling policy for assigning workload to the application workers, in order to minimize internode communication in a cloud setting. Experiments are presented to validate the newly introduced scheduling policy and to assess the performance of the cloud implementation of CMS. The results are general, in the sense that they apply to any other algorithm whose communication pattern is similar to that of the target application.


Proteomics, Cloud computing, HPC, Cross Motif Search, CINECA, Google Cloud, pLogP




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019
corrected publication 2019

Authors and Affiliations

  1. Department of Electrical, Computer and Biological Engineering, University of Pavia, Pavia, Italy
