The Journal of Supercomputing, Volume 74, Issue 4, pp 1715–1764

A mathematical model to calculate real cost/performance in software distributed shared memory on computing environments

  • Ehsan Mousavi Khaneghah
  • Nosratollah Shadnoush
  • Amir Hossein Ghobakhlou


One of the important factors in high-performance computing (HPC) is the cost/performance ratio. This ratio is the main criterion for distinguishing hardware computing systems (supercomputers) from software computing systems (cluster, grid, peer-to-peer). There are various economic methods to calculate hardware cost. In addition, there are numerous methods in software engineering to calculate the cost of developing and programming scientific and engineering software. The computing power of these systems is typically measured with benchmark programs such as LINPACK and HPCL. Inter-process communication is treated as a variable in calculating the cost of executing scientific programs; its nature and amount depend on the program execution itself. Because the variables that affect the cost of inter-process communication depend strongly on program execution, this cost should be included when calculating the cost of any application. This paper complements the existing methods by presenting a more comprehensive and accurate method to calculate the real cost of the distributed shared memory (DSM) mechanisms used by HPC systems. To that end, a systematic approach is used to derive a complete equation for DSM costing, determine the factors that affect the cost, and propose a method based on economic costing methods. The effective parameters are classified into two groups, namely DSM-inherent dependent and application-specific dependent parameters. Each parameter is presented and discussed, and the correlation between the parameters specifies the system's weight on the DSM real cost, according to which the cost is modeled and validated analytically.


  • Distributed shared memory (DSM)
  • Cost model
  • High-performance computing (HPC)
  • False sharing
  • Correlation



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Engineering, Faculty of Engineering, Shahed University, Tehran, Iran
  2. Department of Management, Faculty of Management, Central Branch, Islamic Azad University, Tehran, Iran
