Is High Performance Computing (HPC) Ready to Handle Big Data?

  • Conference paper
Part of the book series: Communications in Computer and Information Science ((CCIS,volume 759))

Abstract

In recent years, big data has emerged as a universal term, and its management has become a crucial research topic. The phrase ‘big data’ refers to data sets so large and complex that processing them requires collaborative High Performance Computing (HPC). Allocating resources effectively is one of the prime challenges in HPC. This leads us to the question: are the existing HPC resource allocation techniques effective enough to support future big data challenges? In this context, we investigated the effectiveness of HPC resource allocation using the Google cluster dataset and a number of data mining tools to determine the correlation coefficients among resource allocation, resource usage and priority. Our analysis initially focused on the correlation between resource allocation and resource usage. The findings show that a large share of the resources the system allocates to a job is not used by that job. To investigate further, we analyzed the correlation among resource allocation, resource usage and priority. Our clustering, classification and prediction techniques identified that the allocation and usage of resources are only very loosely correlated with the priority of the jobs. This research shows that current HPC scheduling needs improvement in order to accommodate the big data challenge efficiently.
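As a rough illustration of the kind of correlation analysis the abstract describes, the following minimal sketch computes a Pearson correlation coefficient between requested and used resources, and between requested resources and priority. It is not the authors' method, which relied on data mining tools applied to the Google cluster dataset. The function name `pearson`, the variable names and all values are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-job records: made-up values, NOT taken from the Google trace.
requested_cpu = [0.50, 0.25, 0.75, 0.10, 0.60]  # normalized CPU requested
used_cpu = [0.05, 0.20, 0.10, 0.08, 0.12]       # normalized CPU actually used
priority = [0, 9, 4, 2, 11]                     # job priority level

r_alloc_usage = pearson(requested_cpu, used_cpu)
r_alloc_priority = pearson(requested_cpu, priority)

# Share of the requested CPU that was never used across these jobs.
unused_fraction = 1 - sum(used_cpu) / sum(requested_cpu)

print(f"allocation vs usage:    r = {r_alloc_usage:.3f}")
print(f"allocation vs priority: r = {r_alloc_priority:.3f}")
print(f"unused share of requested CPU: {unused_fraction:.0%}")
```

A coefficient near zero would indicate the loose relationship the abstract reports, while a large unused share corresponds to allocated resources that the job never consumes.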

Author information

Corresponding author

Correspondence to Biplob R. Ray.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ray, B.R., Chowdhury, M., Atif, U. (2017). Is High Performance Computing (HPC) Ready to Handle Big Data?. In: Doss, R., Piramuthu, S., Zhou, W. (eds) Future Network Systems and Security. FNSS 2017. Communications in Computer and Information Science, vol 759. Springer, Cham. https://doi.org/10.1007/978-3-319-65548-2_8

  • DOI: https://doi.org/10.1007/978-3-319-65548-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65547-5

  • Online ISBN: 978-3-319-65548-2

  • eBook Packages: Computer Science, Computer Science (R0)
