Is High Performance Computing (HPC) Ready to Handle Big Data?

Ray, Biplob R.; Chowdhury, Morshed; Atif, Usman

doi:10.1007/978-3-319-65548-2_8

Conference paper
First Online: 04 August 2017

Part of the book series: Communications in Computer and Information Science (CCIS, volume 759)

Abstract

In recent years, big data has emerged as a universal term, and its management has become a crucial research topic. The phrase ‘big data’ refers to data sets so large and complex that processing them requires collaborative High Performance Computing (HPC). Allocating resources effectively is one of the prime challenges in HPC. This leads to the question: are existing HPC resource allocation techniques effective enough to support future big data challenges? In this context, we investigated the effectiveness of HPC resource allocation using the Google cluster dataset and a number of data mining tools to determine the correlation coefficients between resource allocation, resource usage and priority. Our analysis initially focused on the correlation between resource allocation and resource usage. The findings show that a large share of the resources the system allocates to a job is not used by that job. To investigate further, we analyzed the correlation between resource allocation, resource usage and job priority. Our clustering, classification and prediction techniques identified that the allocation and usage of resources are only very loosely correlated with the priority of the jobs. This research shows that current HPC scheduling needs improvement in order to accommodate the big data challenge efficiently.
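The correlation analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, which used data mining tools on the Google cluster dataset. The job records below are invented for illustration, and the field layout (priority, CPU requested, CPU used) and the `pearson` helper are assumptions, not the trace schema.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-job records: (priority, CPU requested, CPU used).
# The values are invented and do not come from the Google trace.
jobs = [
    (0, 0.50, 0.10),
    (1, 0.25, 0.20),
    (2, 0.75, 0.15),
    (4, 0.40, 0.05),
    (9, 0.60, 0.30),
    (9, 0.10, 0.08),
]
priority = [job[0] for job in jobs]
requested = [job[1] for job in jobs]
used = [job[2] for job in jobs]

# Correlation between what was allocated and what was actually consumed.
r_requested_used = pearson(requested, used)
# Correlation between job priority and the allocation it received.
r_priority_requested = pearson(priority, requested)
# Share of the allocated CPU that was never used across all jobs.
unused_fraction = 1 - sum(used) / sum(requested)

print(f"r(requested, used)     = {r_requested_used:.3f}")
print(f"r(priority, requested) = {r_priority_requested:.3f}")
print(f"unused fraction        = {unused_fraction:.3f}")
```

A coefficient near zero between priority and allocation would correspond to the "very loosely correlated" relationship the abstract reports, while a large unused fraction corresponds to allocated resources that jobs do not consume.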

Author information

Authors and Affiliations

Centre for Intelligent Systems (CIS), School of Engineering and Technology, Central Queensland University, Cairns, Australia
Biplob R. Ray
School of Information Technology, Deakin University, Melbourne, Australia
Morshed Chowdhury & Usman Atif

Corresponding author

Correspondence to Biplob R. Ray.

Editor information

Editors and Affiliations

Deakin University, Burwood, Victoria, Australia
Robin Doss
Department of Information Systems and Operations Management, University of Florida, Warrington College of Business, Gainesville, Florida, USA
Selwyn Piramuthu
Information and Operations Management Department, ESCP Europe, Paris, France
Wei Zhou

Cite this paper

Ray, B.R., Chowdhury, M., Atif, U. (2017). Is High Performance Computing (HPC) Ready to Handle Big Data?. In: Doss, R., Piramuthu, S., Zhou, W. (eds) Future Network Systems and Security. FNSS 2017. Communications in Computer and Information Science, vol 759. Springer, Cham. https://doi.org/10.1007/978-3-319-65548-2_8

DOI: https://doi.org/10.1007/978-3-319-65548-2_8
Published: 04 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65547-5
Online ISBN: 978-3-319-65548-2
eBook Packages: Computer Science, Computer Science (R0)