Abstract
In recent years, ‘big data’ has become a ubiquitous term, and its management a crucial research topic. The phrase refers to data sets so large and complex that processing them requires collaborative High Performance Computing (HPC). Allocating resources effectively is one of the prime challenges in HPC, which raises the question: are existing HPC resource allocation techniques effective enough to support future big data challenges? To address it, we investigated the effectiveness of HPC resource allocation using the Google cluster dataset and a number of data mining tools, determining the correlation coefficients between resource allocation, resource usage and job priority. Our analysis first focused on the correlation between resource allocation and resource usage. The findings show that a high volume of the resources allocated by the system to a job is not used by that job. To investigate further, we analyzed the correlation between resource allocation, resource usage and priority. Our clustering, classification and prediction techniques showed that the allocation and usage of resources are only very loosely correlated with job priority. This research shows that current HPC scheduling needs improvement in order to accommodate the big data challenge efficiently.
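As an illustration of the kind of correlation analysis described in the abstract, the following minimal sketch computes a Pearson correlation coefficient between requested and used resources, and between priority and usage. It is not the authors' tooling: the function name `pearson`, the variable names, and all numeric values are hypothetical and are not taken from the Google cluster dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up per-job values (NOT from the Google cluster trace).
cpu_requested = [0.50, 0.25, 0.75, 0.10, 0.60]  # normalised CPU requested
cpu_used = [0.05, 0.20, 0.10, 0.08, 0.30]       # normalised CPU actually used
priority = [0, 9, 2, 4, 1]                      # job priority level

r_alloc_usage = pearson(cpu_requested, cpu_used)
r_priority_usage = pearson(priority, cpu_used)

# Fraction of the requested CPU that was never used across all jobs.
unused_fraction = 1 - sum(cpu_used) / sum(cpu_requested)

print(f"allocation vs usage: r = {r_alloc_usage:.3f}")
print(f"priority vs usage:   r = {r_priority_usage:.3f}")
print(f"unused share of allocated CPU: {unused_fraction:.1%}")
```

A coefficient near zero would indicate the loose relationship the abstract reports, while the unused share gives a rough measure of over-allocation. On real trace data, a library routine would likely be preferable to this hand-written version.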
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ray, B.R., Chowdhury, M., Atif, U. (2017). Is High Performance Computing (HPC) Ready to Handle Big Data?. In: Doss, R., Piramuthu, S., Zhou, W. (eds) Future Network Systems and Security. FNSS 2017. Communications in Computer and Information Science, vol 759. Springer, Cham. https://doi.org/10.1007/978-3-319-65548-2_8
DOI: https://doi.org/10.1007/978-3-319-65548-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65547-5
Online ISBN: 978-3-319-65548-2
eBook Packages: Computer Science (R0)