Big Data, Simulations and HPC Convergence

  • Conference paper
Big Data Benchmarking (WBDB 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10044)

Abstract

Two major trends in computing systems are the growth of high performance computing (HPC), notably through an international exascale initiative, and the growth of big data, with an accompanying cloud infrastructure of dramatic and increasing size and sophistication. In this paper, we study an approach to convergence for software and for applications/algorithms, and show what hardware architectures it suggests. We start by dividing applications into data plus model components and classifying each component (whether from Big Data or Big Compute) in the same way. This leads to 64 properties divided into 4 views: Problem Architecture (Macro pattern); Execution Features (Micro patterns); Data Source and Style; and the Processing (runtime) View. We discuss convergence software built around HPC-ABDS (High Performance Computing enhanced Apache Big Data Stack), show how one can merge Big Data and HPC (Big Simulation) concepts into a single stack, and discuss appropriate hardware.
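
As a rough illustration of the classification scheme described in the abstract, the sketch below models an application as data plus model components, each tagged under the same four views. Only the four view names come from the abstract. The class names, the `shared_facets` helper, and the facet labels are hypothetical placeholders, since the 64 facets themselves are not listed on this page.

```python
from dataclasses import dataclass, field
from enum import Enum


class View(Enum):
    """The four views named in the abstract."""
    PROBLEM_ARCHITECTURE = "Problem Architecture (Macro pattern)"
    EXECUTION_FEATURES = "Execution Features (Micro patterns)"
    DATA_SOURCE_AND_STYLE = "Data Source and Style"
    PROCESSING = "Processing (runtime) View"


class Component(Enum):
    """An application is split into a data part and a model part."""
    DATA = "data"
    MODEL = "model"


@dataclass
class Classification:
    # Maps each view to the set of facet labels assigned under it.
    facets: dict = field(default_factory=lambda: {v: set() for v in View})

    def tag(self, view: View, facet: str) -> None:
        self.facets[view].add(facet)


@dataclass
class Application:
    name: str
    # One classification per component, so data and model are
    # described with the same vocabulary.
    components: dict = field(
        default_factory=lambda: {c: Classification() for c in Component}
    )

    def shared_facets(self, other: "Application",
                      component: Component, view: View) -> set:
        """Facets two applications have in common for one component and view."""
        mine = self.components[component].facets[view]
        theirs = other.components[component].facets[view]
        return mine & theirs


# Hypothetical applications and placeholder facet labels (not from the paper).
sim = Application("hypothetical-simulation")
analytics = Application("hypothetical-analytics")
sim.components[Component.MODEL].tag(View.PROBLEM_ARCHITECTURE, "placeholder-facet-1")
analytics.components[Component.MODEL].tag(View.PROBLEM_ARCHITECTURE, "placeholder-facet-1")
analytics.components[Component.MODEL].tag(View.PROBLEM_ARCHITECTURE, "placeholder-facet-2")

common = sim.shared_facets(analytics, Component.MODEL, View.PROBLEM_ARCHITECTURE)
```

Describing a simulation and an analytics job in the same structure makes overlap between them a simple set intersection, which is one way to read the convergence argument; the paper's own facet tables remain the authoritative source for the actual labels.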

Acknowledgments

This work was partially supported by NSF CIF21 DIBBS 1443054, NSF OCI 1149432 CAREER, and AFOSR FA9550-13-1-0225 awards. We thank Dennis Gannon for comments on an early draft.

Corresponding author

Correspondence to Geoffrey Fox.

Appendix: Convergence Diamonds with 64 Facets

The Convergence Diamonds and their facets are discussed in Sect. 2 and summarized in Fig. 1.

Table 1. Convergence Diamonds and their Facets.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Fox, G., Qiu, J., Jha, S., Ekanayake, S., Kamburugamuve, S. (2016). Big Data, Simulations and HPC Convergence. In: Rabl, T., Nambiar, R., Baru, C., Bhandarkar, M., Poess, M., Pyne, S. (eds) Big Data Benchmarking. WBDB 2015. Lecture Notes in Computer Science, vol 10044. Springer, Cham. https://doi.org/10.1007/978-3-319-49748-8_1

  • DOI: https://doi.org/10.1007/978-3-319-49748-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49747-1

  • Online ISBN: 978-3-319-49748-8

  • eBook Packages: Computer Science, Computer Science (R0)
