Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Parallel Processing with Big Data

Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_165-1

Definition

The discrepancy between the explosive growth rate of data volumes and the much slower improvement trends in processing and memory-access speeds necessitates that parallel processing be applied to the handling of extremely large data sets.

Overview

Both data volumes and processing speeds have been on exponentially rising trajectories since the onset of the digital age (Denning and Lewis 2016), but the former has risen at a much higher rate than the latter, so parallel processing is needed to bridge the gap. In addition to providing the higher processing capability that large data sets demand, parallel processing has the potential to ease the “von Neumann bottleneck” (Markgraf 2007), sometimes referred to as “the memory wall” because of its tendency to hinder the smooth progress of a computation when operands cannot be supplied to the processor at the required rate (McKee 2004; Wulf and McKee 1995).
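To make the split/compute/combine structure concrete, the following sketch (not part of the original entry) illustrates the map-reduce style of data parallelism that underlies systems such as Hadoop and Spark (Dean and Ghemawat 2008; Zaharia et al. 2016): a large data set is divided into chunks, each chunk is reduced by an independent worker process, and the partial results are combined. It is a minimal Python illustration; the worker count, chunk size, and synthetic data are assumptions chosen for brevity, not prescriptions from the entry.

    # Minimal data-parallel "map-reduce" sketch: each worker computes a
    # partial sum over its own chunk, so no single processor must stream
    # the entire data set through the memory bottleneck.
    from multiprocessing import Pool

    def partial_sum(chunk):
        # Map step: an independent reduction over one chunk of the data.
        return sum(chunk)

    if __name__ == "__main__":
        data = range(10_000_000)        # stand-in for a large data set
        n_workers = 4                   # illustrative worker count
        size = len(data) // n_workers
        chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]

        with Pool(n_workers) as pool:
            # Distribute the independent partial reductions across workers.
            partials = pool.map(partial_sum, chunks)

        # Reduce step: combine the partial results into the final answer.
        print(sum(partials))

In a genuine big-data setting, the chunks would reside on distributed storage such as HDFS (Shvachko et al. 2010) and the workers would run on separate cluster nodes, but the split/compute/combine pattern remains the same.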


References

  1. Abadi DJ, Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Stonebraker M, Tatbul N, Zdonik S (2003) Aurora: a new model and architecture for data stream management. Int J Very Large Data Bases 12(2):120–139
  2. Agrawal D, Das S, El Abbadi A (2011) Big data and cloud computing: current state and future opportunities. In: Proceedings of 14th international conference on extending database technology, Uppsala, pp 530–533
  3. Benini L, De Micheli G (2002) Networks on chips: a new SoC paradigm. IEEE Comput 35(1):70–78
  4. Brock DC, Moore GE (eds) (2006) Understanding Moore’s law: four decades of innovation. Chemical Heritage Foundation, Philadelphia
  5. Bu Y, Howe B, Balazinska M, Ernst MD (2010) HaLoop: efficient iterative data processing on large clusters. Proc VLDB Endowment 3(1–2):285–296
  6. Caulfield AM et al (2016) A cloud-scale acceleration architecture. In: Proceedings of 49th IEEE/ACM international symposium on microarchitecture, Orlando, pp 1–13
  7. Ceze L, Hill MD, Wenisch TE (2016) Arch2030: a vision of computer architecture research over the next 15 years. Computing Community Consortium, on-line document. http://cra.org/ccc/wp-content/uploads/sites/2/2016/12/15447-CCC-ARCH-2030-report-v3-1-1.pdf
  8. Condie T, Conway N, Alvaro P, Hellerstein JM, Elmeleegy K, Sears R (2010) MapReduce online. Proc USENIX Symp Networked Syst Des Implement 10(4):20
  9. Dally WJ, Towles BP (2004) Principles and practices of interconnection networks. Elsevier, Amsterdam
  10. Darema F (2001) The SPMD model: past, present and future. In: Proceedings of European parallel virtual machine/message passing interface users’ group meeting, Springer
  11. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
  12. Dean J, Ghemawat S (2010) MapReduce: a flexible data processing tool. Commun ACM 53(1):72–77
  13. Denning PJ, Lewis TG (2016) Exponential laws of computing growth. Commun ACM 60(1):54–65
  14. Duato J, Yalamanchili S, Ni LM (2003) Interconnection networks: an engineering approach. Morgan Kaufmann, San Francisco
  15. Eggers SJ, Emer JS, Levy HM, Lo JL, Stamm RL, Tullsen DM (1997) Simultaneous multithreading: a platform for next-generation processors. IEEE Micro 17(5):12–19
  16. Eugster PT, Felber PA, Guerraoui R, Kermarrec A-M (2003) The many faces of publish/subscribe. ACM Comput Surv 35(2):114–131
  17. Flynn MJ, Rudd KW (1996) Parallel architectures. ACM Comput Surv 28(1):67–70
  18. Gautschi M (2017) Design of energy-efficient processing elements for near-threshold parallel computing. Doctoral thesis, ETH Zurich
  19. Gepner P, Kowalik MF (2006) Multi-core processors: new way to achieve high system performance. In: Proceedings of IEEE international symposium on parallel computing in electrical engineering, Bialystok, pp 9–13
  20. Hord RM (2013) The Illiac IV: the first supercomputer. Springer, Berlin
  21. Koomey JG, Berard S, Sanchez M, Wong H (2011) Implications of historical trends in the electrical efficiency of computing. IEEE Ann Hist Comput 33(3):46–54
  22. Kuon I, Tessier R, Rose J (2008) FPGA architecture: survey and challenges. Found Trends Electron Des Autom 2(2):135–253
  23. Lee RB (1997) Multimedia extensions for general-purpose processors. In: Proceedings of IEEE workshop on signal processing systems, design and implementation, Leicester, pp 9–23
  24. Mack CA (2011) Fifty years of Moore’s law. IEEE Trans Semicond Manuf 24(2):202–207
  25. Malewicz G, Austern MH, Bik AJC, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: a system for large-scale graph processing. In: Proceedings of ACM SIGMOD international conference on management of data, Indianapolis, pp 135–146
  26. Markgraf JD (2007) The von Neumann bottleneck. On-line source, no longer accessible
  27. McKee SA (2004) Reflections on the memory wall. In: Proceedings of the conference on computing frontiers, Ischia, pp 162–167
  28. Mueller R, Teubner J, Alonso G (2012) Sorting networks on FPGAs. Int J Very Large Data Bases 21(1):1–23
  29. Nanda S, Chiueh TC (2005) A survey on virtualization technologies. Technical report TR179, Department of Computer Science, SUNY at Stony Brook
  30. NRC (2011) The future of computing performance: game over or next level? Report of the US National Research Council, National Academies Press
  31. Nvidia (2016) Nvidia Tesla P100: infinite compute power for the modern data center – technical overview. http://images.nvidia.com/content/tesla/pdf/nvidia-teslap100-techoverview.pdf. Accessed 14 Dec 2017
  32. Owens JD et al (2008) GPU computing. Proc IEEE 96(5):879–899
  33. Parhami B (1999) Chapter 7: Sorting networks. In: Introduction to parallel processing: algorithms and architectures. Plenum Press, New York, pp 129–147
  34. Rau BR, Fisher JA (1993) Instruction-level parallel processing: history, overview, and perspective. J Supercomput 7(1–2):9–50
  35. Rixner S (2001) Stream processor architecture. Kluwer, Boston
  36. Rosenblum M, Garfinkel T (2005) Virtual machine monitors: current technology and future trends. IEEE Comput 38(5):39–47
  37. Sakai S, Hiraki K, Kodama Y, Yuba T (1989) An architecture of a dataflow single chip processor. ACM SIGARCH Comput Archit News 17(3):46–53
  38. Schaller RR (1997) Moore’s law: past, present and future. IEEE Spectr 34(6):52–59
  39. Shafer J, Rixner S, Cox AL (2010) The Hadoop distributed filesystem: balancing portability and performance. In: Proceedings of IEEE international symposium on performance analysis of systems & software, White Plains, pp 122–133
  40. Shvachko K, Kuang H, Radia S, Chansler R (2010) The Hadoop distributed file system. In: Proceedings of 26th symposium on mass storage systems and technologies, Incline Village, pp 1–10
  41. Singer G (2013) The history of the modern graphics processor. TechSpot on-line article. http://www.techspot.com/article/650-history-of-the-gpu/. Accessed 14 Dec 2017
  42. Sinnen O (2007) Task scheduling for parallel systems. Wiley, Hoboken
  43. Sklyarov V et al (2015) Hardware accelerators for information retrieval and data mining. In: Proceedings of IEEE conference on information and communication technology research, Bali, pp 202–205
  44. Stanford University (2012) 21st century computer architecture: a community white paper. http://csl.stanford.edu/~christos/publications/2012.21stcenturyarchitecture.whitepaper.pdf
  45. Top-500 Organization (2017) November 2017 list of the world’s top 500 supercomputers. http://www.top500.org/lists/2017/11/
  46. Valiant LG (1990) A bridging model for parallel computation. Commun ACM 33(8):103–111
  47. Vavilapalli VK et al (2013) Apache Hadoop YARN: yet another resource negotiator. In: Proceedings of fourth symposium on cloud computing, Santa Clara, p 5
  48. Wilkes MV (1972) Time-sharing computer systems. Elsevier, New York
  49. Wulf W, McKee S (1995) Hitting the wall: implications of the obvious. ACM Comput Archit News 23(1):20–24
  50. Zaharia M et al (2016) Apache Spark: a unified engine for big data processing. Commun ACM 59(11):56–65

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of California, Santa Barbara, USA

Section editors and affiliations

  • Bingsheng He
  • Behrooz Parhami, Department of Electrical and Computer Engineering, University of California, Santa Barbara, United States