The HPCC/ECL Platform for Big Data

Middleton, Anthony M.; Bayliss, David Alan; Halliday, Gavin; Chala, Arjuna; Furht, Borko

doi:10.1007/978-3-319-44550-2_6



Abstract

As a result of the continuing information explosion, many organizations are experiencing what is now called the “Big Data” problem: they are unable to make effective use of their data because datasets have grown too big to process in a timely manner. Data-intensive computing represents a new computing paradigm [26] that can address the Big Data problem using high-performance architectures supporting scalable parallel processing, allowing government, commercial organizations, and research environments to process massive amounts of data and implement new applications previously thought to be impractical or infeasible.


References

Kouzes RT, Anderson GA, Elbert ST, Gorton I, Gracio DK. The changing paradigm of data-intensive computing. Computer. 2009;42(1):26–34.
Gorton I, Greenfield P, Szalay A, Williams R. Data-intensive computing in the 21st century. IEEE Comput. 2008;41(4):30–2.
Johnston WE. High-speed, wide area, data intensive computing: a ten year retrospective. In: Proceedings of the 7th IEEE international symposium on high performance distributed computing: IEEE Computer Society; 1998.
Skillicorn DB, Talia D. Models and languages for parallel computation. ACM Comput Surv. 1998;30(2):123–69.
Dowd K, Severance C. High performance computing. Sebastopol: O’Reilly and Associates Inc.; 1998.
Abbas A. Grid computing: a practical guide to technology and applications. Hingham: Charles River Media Inc; 2004.
Gokhale M, Cohen J, Yoo A, Miller WM. Hardware technologies for high-performance data-intensive computing. IEEE Comput. 2008;41(4):60–8.
Nyland LS, Prins JF, Goldberg A, Mills PH. A design methodology for data-parallel applications. IEEE Trans Softw Eng. 2000;26(4):293–314.
Agichtein E, Ganti V. Mining reference tables for automatic text segmentation. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, WA, USA; 2004. p. 20–9.
Agichtein E. Scaling information extraction to large document collections: Microsoft Research. 2004.
Rencuzogullari U, Dwarkadas S. Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. In: Proceedings of the eighth ACM SIGPLAN symposium on principles and practices of parallel programming, Snowbird, UT; 2001. p. 72–81.
Cerf VG. An information avalanche. IEEE Comput. 2007;40(1):104–5.
Gantz JF, Reinsel D, Chute C, Schlichting W, McArthur J, Minton S, et al. The expanding digital universe (White Paper): IDC. 2007.
Lyman P, Varian HR. How much information? 2003 (Research Report). School of Information Management and Systems, University of California at Berkeley; 2003.
Berman F. Got data? A guide to data preservation in the information age. Commun ACM. 2008;51(12):50–6.
NSF. Data-intensive computing. National Science Foundation. 2009. http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS. Retrieved 10 Aug 2009.
PNNL. Data intensive computing. Pacific Northwest National Laboratory. 2008. http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt. Retrieved 10 Aug 2009.
Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst. 2009;25(6):599–616.
Gray J. Distributed computing economics. ACM Queue. 2008;6(3):63–8.
Bryant RE. Data intensive scalable computing. Carnegie Mellon University. 2008. http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt. Retrieved 10 Aug 2009.
Middleton AM. Data-intensive computing solutions (Whitepaper): LexisNexis. 2009.
Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters. In: Proceedings of the sixth symposium on operating system design and implementation (OSDI); 2004.
Dean J, Ghemawat S. Mapreduce: a flexible data processing tool. Commun ACM. 2010;53(1):72–7.
Pike R, Dorward S, Griesemer R, Quinlan S. Interpreting the data: parallel analysis with sawzall. Sci Program J. 2004;13(4):227–98.
White T. Hadoop: the definitive guide. 1st ed. Sebastopol: O’Reilly Media Inc; 2009.
Gates AF, Natkovich O, Chopra S, Kamath P, Narayanamurthy SM, Olston C, et al. Building a high-level dataflow system on top of map-reduce: the pig experience. In: Proceedings of the 35th international conference on very large databases (VLDB 2009), Lyon, France; 2009.
Olston C, Reed B, Srivastava U, Kumar R, Tomkins A. Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 28th ACM SIGMOD/PODS international conference on management of data/principles of database systems, Vancouver, BC, Canada; 2008. p. 1099–110.
Bayliss DA. Enterprise control language overview (Whitepaper): LexisNexis. 2010b.
Bayliss DA. Thinking declaratively (Whitepaper). 2010c.
Hellerstein JM. The declarative imperative. SIGMOD Rec. 2010;39(1):5–19.
O’Malley O. Introduction to hadoop. 2008. http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf. Retrieved 10 Aug 2009.
Bayliss DA. Aggregated data analysis: the paradigm shift (Whitepaper): LexisNexis. 2010a.
Buyya R. High performance cluster computing. Upper Saddle River: Prentice Hall; 1999.
Chaiken R, Jenkins B, Larson P-A, Ramsey B, Shakib D, Weaver S, et al. Scope: easy and efficient parallel processing of massive data sets. Proc VLDB Endow. 2008;1:1265–76.
Grossman R, Gu Y. Data mining using high performance data clouds: experimental studies using sector and sphere. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, Nevada, USA; 2008.
Grossman RL, Gu Y, Sabala M, Zhang W. Compute and storage clouds using wide area high performance networks. Future Gener Comput Syst. 2009;25(2):179–83.
Gu Y, Grossman RL. Lessons learned from a year’s worth of benchmarks of large data clouds. In: Proceedings of the 2nd workshop on many-task computing on grids and supercomputers, Portland, Oregon; 2009.
Liu H, Orban D. Gridbatch: cloud computing for large-scale data-intensive batch applications. In: Proceedings of the eighth IEEE international symposium on cluster computing and the grid; 2008. p. 295–305.
Llor X, Acs B, Auvil LS, Capitanu B, Welge ME, Goldberg DE. Meandre: semantic-driven data-intensive flows in the clouds. In: Proceedings of the fourth IEEE international conference on eScience; 2008. p. 238–245.
Pavlo A, Paulson E, Rasin A, Abadi DJ, Dewitt DJ, Madden S, et al. A comparison of approaches to large-scale data analysis. In: Proceedings of the 35th SIGMOD international conference on management of data, Providence, RI; 2009. p. 165–68.
Ravichandran D, Pantel P, Hovy E. The terascale challenge. In: Proceedings of the KDD workshop on mining for and from the semantic web; 2004.
Yu Y, Gunda PK, Isard M. Distributed aggregation for data-parallel computing: interfaces and implementations. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, Big Sky, Montana, USA; 2009. p. 247–60.


Author information

Authors and Affiliations

LexisNexis Risk Solutions, Alpharetta, GA, USA
Anthony M. Middleton, David Alan Bayliss, Gavin Halliday & Arjuna Chala
Florida Atlantic University, Boca Raton, FL, USA
Borko Furht



About this chapter

Cite this chapter

Middleton, A.M., Bayliss, D.A., Halliday, G., Chala, A., Furht, B. (2016). The HPCC/ECL Platform for Big Data. In: Big Data Technologies and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-44550-2_6


DOI: https://doi.org/10.1007/978-3-319-44550-2_6
Published: 17 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44548-9
Online ISBN: 978-3-319-44550-2
eBook Packages: Computer Science, Computer Science (R0)
