ECL/HPCC: A Unified Approach to Big Data

Handbook of Data Intensive Computing

Abstract

As a result of the continuing information explosion, many organizations are experiencing what is now called the “Big Data” problem: their datasets have grown too large to process in a timely manner, so they cannot make effective use of the massive amounts of data they hold. Data-intensive computing represents a new computing paradigm [26] that can address the big data problem using high-performance architectures supporting scalable parallel processing, allowing government agencies, commercial organizations, and research environments to process massive amounts of data and implement new applications previously thought to be impractical or infeasible.

References

  1. Abbas, A. (2004). Grid computing: A practical guide to technology and applications. Hingham, MA: Charles River Media, Inc.

  2. Agichtein, E. (2004). Scaling information extraction to large document collections: Microsoft Research.

  3. Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 20–29.

  4. Bayliss, D. A. (2010a). Aggregated data analysis: The paradigm shift (Whitepaper): LexisNexis.

  5. Bayliss, D. A. (2010b). Enterprise control language overview (Whitepaper): LexisNexis.

  6. Bayliss, D. A. (2010c). Thinking declaratively (Whitepaper).

  7. Berman, F. (2008). Got data? A guide to data preservation in the information age. Communications of the ACM, 51(12), 50–56.

  8. Bryant, R. E. (2008). Data intensive scalable computing. Carnegie Mellon University. Retrieved August 10, 2009, from http://www.cs.cmu.edu/~bryant/presentations/DISCconcept.ppt

  9. Buyya, R. (1999). High performance cluster computing. Upper Saddle River, NJ: Prentice Hall.

  10. Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6), 599–616.

  11. Cerf, V. G. (2007). An information avalanche. IEEE Computer, 40(1), 104–105.

  12. Chaiken, R., Jenkins, B., Larson, P.-A., Ramsey, B., Shakib, D., Weaver, S., et al. (2008). SCOPE: Easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment, 1, 1265–1276.

  13. Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI).

  14. Dean, J., & Ghemawat, S. (2010). MapReduce: A flexible data processing tool. Communications of the ACM, 53(1), 72–77.

  15. Dowd, K., & Severance, C. (1998). High performance computing. Sebastopol, CA: O’Reilly and Associates, Inc.

  16. Gantz, J. F., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., et al. (2007). The expanding digital universe (White Paper): IDC.

  17. Gates, A. F., Natkovich, O., Chopra, S., Kamath, P., Narayanamurthy, S. M., Olston, C., et al. (2009, Aug 24–28). Building a high-level dataflow system on top of Map-Reduce: The Pig experience. Proceedings of the 35th International Conference on Very Large Databases (VLDB 2009), Lyon, France.

  18. Gokhale, M., Cohen, J., Yoo, A., & Miller, W. M. (2008). Hardware technologies for high-performance data-intensive computing. IEEE Computer, 41(4), 60–68.

  19. Gorton, I., Greenfield, P., Szalay, A., & Williams, R. (2008). Data-intensive computing in the 21st century. IEEE Computer, 41(4), 30–32.

  20. Gray, J. (2008). Distributed computing economics. ACM Queue, 6(3), 63–68.

  21. Grossman, R., & Gu, Y. (2008). Data mining using high performance data clouds: Experimental studies using Sector and Sphere. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA.

  22. Grossman, R. L., Gu, Y., Sabala, M., & Zhang, W. (2009). Compute and storage clouds using wide area high performance networks. Future Generation Computer Systems, 25(2), 179–183.

  23. Gu, Y., & Grossman, R. L. (2009). Lessons learned from a year’s worth of benchmarks of large data clouds. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Portland, Oregon.

  24. Hellerstein, J. M. (2010). The declarative imperative. SIGMOD Record, 39(1), 5–19.

  25. Johnston, W. E. (1998). High-speed, wide area, data intensive computing: A ten year retrospective. Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing: IEEE Computer Society.

  26. Kouzes, R. T., Anderson, G. A., Elbert, S. T., Gorton, I., & Gracio, D. K. (2009). The changing paradigm of data-intensive computing. Computer, 42(1), 26–34.

  27. Liu, H., & Orban, D. (2008). GridBatch: Cloud computing for large-scale data-intensive batch applications. Proceedings of the Eighth IEEE International Symposium on Cluster Computing and the Grid, 295–305.

  28. Llor, X., Acs, B., Auvil, L. S., Capitanu, B., Welge, M. E., & Goldberg, D. E. (2008). Meandre: Semantic-driven data-intensive flows in the clouds. Proceedings of the Fourth IEEE International Conference on eScience, 238–245.

  29. Lyman, P., & Varian, H. R. (2003). How much information? 2003 (Research Report): School of Information Management and Systems, University of California at Berkeley.

  30. Middleton, A. M. (2009). Data-intensive computing solutions (Whitepaper): LexisNexis.

  31. NSF. (2009). Data-intensive computing. National Science Foundation. Retrieved August 10, 2009, from http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS

  32. Nyland, L. S., Prins, J. F., Goldberg, A., & Mills, P. H. (2000). A design methodology for data-parallel applications. IEEE Transactions on Software Engineering, 26(4), 293–314.

  33. O’Malley, O. (2008). Introduction to Hadoop. Retrieved August 10, 2009, from http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf

  34. Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008, June 9–12). Pig Latin: A not-so-foreign language for data processing. Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data/Principles of Database Systems, Vancouver, BC, Canada, 1099–1110.

  35. Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., et al. (2009, June 29–July 2). A comparison of approaches to large-scale data analysis. Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, RI, 165–168.

  36. Pike, R., Dorward, S., Griesemer, R., & Quinlan, S. (2004). Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal, 13(4), 227–298.

  37. PNNL. (2008). Data intensive computing. Pacific Northwest National Laboratory. Retrieved August 10, 2009, from http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt

  38. Ravichandran, D., Pantel, P., & Hovy, E. (2004). The terascale challenge. Proceedings of the KDD Workshop on Mining for and from the Semantic Web.

  39. Rencuzogullari, U., & Dwarkadas, S. (2001). Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, Snowbird, UT, 72–81.

  40. Skillicorn, D. B., & Talia, D. (1998). Models and languages for parallel computation. ACM Computing Surveys, 30(2), 123–169.

  41. White, T. (2009). Hadoop: The definitive guide (First ed.). Sebastopol, CA: O’Reilly Media Inc.

  42. Yu, Y., Gunda, P. K., & Isard, M. (2009). Distributed aggregation for data-parallel computing: Interfaces and implementations. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, Montana, USA, 247–260.

Corresponding author

Correspondence to Anthony M. Middleton.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Middleton, A.M., Bayliss, D.A., Halliday, G. (2011). ECL/HPCC: A Unified Approach to Big Data. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_3

  • DOI: https://doi.org/10.1007/978-1-4614-1415-5_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1414-8

  • Online ISBN: 978-1-4614-1415-5

  • eBook Packages: Computer Science (R0)
