Data-Intensive Technologies for Cloud Computing

Abstract

As a result of the continuing information explosion, many organizations are drowning in data, and the resulting “data gap,” or inability to process this information and use it effectively, is increasing at an alarming rate. Data-intensive computing represents a new computing paradigm (Kouzes, Anderson, Elbert, Gorton, & Gracio, 2009) that can address the data gap by using scalable parallel processing, allowing government, commercial organizations, and research environments to process massive amounts of data and implement applications previously thought to be impractical or infeasible. Cloud computing provides the opportunity for organizations with limited internal resources to implement large-scale data-intensive computing applications in a cost-effective manner.

References

  • Abbas, A. (2004). Grid computing: A practical guide to technology and applications. Hingham, MA: Charles River Media.

  • Agichtein, E. (2005). Scaling information extraction to large document collections. IEEE Data Engineering Bulletin, 28, 3–10.

  • Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, 20–29.

  • Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., et al. (2009). Above the clouds: A Berkeley view of cloud computing (University of California at Berkeley, Tech. Rep. UCB/EECS-2009-28).

  • Berman, F. (2008). Got data? A guide to data preservation in the information age. Communications of the ACM, 51(12), 50–56.

  • Borthakur, D. (2008). Hadoop distributed file system. Available from: http://www.opendocs.net/apache/hadoop/HDFSDescription.pdf.

  • Bryant, R. E. (2008). Data intensive scalable computing. Retrieved January 5, 2010, from: http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt.

  • Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6), 599–616.

  • Cerf, V. G. (2007). An information avalanche. IEEE Computer, 40(1), 104–105.

  • Chaiken, R., Jenkins, B., Larson, P.-A., Ramsey, B., Shakib, D., Weaver, S., et al. (2008). SCOPE: Easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment, New York, NY.

  • Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., et al. (2006). Bigtable: A distributed storage system for structured data. Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI’06), Seattle, WA.

  • Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI), Boston, MA.

  • Gantz, J. F., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., et al. (2007). The expanding digital universe. IDC, White Paper.

  • Gates, A. F., Natkovich, O., Chopra, S., Kamath, P., Narayanamurthy, S. M., Olston, C., et al. (2009). Building a high-level dataflow system on top of map-reduce: The pig experience. Proceedings of the 35th International Conference on Very Large Databases (VLDB 2009), Lyon, France.

  • Gokhale, M., Cohen, J., Yoo, A., & Miller, W. M. (2008). Hardware technologies for high-performance data-intensive computing. IEEE Computer, 41(4), 60–68.

  • Gorton, I., Greenfield, P., Szalay, A., & Williams, R. (2008). Data-intensive computing in the 21st century. IEEE Computer, 41(4), 30–32.

  • Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. Proceedings of the 19th ACM Symposium on Operating Systems Principles, New York, NY.

  • Gray, J. (2008). Distributed computing economics. ACM Queue, 6(3), 63–68.

  • Grossman, R. L. (2009). The case for cloud computing. IT Professional, 11(2), 23–27.

  • Grossman, R., & Gu, Y. (2008). Data mining using high performance data clouds: Experimental studies using sector and sphere. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY.

  • Grossman, R. L., & Gu, Y. (2009). On the varieties of clouds for data intensive computing. Available from: http://sites.computer.org/debull/A09mar/grossman.pdf, 2009.

  • Grossman, R. L., Gu, Y., Sabala, M., & Zhang, W. (2009). Compute and storage clouds using wide area high performance networks. Future Generation Computer Systems, 25(2), 179–183.

  • Gu, Y., & Grossman, R. L. (2009). Lessons learned from a year’s worth of benchmarks of large data clouds. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Portland, OR.

  • Hayes, B. (2008). Cloud computing. Communications of the ACM, 51(7), 9–11.

  • Johnston, W. E. (1998). High-speed, wide area, data intensive computing: A ten year retrospective. Proceedings of the 7th IEEE International Symposium on High-Performance Distributed Computing, Chicago, IL, 280.

  • Kouzes, R. T., Anderson, G. A., Elbert, S. T., Gorton, I., & Gracio, D. K. (2009). The changing paradigm of data-intensive computing. Computer, 42(1), 26–34.

  • Lenk, A., Klems, M., Nimis, J., Tai, S., & Sandholm, T. (2009). What’s inside the cloud? An architectural map of the cloud landscape. Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing. Vancouver, Canada, 23–31.

  • Levitt, N. (2009). Is cloud computing really ready for prime time? Computer, 42(1), 15–20.

  • Liu, H., & Orban, D. (2008). GridBatch: Cloud computing for large-scale data-intensive batch applications. Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid, Cardiff.

  • Llor, X., Acs, B., Auvil, L. S., Capitanu, B., Welge, M. E., & Goldberg, D. E. (2008). Meandre: Semantic-driven data-intensive flows in the clouds. Proceedings of the 4th IEEE International Conference on eScience, Nottingham.

  • Lyman, P., & Varian, H. R. (2003). How much information? (School of Information Management and Systems, University of California at Berkeley, Research Rep.).

  • Mell, P., & Grance, T. (2009). The NIST definition of cloud computing. Retrieved January 5, 2010, from: http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc.

  • Napper, J., & Bientinesi, P. (2009). Can cloud computing reach the Top500? Conference on Computing Frontiers. Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop, Ischia, Italy.

  • Nicosia, M. (2009). Hadoop cluster management. Retrieved January 5, 2010, from: http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/Hadoop-USENIX09.pdf.

  • Nyland, L. S., Prins, J. F., Goldberg, A., & Mills, P. H. (2000). A design methodology for data-parallel applications. IEEE Transactions on Software Engineering, 26(4), 293–314.

  • NSF. (2009). Data-intensive computing. Retrieved January 5, 2010, from: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS.

  • O’Malley, O. (2008). Introduction to Hadoop. Available from: http://wiki.apache.org/hadoop/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf.

  • O’Malley, O., & Murthy, A. C. (2009). Winning a 60 second dash with a yellow elephant. Retrieved January 5, 2010, from: http://sortbenchmark.org/Yahoo2009.pdf.

  • Olston, C. (2009). Pig overview presentation – Hadoop summit. Retrieved January 5, 2010, from: http://infolab.stanford.edu/~olston/pig.pdf.

  • Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008a). Pig Latin: A not-so-foreign language for data processing (Presentation at SIGMOD 2008). Retrieved January 5, 2010, from: http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,User-Code as a First-Class Citizen.

  • Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008b). Pig Latin: A not-so-foreign language for data processing. Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data/Principles of Database Systems, Vancouver, BC.

  • Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., Dewitt, D. J., Madden, S., et al. (2009). A comparison of approaches to large-scale data analysis. Proceedings of the 35th SIGMOD International Conference on Management of Data, New York, NY.

  • PNNL. (2008). Data intensive computing. Retrieved January 5, 2010, from: http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt.

  • Pike, R., Dorward, S., Griesemer, R., & Quinlan, S. (2004). Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal, 13(4), 227–298.

  • Ravichandran, D., Pantel, P., & Hovy, E. (2004). The terascale challenge. Proceedings of the KDD Workshop on Mining for and from the Semantic Web, Boston, MA.

  • Rencuzogullari, U., & Dwarkadas, S. (2001). Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, San Diego, CA, 72–81.

  • Reese, G. (2009). Cloud application architectures. Sebastopol, CA: O’Reilly.

  • Skillicorn, D. B., & Talia, D. (1998). Models and languages for parallel computation. ACM Computing Surveys, 30(2), 123–169.

  • Vaquero, L. M., Rodero-Merino, L., Caceres, J., & Lindner, M. (2009). A break in the clouds: Towards a cloud definition. SIGCOMM Computer Communication Review, 39(1), 50–55.

  • Velte, A. T., Velte, T. J., & Elsenpeter, R. (2009). Cloud computing: A practical approach. New York, NY: McGraw Hill.

  • Venner, J. (2009). Pro Hadoop. New York, NY: Apress.

  • Viega, J. (2009). Cloud computing and the common man. Computer, 42(8), 106–108.

  • Weiss, A. (2007). Computing in the clouds. netWorker, 11(4), 16–25.

  • White, T. (2008). Understanding map reduce with Hadoop. Available from: http://wiki.apache.org/hadoop/HadoopPresentations.

  • White, T. (2009). Hadoop: The definitive guide. Sebastopol, CA: O’Reilly Media.

  • Yu, Y., Gunda, P. K., & Isard, M. (2009). Distributed aggregation for data-parallel computing: Interfaces and implementations. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, MT.

Author information

Corresponding author

Correspondence to Anthony M. Middleton.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Middleton, A.M. (2010). Data-Intensive Technologies for Cloud Computing. In: Furht, B., Escalante, A. (eds) Handbook of Cloud Computing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6524-0_5

  • DOI: https://doi.org/10.1007/978-1-4419-6524-0_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-6523-3

  • Online ISBN: 978-1-4419-6524-0

  • eBook Packages: Computer Science, Computer Science (R0)
