Data-Intensive Technologies for Cloud Computing

Middleton, Anthony M.

doi:10.1007/978-1-4419-6524-0_5

Chapter
First Online: 01 January 2010

Abstract

As a result of the continuing information explosion, many organizations are drowning in data, and the resulting “data gap”, the inability to process this information and use it effectively, is widening at an alarming rate. Data-intensive computing represents a new computing paradigm (Kouzes, Anderson, Elbert, Gorton, & Gracio, 2009) that can address the data gap by using scalable parallel processing, allowing government, commercial organizations, and research environments to process massive amounts of data and implement applications previously thought to be impractical or infeasible. Cloud computing gives organizations with limited internal resources the opportunity to implement large-scale data-intensive computing applications cost-effectively.

References

Abbas, A. (2004). Grid computing: A practical guide to technology and applications. Hingham, MA: Charles River Media.
Agichtein, E. (2005). Scaling information extraction to large document collections. IEEE Data Engineering Bulletin, 28, 3–10.
Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, 20–29.
Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., et al. (2009). Above the clouds: A Berkeley view of cloud computing (University of California at Berkeley, Tech. Rep. UCB/EECS-2009-28).
Berman, F. (2008). Got data? A guide to data preservation in the information age. Communications of the ACM, 51(12), 50–56.
Borthakur, D. (2008). Hadoop distributed file system. Available from: http://www.opendocs.net/apache/hadoop/HDFSDescription.pdf.
Bryant, R. E. (2008). Data intensive scalable computing. Retrieved January 5, 2010, from: http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt.
Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6), 599–616.
Cerf, V. G. (2007). An information avalanche. IEEE Computer, 40(1), 104–105.
Chaiken, R., Jenkins, B., Larson, P.-A., Ramsey, B., Shakib, D., Weaver, S., et al. (2008). SCOPE: Easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment, New York, NY.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., et al. (2006). Bigtable: A distributed storage system for structured data. Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI’06), Seattle, WA.
Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI), Boston, MA.
Gantz, J. F., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., et al. (2007). The expanding digital universe. IDC, White Paper.
Gates, A. F., Natkovich, O., Chopra, S., Kamath, P., Narayanamurthy, S. M., Olston, C., et al. (2009). Building a high-level dataflow system on top of map-reduce: The Pig experience. Proceedings of the 35th International Conference on Very Large Databases (VLDB 2009), Lyon, France.
Gokhale, M., Cohen, J., Yoo, A., & Miller, W. M. (2008). Hardware technologies for high-performance data-intensive computing. IEEE Computer, 41(4), 60–68.
Gorton, I., Greenfield, P., Szalay, A., & Williams, R. (2008). Data-intensive computing in the 21st century. IEEE Computer, 41(4), 30–32.
Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. Proceedings of the 19th ACM Symposium on Operating Systems Principles, New York, NY.
Gray, J. (2008). Distributed computing economics. ACM Queue, 6(3), 63–68.
Grossman, R. L. (2009). The case for cloud computing. IT Professional, 11(2), 23–27.
Grossman, R., & Gu, Y. (2008). Data mining using high performance data clouds: Experimental studies using Sector and Sphere. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY.
Grossman, R. L., & Gu, Y. (2009). On the varieties of clouds for data intensive computing. Available from: http://sites.computer.org/debull/A09mar/grossman.pdf.
Grossman, R. L., Gu, Y., Sabala, M., & Zhang, W. (2009). Compute and storage clouds using wide area high performance networks. Future Generation Computer Systems, 25(2), 179–183.
Gu, Y., & Grossman, R. L. (2009). Lessons learned from a year’s worth of benchmarks of large data clouds. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Portland, OR.
Hayes, B. (2008). Cloud computing. Communications of the ACM, 51(7), 9–11.
Johnston, W. E. (1998). High-speed, wide area, data intensive computing: A ten year retrospective. Proceedings of the 7th IEEE International Symposium on High-Performance Distributed Computing, Chicago, Illinois, 280.
Kouzes, R. T., Anderson, G. A., Elbert, S. T., Gorton, I., & Gracio, D. K. (2009). The changing paradigm of data-intensive computing. Computer, 42(1), 26–34.
Lenk, A., Klems, M., Nimis, J., Tai, S., & Sandholm, T. (2009). What’s inside the cloud? An architectural map of the cloud landscape. Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing. Vancouver, Canada, 23–31.
Levitt, N. (2009). Is cloud computing really ready for prime time? Computer, 42(1), 15–20.
Liu, H., & Orban, D. (2008). GridBatch: Cloud computing for large-scale data-intensive batch applications. Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid, Cardiff.
Llor, X., Acs, B., Auvil, L. S., Capitanu, B., Welge, M. E., & Goldberg, D. E. (2008). Meandre: Semantic-driven data-intensive flows in the clouds. Proceedings of the 4th IEEE International Conference on eScience, Nottingham.
Lyman, P., & Varian, H. R. (2003). How much information? (School of Information Management and Systems, University of California at Berkeley, Research Rep.).
Mell, P., & Grance, T. (2009). The NIST definition of cloud computing. Retrieved January 5, 2010, from: http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc.
Napper, J., & Bientinesi, P. (2009). Can cloud computing reach the Top500? Conference on Computing Frontiers, Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop, Ischia, Italy.
Nicosia, M. (2009). Hadoop cluster management. Retrieved January 5, 2010, from: http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/Hadoop-USENIX09.pdf.
Nyland, L. S., Prins, J. F., Goldberg, A., & Mills, P. H. (2000). A design methodology for data-parallel applications. IEEE Transactions on Software Engineering, 26(4), 293–314.
NSF. (2009). Data-intensive computing. Retrieved January 5, 2010, from: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS.
O’Malley, O. (2008). Introduction to Hadoop. Available from: http://wiki.apache.org/hadoop/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf.
O’Malley, O., & Murthy, A. C. (2009). Winning a 60 second dash with a yellow elephant. Retrieved January 5, 2010, from: http://sortbenchmark.org/Yahoo2009.pdf.
Olston, C. (2009). Pig overview presentation – Hadoop summit. Retrieved January 5, 2010, from: http://infolab.stanford.edu/~olston/pig.pdf.
Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008a). Pig Latin: A not-so-foreign language for data processing (Presentation at SIGMOD 2008). Retrieved January 5, 2010, from: http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,User-Code as a First-Class Citizen.
Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008b). Pig Latin: A not-so-foreign language for data processing. Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data/Principles of Database Systems, Vancouver, BC.
Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., et al. (2009). A comparison of approaches to large-scale data analysis. Proceedings of the 35th SIGMOD International Conference on Management of Data, New York, NY.
PNNL. (2008). Data intensive computing. Retrieved January 5, 2010, from: http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt.
Pike, R., Dorward, S., Griesemer, R., & Quinlan, S. (2004). Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal, 13(4), 227–298.
Ravichandran, D., Pantel, P., & Hovy, E. (2004). The terascale challenge. Proceedings of the KDD Workshop on Mining for and from the Semantic Web, Boston, MA.
Rencuzogullari, U., & Dwarkadas, S. (2001). Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, San Diego, CA, 72–81.
Reese, G. (2009). Cloud application architectures. Sebastopol, CA: O’Reilly.
Skillicorn, D. B., & Talia, D. (1998). Models and languages for parallel computation. ACM Computing Surveys, 30(2), 123–169.
Vaquero, L. M., Rodero-Merino, L., Caceres, J., & Lindner, M. (2009). A break in the clouds: Towards a cloud definition. SIGCOMM Computer Communication Review, 39(1), 50–55.
Velte, A. T., Velte, T. J., & Elsenpeter, R. (2009). Cloud computing: A practical approach. New York, NY: McGraw Hill.
Venner, J. (2009). Pro Hadoop. New York, NY: Apress.
Viega, J. (2009). Cloud computing and the common man. Computer, 42(8), 106–108.
Weiss, A. (2007). Computing in the clouds. netWorker, 11(4), 16–25.
White, T. (2008). Understanding map reduce with Hadoop. Available from: http://wiki.apache.org/hadoop/HadoopPresentations.
White, T. (2009). Hadoop: The definitive guide. Sebastopol, CA: O’Reilly Media.
Yu, Y., Gunda, P. K., & Isard, M. (2009). Distributed aggregation for data-parallel computing: Interfaces and implementations. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, MT.

Author information

Authors and Affiliations

LexisNexis Risk Solutions, Boca Raton, FL, USA
Anthony M. Middleton

Corresponding author

Correspondence to Anthony M. Middleton.

Editor information

Editors and Affiliations

Dept. of Comp. & Elect. Engin., Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA
Borko Furht
LexisNexis, 6601 Park of Commerce Blvd, Boca Raton, FL 33487, USA
Armando Escalante

About this chapter

Cite this chapter

Middleton, A.M. (2010). Data-Intensive Technologies for Cloud Computing. In: Furht, B., Escalante, A. (eds) Handbook of Cloud Computing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6524-0_5

DOI: https://doi.org/10.1007/978-1-4419-6524-0_5
Published: 27 August 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6523-3
Online ISBN: 978-1-4419-6524-0
eBook Packages: Computer Science, Computer Science (R0)
