Storage Infrastructures for High-Performance Big Data Analytics

  • Pethuru Raj
  • Anupama Raman
  • Dhivya Nagaraj
  • Siddhartha Duggirala
Part of the Computer Communications and Networks book series (CCN)

Abstract

The proliferation of machine-to-machine (M2M) communication is exponentially increasing the amount of unstructured data generated in the digital universe. It is estimated that Facebook generates about 1 TB of data every day, most of it unstructured. Traditional storage infrastructures such as storage area networks (SANs) and network-attached storage (NAS) were not designed to store and process unstructured data. Hence, there is a pressing need for storage devices and networks that are robust enough to scale and accommodate huge amounts of unstructured data without degrading performance. However, present-day storage infrastructures designed to handle big data still use traditional storage technologies such as NAS and SAN as their underlying foundation, so big data storage platforms cannot be understood without a proper understanding of these underlying technologies. In this chapter, we first explain the foundations of storage technologies and then take a deep dive into how their design has been transformed so that they can store and process big data. In most scenarios, multiple storage platforms are combined, and some kind of enhancement is added to make them capable of handling big data. In the first half of the chapter, we examine storage technologies such as direct-attached storage (DAS), NAS, and SAN, and analyze their suitability for processing big data. In the second half, we focus on the latest storage technologies designed and optimized for big data processing, such as the Panasas file system, the Lustre file system, the Google File System (GFS), and the Hadoop Distributed File System (HDFS).
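To put the "about 1 TB per day" estimate in storage terms, the following back-of-the-envelope sketch converts a daily volume into an average ingest rate and a yearly capacity requirement. It is illustrative only and rests on assumptions not stated in the chapter: decimal units (1 TB = 10^12 bytes), data arriving evenly over 24 hours, and every byte stored three times (3 is the default HDFS replication factor). The function name `ingest_profile` is hypothetical.

```python
# Back-of-the-envelope sizing for a daily data volume such as "about 1 TB per day".
# Assumptions (illustrative, not from the chapter): decimal units, a steady
# ingest rate over 24 hours, and a replication factor applied to all stored data
# (3 is the HDFS default).

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB = 10**12  # bytes in a decimal terabyte


def ingest_profile(tb_per_day: float, days: int = 365, replication: int = 3) -> dict:
    """Return the average ingest rate and the capacity needed for `days` of data."""
    bytes_per_day = tb_per_day * TB
    return {
        # Average write throughput the storage layer must sustain.
        "avg_rate_mb_per_s": bytes_per_day / SECONDS_PER_DAY / 10**6,
        # Logical (user-visible) data accumulated over the period.
        "logical_tb": tb_per_day * days,
        # Raw disk capacity once every block is stored `replication` times.
        "raw_tb_with_replicas": tb_per_day * days * replication,
    }


profile = ingest_profile(1.0)
print(f"average ingest rate: {profile['avg_rate_mb_per_s']:.2f} MB/s")
print(f"logical data per year: {profile['logical_tb']:.0f} TB")
print(f"raw capacity per year with 3 replicas: {profile['raw_tb_with_replicas']:.0f} TB")
```

Under these assumptions, 1 TB per day is only about 11.57 MB/s on average, yet it accumulates to 365 TB of logical data and roughly 1.1 PB of raw capacity per year. This is why capacity scaling, rather than raw throughput alone, drives the design of big data storage.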

Keywords

Big data, Storage, Cloud, Hadoop, Storage area network, Network-attached storage, Fibre Channel, Object-based storage, Panasas

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Pethuru Raj (1)
  • Anupama Raman (1)
  • Dhivya Nagaraj (1)
  • Siddhartha Duggirala (2)
  1. IBM India, Bangalore, India
  2. Indian Institute of Technology, Indore, India
