Encyclopedia of Big Data Technologies

2019 Edition
Editors: Sherif Sakr, Albert Y. Zomaya

Cheap Data Analytics on Cold Storage

  • Raja Appuswamy
Reference work entry
DOI: https://doi.org/10.1007/978-3-319-77525-8_147

Abstract

Driven by the promise of analytics, enterprises have started accumulating vast amounts of data with the goal of deriving actionable insights. However, not all stored data is accessed uniformly. Recent studies predict that as much as 80% of enterprise data is cold, and that cold data is the fastest-growing segment of enterprise data, with a 60% compound annual growth rate. This rapid proliferation of cold data has in turn driven demand for new cold storage infrastructures that reduce the cost of data storage while still providing in-line access to data for running batch analytic workflows. This chapter surveys recent trends in the hardware and software landscapes that enable cheap data analytics over such cold storage infrastructures.


References

  1. Amazon (2015) Amazon Simple Storage Service. https://aws.amazon.com/s3/. Accessed 1 Oct 2017
  2. Appuswamy R, Borovica-Gajic R, Graefe G, Ailamaki A (2017) The five-minute rule thirty years later, and its impact on the storage hierarchy. In: Proceedings of the eighth international workshop on accelerating analytics and data management systems using modern processor and storage architectures, Munich
  3. Balakrishnan S, Black R, Donnelly A, England P, Glass A, Harper D, Legtchenko S, Ogus A, Peterson E, Rowstron A (2014) Pelican: a building block for exascale cold data storage. In: Proceedings of the 11th USENIX conference on operating systems design and implementation, Berkeley, pp 351–365
  4. Bandaru K, Patiejunas K (2015) Under the hood: Facebook's cold storage system. Facebook. https://code.facebook.com/posts/1433093613662262/-under-the-hood-facebook-s-cold-storage-system-/. Accessed 1 Oct 2017
  5. Bhat S (2016) Introducing Azure cool blob storage. Microsoft. https://azure.microsoft.com/en-us/blog/introducing-azure-cool-storage/. Accessed 1 Oct 2017
  6. Borovica-Gajic R, Appuswamy R, Ailamaki A (2016) Cheap data analytics using cold storage devices. Proc VLDB Endow 9(12):1029–1040. https://doi.org/10.14778/2994509.2994521
  7. Colarelli D, Grunwald D (2002) Massive arrays of idle disks for storage archives. In: Proceedings of the ACM/IEEE conference on supercomputing, Los Alamitos, pp 1–11
  8. Deshpande A, Ives Z, Raman V (2007) Adaptive query processing. J Found Trends Databases 1(1):1–140. https://doi.org/10.1561/1900000001
  9. EMC (2014) The digital universe of opportunities: rich data and the increasing value of the internet of things. https://www.emc.com/collateral/analyst-reports/idc-digital-universe-2014.pdf. Accessed 1 Oct 2017
  10. Fontana R, Decad G (2015) Roadmaps and technology reality. Presented at the Library of Congress storage meetings on designing storage architectures for digital collections, Washington, 9–10 Sept 2015
  11. Google (2017) Nearline cloud storage. https://cloud.google.com/storage/archival/. Accessed 1 Oct 2017
  12. Gray J, Graefe G (1997) The five-minute rule ten years later, and other computer storage rules of thumb. SIGMOD Rec 26(4):63–68. https://doi.org/10.1145/271074.271094
  13. Gray J, Graefe G (2007) The five-minute rule twenty years later and how flash memory changes the rules. In: Proceedings of the 3rd international workshop on data management on new hardware, New York, pp 1–9
  14. Gray J, Putzolu GR (1987) The 5-minute rule for trading memory for disk accesses and the 10-byte rule for trading memory for CPU time. SIGMOD Rec 16(3):395–398. https://doi.org/10.1145/38714.38755
  15. Gray J, Chaudhuri S, Bosworth A, Layman A, Reichart D, Venkatrao M, Pellow F, Pirahesh H (1997) Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals. J Data Min Knowl Discov 1(1):29–53. https://doi.org/10.1023/A:1009726021843
  16. Kathpal A, Yasa GAN (2014) Nakshatra: towards running batch analytics on an archive. In: Proceedings of 22nd IEEE international symposium on the modelling, analysis and simulation of computer and telecommunication systems, Paris, pp 479–482
  17. Lee J, Ahn J, Park C, Kim J (2016) DTStorage: dynamic tape-based storage for cost-effective and highly-available streaming service. In: Proceedings of the 16th IEEE/ACM international symposium on cluster, cloud and grid computing, Cartagena, pp 376–387
  18. LTO Ultrium (2015) LTO roadmap. http://www.ltoultrium.com/lto-ultrium-roadmap/. Accessed 1 Oct 2017
  19. Mendoza A (2013) Cold storage in the cloud: trends, challenges, and solutions. https://www.intel.com/content/www/us/en/storage/cold-storage-atom-xeon-paper.html. Accessed 1 Oct 2017
  20. Moore F (2015) Tiered storage takes center stage. Horison Inc. http://horison.com/publications/tiered-storage-takes-center-stage. Accessed 1 Oct 2017
  21. Moore F (2016) Storage outlook 2016. Horison Inc. https://horison.com/publications/storage-outlook-2016. Accessed 1 Oct 2017
  22. Nandkarni A (2014) IDC worldwide cold storage taxonomy. http://www.idc.com/getdoc.jsp?containerId=246732. Accessed 1 Oct 2017
  23. Oracle (2015) OpenStack Swift interface for Oracle hierarchical storage manager. http://www.oracle.com/us/products/servers-storage/storage/storage-software/solution-brief-sam-swift-2321869.pdf. Accessed 1 Oct 2017
  24. Prabhakar S, Agrawal D, Abbadi A (2003) Optimal scheduling algorithms for tertiary storage. J Distrib Parallel Databases 14(3):255–282. https://doi.org/10.1023/A:1025589332623
  25. Reddy R, Kathpal A, Basak J, Katz R (2015) Data layout for power efficient archival storage systems. In: Proceedings of the workshop on power-aware computing and systems, Monterey, pp 16–20
  26. Robert Y, Vivien F (2009) Introduction to scheduling, 1st edn. CRC Press, Boca Raton
  27. Sarawagi S, Stonebraker M (1996) Reordering query execution in tertiary memory databases. In: Proceedings of the 22nd international conference on very large data bases, San Francisco, pp 156–167
  28. Spectra Logic (2013) Spectra ArcticBlue overview. https://www.spectralogic.com/products/arcticblue/. Accessed 1 Oct 2017
  29. Viglas SD, Naughton JF, Burger F (2003) Maximizing the output rate of multi-way join queries over streaming information sources. In: Proceedings of the 29th international conference on very large data bases, Berlin, pp 285–296
  30. Yan W, Yao J, Cao Q, Xie C, Jiang H (2017) ROS: a rack-based optical storage system with inline accessibility for long-term data preservation. In: Proceedings of the 12th European conference on computer systems, Belgrade, pp 161–174

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Data Science Department, EURECOM, Biot, France