Big Data Platforms for Data Analytics

Encyclopedia of Database Systems

Synonyms

Big data management systems; Data intensive computing software; Predictive analytics platforms

Definition

Due to the volume, velocity, and variety of data now coming from the Web, social media, and personal devices, the analysis of “Big Data” has become a priority. A number of software platforms have been developed to support the analysis of massive data sets using clusters of computers working in parallel. These platforms fall into two categories: those based on the relational data model and its SQL query language, and those with more flexible data models and query languages tailored to less rigidly structured data. The latter category is referred to here as Big Data Platforms. (SQL analytics on Big Data are covered separately.)

Historical Background

Today’s platforms for Big Data Analytics grew out of work in two fields of computer systems software: database systems and distributed systems.

Parallel Databases

In the field of database systems, the...

Recommended Reading

  1. Data, data everywhere. The Economist; 25 Feb 2010.

  2. Alexandrov A, Bergmann R, Ewen S, Freytag J-C, Hueske F, Heise A, Kao O, Leich M, Leser U, Markl V, Naumann F, Peters M, Rheinländer A, Sax M, Schelter S, Höger M, Tzoumas K, Warneke D. The Stratosphere platform for big data analytics. VLDB J. 2014;23(6):939–64.

  3. Alsubaiee S, Altowim Y, Altwaijry H, Behm A, Borkar VR, Bu Y, Carey MJ, Cetindil I, Cheelangi M, Faraaz K, Gabrielova E, Grover R, Heilbron Z, Kim Y, Li C, Li G, Ok JM, Onose N, Pirzadeh P, Tsotras VJ, Vernica R, Wen J, Westmann T. AsterixDB: a scalable, open source BDMS. Proc VLDB Endow. 2014;7(14):1905–16.

  4. Borkar VR, Carey MJ, Grover R, Onose N, Vernica R. Hyracks: a flexible and extensible foundation for data-intensive computing. In: Abiteboul S, Böhm K, Koch C, Tan K-L, editors. Proceedings of the 27th International Conference on Data Engineering; 2011. p. 1151–62.

  5. Dean J, Ghemawat S. MapReduce: a flexible data processing tool. Commun ACM. 2010;53(1):72–77.

  6. DeWitt DJ, Gray J. Parallel database systems: the future of high performance database systems. Commun ACM. 1992;35(6):85–98.

  7. Ghemawat S, Gobioff H, Leung S. The Google File System. In: Scott ML, Peterson LL, editors. Proceedings of the 19th ACM Symposium on Operating Systems Principles; 2003. p. 29–43.

  8. Gonzalez JE, Xin RS, Dave A, Crankshaw D, Franklin MJ, Stoica I. GraphX: graph processing in a distributed dataflow framework. In: Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation; 2014.

  9. Graefe G. Query evaluation techniques for large databases. ACM Comput Surv. 1993;25(2):73–169.

  10. Isard M, Budiu M, Yu Y, Birrell A, Fetterly D. Dryad: distributed data-parallel programs from sequential building blocks. In: Ferreira P, Gross TR, Veiga L, editors. Proceedings of the 2007 EuroSys Conference; 2007. p. 59–72.

  11. Melnik S, Gubarev A, Long JJ, Romer G, Shivakumar S, Tolton M, Vassilakis T. Dremel: interactive analysis of web-scale datasets. Proc VLDB Endow. 2010;3(1):330–39.

  12. Yu Y, Isard M, Fetterly D, Budiu M, Erlingsson Ú, Gunda PK, Currey J. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In: Draves R, van Renesse R, editors. Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation; 2008. p. 1–14.

  13. Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Gribble SD, Katabi D, editors. Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation; 2012. p. 15–28.

  14. Zaharia M, Das T, Li H, Hunter T, Shenker S, Stoica I. Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of the 24th ACM Symposium on Operating Systems Principles; 2013.

  15. Zhou J, Bruno N, Wu M, Larson P, Chaiken R, Shakib D. SCOPE: parallel databases meet MapReduce. VLDB J. 2012;21(5):611–36.

Author information

Correspondence to Volker Markl.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Markl, V., Borkar, V., Zaharia, M., Westmann, T., Alexandrov, A. (2018). Big Data Platforms for Data Analytics. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80645
