Synonyms
Big data management systems; Data intensive computing software; Predictive analytics platforms
Definition
Due to the volume, velocity, and variety of data now coming from the Web, social media, and personal devices, the analysis of “Big Data” has become a priority. A number of software platforms have been developed to support the analysis of massive data sets using clusters of computers working in parallel. These platforms fall into two categories: those based on the relational data model and its SQL query language, and those with more flexible data models and query languages tailored to less rigidly structured data. The latter category is referred to here as Big Data Platforms. (SQL analytics on Big Data are covered separately.)
Historical Background
Today’s platforms for Big Data Analytics are the result of technical work carried out in two computer systems software fields: database systems and distributed systems.
Parallel Databases
In the field of database systems, the...
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Cite this entry
Markl, V., Borkar, V., Zaharia, M., Westmann, T., Alexandrov, A. (2018). Big Data Platforms for Data Analytics. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80645
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering