Abstract
The amount of data collected by modern industrial, government, and academic organizations has been increasing exponentially and will continue to grow at an accelerating rate for the foreseeable future. At companies across all industries, servers are overflowing with usage logs, message streams, transaction records, sensor data, business operations records, and mobile device data. Effectively analyzing these massive collections of data (“big data”) can create significant value for the world economy by enhancing productivity, increasing efficiency, and delivering more value to consumers. The need to convert raw data into useful information has led to the development of advanced and unique data storage, management, analysis, and visualization technologies, especially over the last decade. This monograph covers the design principles and core features of systems for analyzing very large datasets for business purposes. In particular, we organize systems into four main categories based on major and distinctive technological innovations. Parallel databases, which date back to the 1980s, have added techniques such as columnar data storage and processing, while new distributed platforms such as MapReduce have been developed. Other innovations have aimed at creating alternative system architectures for more generalized dataflow applications. Finally, the growing demand for interactive analytics has led to the emergence of a new class of systems that combine analytical and transactional capabilities.
Copyright information
© 2017 The Author(s)
About this chapter
Cite this chapter
Herodotou, H. (2017). Business Intelligence and Analytics: Big Systems for Big Data. In: Carayannis, E., Sindakis, S. (eds) Analytics, Innovation, and Excellence-Driven Enterprise Sustainability. Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth. Palgrave Macmillan, New York. https://doi.org/10.1057/978-1-137-37879-8_2
DOI: https://doi.org/10.1057/978-1-137-37879-8_2
Publisher Name: Palgrave Macmillan, New York
Print ISBN: 978-1-137-39301-2
Online ISBN: 978-1-137-37879-8
eBook Packages: Business and Management (R0)