Business Intelligence and Analytics: Big Systems for Big Data

Abstract

The amount of data collected by modern industrial, government, and academic organizations has been increasing exponentially and will continue to grow at an accelerating rate for the foreseeable future. At companies across all industries, servers are overflowing with usage logs, message streams, transaction records, sensor data, business operations records, and mobile device data. Effectively analyzing these massive collections of data (“big data”) can create significant value for the world economy by enhancing productivity, increasing efficiency, and delivering more value to consumers. The need to convert raw data into useful information has led, especially over the last decade, to the development of advanced and specialized technologies for data storage, management, analysis, and visualization. This monograph attempts to cover the design principles and core features of systems for analyzing very large datasets for business purposes. In particular, we organize systems into four main categories based on major and distinctive technological innovations. Parallel databases, which date back to the 1980s, have added techniques such as columnar data storage and processing, while new distributed platforms such as MapReduce have been developed. Other innovations have aimed at creating alternative system architectures for more generalized dataflow applications. Finally, the growing demand for interactive analytics has led to the emergence of a new class of systems that combine analytical and transactional capabilities.

Author information

Corresponding author

Correspondence to Herodotos Herodotou.

Copyright information

© 2017 The Author(s)

About this chapter

Cite this chapter

Herodotou, H. (2017). Business Intelligence and Analytics: Big Systems for Big Data. In: Carayannis, E., Sindakis, S. (eds) Analytics, Innovation, and Excellence-Driven Enterprise Sustainability. Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth. Palgrave Macmillan, New York. https://doi.org/10.1057/978-1-137-37879-8_2
