Business Intelligence and Analytics: Big Systems for Big Data

Herodotou, Herodotos

doi:10.1057/978-1-137-37879-8_2

Herodotos Herodotou

Chapter
First Online: 20 April 2017

Part of the book series: Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth (DIG)

Abstract

The amount of data collected by modern industrial, government, and academic organizations has been increasing exponentially and will continue to grow at an accelerating rate for the foreseeable future. At companies across all industries, servers are overflowing with usage logs, message streams, transaction records, sensor data, business operations records, and mobile device data. Effectively analyzing these massive collections of data (“big data”) can create significant value for the world economy by enhancing productivity, increasing efficiency, and delivering more value to consumers. The need to convert raw data into useful information has led to the development of advanced and unique data storage, management, analysis, and visualization technologies, especially over the last decade. This monograph attempts to cover the design principles and core features of systems for analyzing very large datasets for business purposes. In particular, we organize systems into four main categories based on major and distinctive technological innovations. Parallel databases, which date back to the 1980s, have added techniques like columnar data storage and processing, while new distributed platforms like MapReduce have been developed. Other innovations have aimed at creating alternative system architectures for more generalized dataflow applications. Finally, the growing demand for interactive analytics has led to the emergence of a new class of systems that combine analytical and transactional capabilities.

Author information

Authors and Affiliations

Department of Electrical Engineering and Computer Engineering and Informatics (EECEI), Cyprus University of Technology, Limassol, Cyprus
Herodotos Herodotou

Authors

Herodotos Herodotou

Corresponding author

Correspondence to Herodotos Herodotou.

Editor information

Editors and Affiliations

Department of Information Systems and Technology Management, George Washington University, Washington, District of Columbia, USA
Elias G. Carayannis
School of Business, American University in Dubai, Dubai, United Arab Emirates
Stavros Sindakis

About this chapter

Cite this chapter

Herodotou, H. (2017). Business Intelligence and Analytics: Big Systems for Big Data. In: Carayannis, E., Sindakis, S. (eds) Analytics, Innovation, and Excellence-Driven Enterprise Sustainability. Palgrave Studies in Democracy, Innovation, and Entrepreneurship for Growth. Palgrave Macmillan, New York. https://doi.org/10.1057/978-1-137-37879-8_2

DOI: https://doi.org/10.1057/978-1-137-37879-8_2
Published: 20 April 2017
Publisher Name: Palgrave Macmillan, New York
Print ISBN: 978-1-137-39301-2
Online ISBN: 978-1-137-37879-8
eBook Packages: Business and Management; Business and Management (R0)