Big Data Tools and Platforms

Mazumder, Sourav

doi:10.1007/978-3-319-27763-9_2

Abstract

The fast-evolving Big Data Tools and Platforms space has given rise to various technologies for different Big Data use cases. However, because of the multitude of tools and platforms involved, it is often difficult for Big Data practitioners to understand and select the right tools for a given business problem. This chapter gives an introductory discussion of the various Big Data Tools and Platforms, aiming to provide enough breadth and depth that practitioners have a reasonable background from which to support the Big Data initiatives in their organizations. We start with the common technical concepts and patterns typically used by the core Big Data Tools and Platforms. We then examine in detail the individual characteristics of the different categories of Big Data Tools and Platforms, and cover the applicability of these categories to various enterprise-level Big Data use cases. Finally, we discuss future work in this space, covering the newer patterns, tools, and platforms to watch for when implementing Big Data use cases.

Author information

Authors and Affiliations

IBM Analytics, San Francisco, CA, USA
Sourav Mazumder

Corresponding author

Correspondence to Sourav Mazumder.

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Burwood, Victoria, Australia
Shui Yu
School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City, Japan
Song Guo

About this chapter

Cite this chapter

Mazumder, S. (2016). Big Data Tools and Platforms. In: Yu, S., Guo, S. (eds) Big Data Concepts, Theories, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-27763-9_2

DOI: https://doi.org/10.1007/978-3-319-27763-9_2
Published: 04 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27761-5
Online ISBN: 978-3-319-27763-9
eBook Packages: Computer Science, Computer Science (R0)
