Big Data Tools and Platforms

Big Data Concepts, Theories, and Applications

Abstract

The fast-evolving Big Data Tools and Platforms space has given rise to a variety of technologies for handling different Big Data use cases. However, because so many tools and platforms are involved, Big Data practitioners often find it difficult to understand and select the right tools for a given business problem. This chapter gives an introductory overview of the various Big Data Tools and Platforms, with the aim of giving practitioners enough breadth and depth to support the Big Data initiatives in their organizations. We start by discussing the common Technical Concepts and Patterns typically used by the core Big Data Tools and Platforms. We then examine in detail the individual characteristics of the different categories of Big Data Tools and Platforms, and cover the applicability of these categories to various enterprise-level Big Data use cases. Finally, we discuss ongoing work in this space, covering the newer patterns, tools, and platforms to watch when implementing Big Data use cases.

Author information

Corresponding author

Correspondence to Sourav Mazumder.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Mazumder, S. (2016). Big Data Tools and Platforms. In: Yu, S., Guo, S. (eds) Big Data Concepts, Theories, and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-27763-9_2

  • DOI: https://doi.org/10.1007/978-3-319-27763-9_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27761-5

  • Online ISBN: 978-3-319-27763-9

  • eBook Packages: Computer Science, Computer Science (R0)
