Overview
SparkBench is a flexible framework for benchmarking, simulating, comparing, and testing versions of Apache Spark and Spark applications. It offers three levels of parallelism and a variety of built-in data generators and workloads, so users can finely tune their setup and obtain the benchmarking results they need.
Definitions
A framework for benchmarking Apache Spark.
Historical Background
Apache Spark began in 2010 as a research project by Matei Zaharia and others in the Berkeley AMPLab. Following the landmark success of "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing" by Zaharia et al. (2012), Spark gained popularity and usage as its performance gains over traditional MapReduce workflows became evident. Spark itself continued to grow as well, introducing Python and R APIs, machine learning, graph computation, SQL, and streaming computation.
In 2015,...
References
AMPLab Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark. Accessed 23 Feb 2018
Apache Airflow. http://airbnb.io/projects/airflow/. Accessed 23 Feb 2018
Apache Spark. https://spark.apache.org/. Accessed 23 Feb 2018
Apache Zeppelin. https://zeppelin.apache.org/. Accessed 23 Feb 2018
Azkaban. https://azkaban.github.io/. Accessed 23 Feb 2018
HOCON (Human-Optimized Config Object Notation). https://github.com/lightbend/config/blob/master/HOCON.md. Accessed 23 Feb 2018
IBM Spark-Tracing. https://github.com/CODAI/spark-tracing. Accessed 23 Feb 2018
Intel HiBench Suite. https://github.com/intel-hadoop/HiBench. Accessed 23 Feb 2018
Li M et al (2015) SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark. https://research.spec.org/fileadmin/user_upload/documents/wg_bd/BD-20150401-spark_benchmark-v1.3-spec.pdf. Accessed 23 Feb 2018
Project Jupyter. http://jupyter.org/. Accessed 23 Feb 2018
TPC Decision Support Benchmark. http://www.tpc.org/tpcds/default.asp. Accessed 23 Feb 2018
YourKit Java Profiler. https://www.yourkit.com/java/profiler/features/. Accessed 23 Feb 2018
Zaharia M et al (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf. Accessed 23 Feb 2018
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this entry
Cite this entry
Poelman, J., Curtin, E.M. (2019). SparkBench. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_300
DOI: https://doi.org/10.1007/978-3-319-77525-8_300
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering