SparkBench

Poelman, John; Curtin, Emily May

doi:10.1007/978-3-319-77525-8_300

Reference work entry
First Online: 01 January 2019

Synonyms

Apache Spark benchmarking; Spark-Bench; CODAIT/spark-bench

Overview

SparkBench is a flexible framework for benchmarking, simulating, comparing, and testing versions of Apache Spark and Spark applications. It offers three levels of parallelism and a variety of built-in data generators and workloads, so users can fine-tune their setup and obtain the benchmarking results they need.

Definitions

A framework for benchmarking Apache Spark.

Historical Background

Apache Spark began in 2010 as a research project by Matei Zaharia and others in the Berkeley AMPLab. Following the landmark success of the paper "Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing" (Zaharia et al. 2012), Spark gained popularity and usage as its performance advantages over traditional MapReduce workflows became evident. The project also grew in scope, adding Python and R APIs, machine learning, graph computation, SQL, and streaming computation.

In 2015,...

References

AMPLab Big Data Benchmark. https://amplab.cs.berkeley.edu/benchmark. Accessed 23 Feb 2018
Apache Airflow. http://airbnb.io/projects/airflow/. Accessed 23 Feb 2018
Apache Spark. https://spark.apache.org/. Accessed 23 Feb 2018
Apache Zeppelin. https://zeppelin.apache.org/. Accessed 23 Feb 2018
Azkaban. https://azkaban.github.io/. Accessed 23 Feb 2018
HOCON (Human-Optimized Config Object Notation). https://github.com/lightbend/config/blob/master/HOCON.md. Accessed 23 Feb 2018
IBM Spark-Tracing. https://github.com/CODAI/spark-tracing. Accessed 23 Feb 2018
Intel HiBench Suite. https://github.com/intel-hadoop/HiBench. Accessed 23 Feb 2018
Li M et al (2015) SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark. https://research.spec.org/fileadmin/user_upload/documents/wg_bd/BD-20150401-spark_benchmark-v1.3-spec.pdf. Accessed 23 Feb 2018
Project Jupyter. http://jupyter.org/. Accessed 23 Feb 2018
TPC Decision Support Benchmark. http://www.tpc.org/tpcds/default.asp. Accessed 23 Feb 2018
YourKit Java Profiler. https://www.yourkit.com/java/profiler/features/. Accessed 23 Feb 2018
Zaharia M et al (2012) Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf. Accessed 23 Feb 2018

Author information

Authors and Affiliations

IBM, New York, USA
John Poelman & Emily May Curtin

Corresponding authors

Correspondence to John Poelman or Emily May Curtin.

Editor information

Editors and Affiliations

Institute of Computer Science, University of Tartu, Tartu, Estonia
Sherif Sakr
School of Information Technologies, Sydney University, Sydney, Australia
Albert Y. Zomaya

Section Editor information

Server Technologies, Oracle, Redwood Shores, California, USA
Meikel Poess
Database Systems and Information Management Group, Technische Universität Berlin, Berlin, Germany
Tilmann Rabl

About this entry

Cite this entry

Poelman, J., Curtin, E.M. (2019). SparkBench. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_300

DOI: https://doi.org/10.1007/978-3-319-77525-8_300
Published: 20 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering
