Definitions
According to Andersen and Pettersen (1995), a benchmark is a predefined position, used as a reference point for taking measures against. There is no clear formal definition of analytics benchmarks.
Jim Gray (1992) describes benchmarking as follows: “This quantitative comparison starts with the definition of a benchmark or workload. The benchmark is run on several different systems, and the performance and price of each system is measured and recorded. Performance is typically a throughput metric (work/second) and price is typically a five-year cost-of-ownership metric. Together, they give a price/performance ratio.” In short, we define a software benchmark as a program used to compare software products or tools executing on a preconfigured hardware environment.
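Gray's description reduces to simple arithmetic: performance is a throughput (work per second), price is a five-year cost of ownership, and the price/performance ratio is price divided by throughput, so lower values are better. The Python sketch below illustrates this under stated assumptions. The names `SystemResult`, `measure_throughput` and `rank`, and the convention that a workload callable returns the number of work units it completed, are hypothetical illustrations, not part of any benchmark specification.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SystemResult:
    """Measurements recorded for one system under test (hypothetical structure)."""

    name: str
    throughput: float      # performance: units of work completed per second
    five_year_cost: float  # price: five-year cost of ownership, in any currency

    @property
    def price_performance(self) -> float:
        # Price divided by performance, i.e. cost per unit of throughput.
        # Lower is better.
        return self.five_year_cost / self.throughput


def measure_throughput(workload: Callable[[], int], duration_s: float = 1.0) -> float:
    """Run `workload` repeatedly for at least `duration_s` seconds; return work/second.

    Assumption: each call to `workload` executes one batch of the benchmark
    and returns the number of work units (e.g. queries) it completed.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    done = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < duration_s:
        done += workload()
        elapsed = time.perf_counter() - start
    return done / elapsed


def rank(results: List[SystemResult]) -> List[SystemResult]:
    """Order systems by price/performance, best (lowest) first."""
    return sorted(results, key=lambda r: r.price_performance)


if __name__ == "__main__":
    # Illustrative figures only; not measurements of any real system.
    systems = [
        SystemResult("system A", throughput=2000.0, five_year_cost=500_000.0),
        SystemResult("system B", throughput=1000.0, five_year_cost=200_000.0),
    ]
    for r in rank(systems):
        print(f"{r.name}: {r.price_performance:.1f} per unit of throughput")
```

With these made-up figures, system A costs 250 per unit of throughput and system B costs 200, so `rank` lists system B first even though it is slower. Reporting the ratio rather than raw throughput is what lets the two numbers Gray mentions be compared together.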
Analytics benchmarks are a type of...
References
Abadi D, Babu S, Ozcan F, Pandis I (2015) Tutorial: SQL-on-Hadoop systems. PVLDB 8(12):2050–2051
Agrawal D, Butt AR, Doshi K, Larriba-Pey J, Li M, Reiss FR, Raab F, Schiefer B, Suzumura T, Xia Y (2015) SparkBench – a Spark performance testing suite. In: TPCTC, pp 26–44
Alsubaiee S, Altowim Y, Altwaijry H, Behm A, Borkar VR, Bu Y, Carey MJ, Cetindil I, Cheelangi M, Faraaz K, Gabrielova E, Grover R, Heilbron Z, Kim Y, Li C, Li G, Ok JM, Onose N, Pirzadeh P, Tsotras VJ, Vernica R, Wen J, Westmann T (2014) AsterixDB: a scalable, open source BDMS. PVLDB 7(14):1905–1916
AMPLab (2013) https://amplab.cs.berkeley.edu/benchmark/
Andersen B, Pettersen PG (1995) Benchmarking handbook. Chapman & Hall, London
Armbrust M, Xin RS, Lian C, Huai Y, Liu D, Bradley JK, Meng X, Kaftan T, Franklin MJ, Ghodsi A, Zaharia M (2015) Spark SQL: relational data processing in Spark. In: Proceedings of the 2015 ACM SIGMOD international conference on management of data, Melbourne, 31 May–4 June 2015, pp 1383–1394
Armstrong TG, Ponnekanti V, Borthakur D, Callaghan M (2013) LinkBench: a database benchmark based on the Facebook social graph. In: Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2013, New York, 22–27 June 2013, pp 1185–1196
AsterixDB (2018) https://asterixdb.apache.org
BigFrame (2013) https://github.com/bigframeteam/BigFrame/wiki
Bog A (2013) Benchmarking transaction and analytical processing systems: the creation of a mixed workload benchmark and its application. PhD thesis. http://d-nb.info/1033231886
Carbone P, Katsifodimos A, Ewen S, Markl V, Haridi S, Tzoumas K (2015) Apache Flink™: stream and batch processing in a single engine. IEEE Data Eng Bull 38(4):28–38. http://sites.computer.org/debull/A15dec/p28.pdf
Cattell R (2011) Scalable SQL and NoSQL data stores. SIGMOD Rec 39(4):12–27
Codd EF, Codd SB, Salley CT (1993) Providing OLAP (On-line analytical processing) to user-analysis: an IT mandate. White paper
Cooper BF, Silberstein A, Tam E, Ramakrishnan R, Sears R (2010) Benchmarking cloud serving systems with YCSB. In: Proceedings of the 1st ACM symposium on cloud computing, SoCC 2010, Indianapolis, 10–11 June 2010, pp 143–154
Ferdman M, Adileh A, Koçberber YO, Volos S, Alisafaee M, Jevdjic D, Kaynak C, Popescu AD, Ailamaki A, Falsafi B (2012) Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In: Proceedings of the 17th international conference on architectural support for programming languages and operating systems, ASPLOS, pp 37–48
Ferrarons J, Adhana M, Colmenares C, Pietrowska S, Bentayeb F, Darmont J (2013) PRIMEBALL: a parallel processing framework benchmark for big data applications in the cloud. In: TPCTC, pp 109–124
Flink (2018) https://flink.apache.org/
Gelly (2015) https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
Ghazal A, Rabl T, Hu M, Raab F, Poess M, Crolotte A, Jacobsen H (2013) BigBench: towards an industry standard benchmark for big data analytics. In: Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2013, New York, 22–27 June 2013, pp 1197–1208
Ghazal A, Ivanov T, Kostamaa P, Crolotte A, Voong R, Al-Kateb M, Ghazal W, Zicari RV (2017) BigBench V2: the new and improved BigBench. In: 33rd IEEE international conference on data engineering, ICDE 2017, San Diego, 19–22 Apr 2017, pp 1225–1236
GraphX (2018) https://spark.apache.org/graphx/
Gray J (1992) Benchmark handbook: for database and transaction processing systems. Morgan Kaufmann Publishers Inc., San Francisco
Hadoop (2018) https://hadoop.apache.org/
Han R, John LK, Zhan J (2018) Benchmarking big data systems: a review. IEEE Trans Serv Comput 11(3):580–597
Hellerstein JM, Ré C, Schoppmann F, Wang DZ, Fratkin E, Gorajek A, Ng KS, Welton C, Feng X, Li K, Kumar A (2012) The MADlib analytics library or MAD skills, the SQL. PVLDB 5(12):1700–1711
Hive (2018) https://hive.apache.org/
Hockney RW (1996) The science of computer benchmarking. SIAM, Philadelphia
Hogan T (2009) Overview of TPC benchmark E: the next generation of OLTP benchmarks. In: Performance evaluation and benchmarking, first TPC technology conference, TPCTC 2009, Lyon, 24–28 Aug 2009, Revised Selected Papers, pp 84–98
Hu H, Wen Y, Chua T, Li X (2014) Toward scalable systems for big data analytics: a technology tutorial. IEEE Access 2:652–687
Huang S, Huang J, Dai J, Xie T, Huang B (2010) The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: Workshops proceedings of the 26th IEEE ICDE international conference on data engineering, pp 41–51
Huppler K (2009) The art of building a good benchmark. In: Nambiar RO, Poess M (eds) Performance evaluation and benchmarking. Springer, Berlin/Heidelberg, pp 18–30
Impala (2018) https://impala.apache.org/
Ivanov T, Rabl T, Poess M, Queralt A, Poelman J, Poggi N, Buell J (2015) Big data benchmark compendium. In: TPCTC, pp 135–155
Kemper A, Neumann T (2011) HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: Proceedings of the 27th international conference on data engineering, ICDE 2011, Hannover, 11–16 Apr 2011, pp 195–206
Kim K, Jeon K, Han H, Kim SG, Jung H, Yeom HY (2008) MRBench: a benchmark for MapReduce framework. In: 14th international conference on parallel and distributed systems, ICPADS 2008, Melbourne, 8–10 Dec 2008, pp 11–18
Kornacker M, Behm A, Bittorf V, Bobrovytsky T, Ching C, Choi A, Erickson J, Grund M, Hecht D, Jacobs M, Joshi I, Kuff L, Kumar D, Leblang A, Li N, Pandis I, Robinson H, Rorke D, Rus S, Russell J, Tsirogiannis D, Wanderman-Milne S, Yoder M (2015) Impala: a modern, open-source SQL engine for Hadoop. In: CIDR 2015, seventh biennial conference on innovative data systems research, Asilomar, 4–7 Jan 2015, Online proceedings
Li M, Tan J, Wang Y, Zhang L, Salapura V (2015) SparkBench: a comprehensive benchmarking suite for in memory data analytic platform Spark. In: Proceedings of the 12th ACM international conference on computing frontiers, pp 53:1–53:8
Luo C, Zhan J, Jia Z, Wang L, Lu G, Zhang L, Xu C, Sun N (2012) CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications. Front Comp Sci 6(4):347–362
MADlib (2018) https://madlib.apache.org/
Meng X, Bradley JK, Yavuz B, Sparks ER, Venkataraman S, Liu D, Freeman J, Tsai DB, Amde M, Owen S, Xin D, Xin R, Franklin MJ, Zadeh R, Zaharia M, Talwalkar A (2016) MLlib: machine learning in Apache Spark. J Mach Learn Res 17:34:1–34:7
MLlib (2018) https://spark.apache.org/mllib/
Nambiar R (2014) Benchmarking big data systems: introducing TPC express benchmark HS. In: Big data benchmarking – 5th international workshop, WBDB 2014, Potsdam, 5–6 Aug 2014, Revised Selected Papers, pp 24–28
Nambiar RO, Poess M (2006) The making of TPC-DS. In: Proceedings of the 32nd international conference on very large data bases, Seoul, 12–15 Sept 2006, pp 1049–1058
Nambiar R, Chitor R, Joshi A (2012) Data management – a look back and a look ahead. In: Specifying big data benchmarks – first workshop, WBDB 2012, San Jose, 8–9 May 2012, and second workshop, WBDB 2012, Pune, 17–18 Dec 2012, Revised Selected Papers, pp 11–19
Özcan F, Tian Y, Tözün P (2017) Hybrid transactional/analytical processing: a survey. In: Proceedings of the 2017 ACM international conference on management of data, SIGMOD conference 2017, Chicago, 14–19 May 2017, pp 1771–1775
Patil S, Polte M, Ren K, Tantisiriroj W, Xiao L, López J, Gibson G, Fuchs A, Rinaldi B (2011) YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In: ACM symposium on cloud computing in conjunction with SOSP 2011, SoCC’11, Cascais, 26–28 Oct 2011, p 9
Pavlo A, Paulson E, Rasin A, Abadi DJ, DeWitt DJ, Madden S, Stonebraker M (2009) A comparison of approaches to large-scale data analysis. In: Proceedings of the ACM SIGMOD international conference on management of data, SIGMOD 2009, Providence, 29 June–2 July 2009, pp 165–178
Pirzadeh P, Carey MJ, Westmann T (2015) BigFUN: a performance study of big data management system functionality. In: 2015 IEEE international conference on big data, pp 507–514
Poess M (2012) TPC’s benchmark development model: making the first industry standard benchmark on big data a success. In: Specifying big data benchmarks – first workshop, WBDB 2012, San Jose, 8–9 May 2012, and second workshop, WBDB 2012, Pune, 17–18 Dec 2012, Revised Selected Papers, pp 1–10
Poess M, Rabl T, Jacobsen H, Caufield B (2014) TPC-DI: the first industry benchmark for data integration. PVLDB 7(13):1367–1378
Poess M, Rabl T, Jacobsen H (2017) Analysis of TPC-DS: the first standard benchmark for SQL-based big data systems. In: Proceedings of the 2017 symposium on cloud computing, SoCC 2017, Santa Clara, 24–27 Sept 2017, pp 573–585
Pöss M, Floyd C (2000) New TPC benchmarks for decision support and web commerce. SIGMOD Rec 29(4):64–71
Pöss M, Nambiar RO, Walrath D (2007) Why you should run TPC-DS: a workload analysis. In: Proceedings of the 33rd international conference on very large data bases, University of Vienna, 23–27 Sept 2007, pp 1138–1149
Raab F (1993) TPC-C – the standard benchmark for online transaction processing (OLTP). In: Gray J (ed) The benchmark handbook for database and transaction systems, 2nd edn. Morgan Kaufmann, San Mateo
Rockart JF, Ball L, Bullen CV (1982) Future role of the information systems executive. MIS Q 6(4):1–14
Sakr S, Liu A, Fayoumi AG (2013) The family of MapReduce and large-scale data processing systems. ACM Comput Surv 46(1):11:1–11:44
Sangroya A, Serrano D, Bouchenak S (2012) MRBS: towards dependability benchmarking for Hadoop MapReduce. In: Euro-Par: parallel processing workshops, pp 3–12
Sethuraman P, Taheri HR (2010) TPC-V: a benchmark for evaluating the performance of database applications in virtual environments. In: Performance evaluation, measurement and characterization of complex systems – second TPC technology conference, TPCTC 2010, Singapore, 13–17 Sept 2010. Revised Selected Papers, pp 121–135
Shim JP, Warkentin M, Courtney JF, Power DJ, Sharda R, Carlsson C (2002) Past, present, and future of decision support technology. Decis Support Syst 33(2):111–126
Spark (2018) https://spark.apache.org
SparkSQL (2018) https://spark.apache.org/sql/
SparkStreaming (2018) https://spark.apache.org/streaming/
SPEC (2018) www.spec.org/
STAC (2018) www.stacresearch.com/
Storm (2018) https://storm.apache.org/
Tensorflow (2018) https://tensorflow.org
Thusoo A, Sarma JS, Jain N, Shao Z, Chakka P, Anthony S, Liu H, Wyckoff P, Murthy R (2009) Hive – a warehousing solution over a map-reduce framework. PVLDB 2(2):1626–1629
TPC (2018) www.tpc.org/
Wang L, Zhan J, Luo C, Zhu Y, Yang Q, He Y, Gao W, Jia Z, Shi Y, Zhang S, Zheng C, Lu G, Zhan K, Li X, Qiu B (2014) BigDataBench: a big data benchmark suite from internet services. In: 20th IEEE international symposium on high performance computer architecture, HPCA 2014, pp 488–499
Copyright information
© 2019 Springer Nature Switzerland AG
About this entry
Cite this entry
Ivanov, T., Zicari, R.V. (2019). Analytics Benchmarks. In: Sakr, S., Zomaya, A.Y. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-77525-8_113
DOI: https://doi.org/10.1007/978-3-319-77525-8_113
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77524-1
Online ISBN: 978-3-319-77525-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering