On Big Data Benchmarking

Han, Rui; Lu, Xiaoyi; Xu, Jiangtao

doi:10.1007/978-3-319-13021-7_1

Conference paper
First Online: 11 November 2014

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8807)

Abstract

Big data systems address the challenges of capturing, storing, managing, analyzing, and visualizing big data. Within this context, developing benchmarks to evaluate and compare big data systems has become an active topic in both the research and industry communities. To date, most state-of-the-art big data benchmarks are designed for specific types of systems. Based on our experience, however, we argue that, given the complexity, diversity, and rapid evolution of big data systems, fair benchmarking requires big data benchmarks that include a diversity of data and workloads. Given this motivation, in this paper we first propose the key requirements and challenges in developing big data benchmarks from two perspectives: generating data with the 4V properties of big data (i.e. volume, velocity, variety, and veracity), and generating tests with comprehensive workloads for big data systems. We then present a methodology for big data benchmarking designed to address these challenges. Next, state-of-the-art benchmarks are summarized and compared, followed by our vision for future research directions.
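The abstract frames benchmark data generation around the 4V properties. As a rough illustration only, the Python sketch below shows one way each property might be exposed as a tunable generator parameter: a record count for volume, a simulated arrival rate for velocity, a mix of structured, semi-structured, and unstructured formats for variety, and a corruption probability for veracity. This is not the authors' methodology or any existing benchmark tool; the names `FourVConfig` and `generate`, and the property-to-parameter mapping itself, are hypothetical assumptions.

```python
import json
import random
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass
class FourVConfig:
    """Hypothetical knobs mapping each of the 4V properties to one parameter."""
    volume: int                 # number of records to emit (a proxy for data size)
    velocity: float             # target arrival rate in records per second
    variety: Tuple[str, ...]    # formats to mix: "table", "json"; anything else is free text
    veracity: float             # probability that a record is corrupted (0.0 = clean)
    seed: int = 42              # fixed seed so every benchmark run sees the same data


def generate(cfg: FourVConfig) -> Iterator[dict]:
    """Yield synthetic records; timestamps simulate the arrival rate, nothing sleeps."""
    rng = random.Random(cfg.seed)
    interval = 1.0 / cfg.velocity
    for i in range(cfg.volume):
        # Variety: rotate through structured, semi-structured and unstructured formats.
        fmt = cfg.variety[i % len(cfg.variety)]
        value = rng.randint(0, 999)
        if fmt == "table":
            payload = {"id": i, "value": value}
        elif fmt == "json":
            payload = json.dumps({"id": i, "tags": [value % 7, value % 11]})
        else:
            payload = f"record {i} has value {value}"
        # Veracity: replace a random fraction of payloads with missing data.
        noisy = rng.random() < cfg.veracity
        if noisy:
            payload = None
        # Velocity: record i is stamped as arriving at i / velocity seconds.
        yield {"ts": i * interval, "format": fmt, "noisy": noisy, "payload": payload}


if __name__ == "__main__":
    cfg = FourVConfig(volume=1_000, velocity=100.0,
                      variety=("table", "json", "text"), veracity=0.05)
    records = list(generate(cfg))
    print(len(records), sum(r["noisy"] for r in records), records[-1]["ts"])
```

Running the module produces 1,000 records whose simulated timestamps span roughly ten seconds. Varying one field at a time (for example, raising only `veracity`) is one way such a generator could isolate how a system under test reacts to a single property.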

Author information

Authors and Affiliations

Rui Han: Department of Computing, Imperial College London, London, UK
Xiaoyi Lu: Ohio State University, Columbus, USA
Jiangtao Xu: Beijing Jiaotong University, Beijing, China

Corresponding author

Correspondence to Rui Han.

Editor information

Editors and Affiliations

Jianfeng Zhan: ICT, Chinese Academy of Sciences, Beijing, China
Rui Han: ICT, Chinese Academy of Sciences, Beijing, China
Chuliang Weng: Shannon (IT) Lab., Huawei, China

About this paper

Cite this paper

Han, R., Lu, X., Xu, J. (2014). On Big Data Benchmarking. In: Zhan, J., Han, R., Weng, C. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2014. Lecture Notes in Computer Science, vol 8807. Springer, Cham. https://doi.org/10.1007/978-3-319-13021-7_1

DOI: https://doi.org/10.1007/978-3-319-13021-7_1
Published: 11 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13020-0
Online ISBN: 978-3-319-13021-7
eBook Packages: Computer Science, Computer Science (R0)
