Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Virtualized Big Data Benchmarks

  • Tariq Magdon-Ismail
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_120-1

Definition

Virtualized big data benchmarks measure the performance of big data processing systems, such as Hadoop, Spark, and Hive, on virtual infrastructures. Here, virtual infrastructure may mean either on-premises or cloud infrastructure: while virtualization is not strictly necessary to support cloud computing, it is typically a foundational element of all major cloud computing services. These benchmarks are important for making quantitative and qualitative comparisons of different systems.

Historical Background

Big data applications are resource intensive and, as such, have historically been deployed exclusively on dedicated physical hardware clusters, i.e., bare-metal systems. As big data processing has moved out of the specialized domain of Web 2.0 companies into the mainstream of business-critical enterprise applications, enterprises have been looking to take advantage of the numerous benefits of virtualization such as...

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. VMware, Inc., Palo Alto, USA

Section editors and affiliations

  • Meikel Poess (1)
  • Tilmann Rabl (2)

  1. Server Technologies, Oracle, Redwood Shores, USA
  2. Database Systems and Information Management Group, Technische Universität Berlin, Berlin, Germany