Virtualized Big Data Benchmarks
Virtualized big data benchmarks measure the performance of big data processing systems, such as Hadoop, Spark, and Hive, on virtual infrastructure. Here, virtual infrastructure can mean either on-premises or cloud infrastructure: although virtualization is not strictly necessary to support cloud computing, it is typically a foundational element of all major cloud computing services. These benchmarks are important for making quantitative and qualitative comparisons of different systems.
Big data applications are resource-intensive and, as such, have historically been deployed exclusively on dedicated physical hardware clusters, i.e., bare-metal systems. As big data processing has moved out of the specialized domain of Web 2.0 companies and into the mainstream of business-critical enterprise applications, enterprises have been looking to take advantage of the numerous benefits of virtualization such as...