Abstract
Many cloud providers now offer their big data processing systems as cloud services, which lets users conveniently manage and process their data in the cloud. Evaluating and comparing the scalability of the big data processing services offered by different providers is an interesting and challenging problem. Most traditional benchmark tools focus on the performance of big data processing systems, measured by metrics such as aggregated throughput and IOPS, but do not analyze their scalability quantitatively. In this paper, we propose a measurement methodology that quantifies the scalability of big data processing services and thereby makes the scalability of cloud services comparable. We conduct a group of comparative experiments on AliCloud E-MapReduce and Baidu MRS, and collect their respective scalability characteristics under Hadoop and Spark workloads. The scalability characteristics observed in our work can help cloud users choose the cloud service platform best suited to building a big data processing system that meets their specific goals.
Acknowledgement
This work is supported by the Natural Science Foundation of China (No. 61472109, No. 61572163 and No. 61472112) and the Key Research and Development Program of Zhejiang Province (No. 2018C01098, 2019C01059 and 2019C03134). This work is also supported in part by National Science Foundation (NSF) grants CNS-1205338 and CNS-1561216, and by the Introduction of Innovative R&D Team Program of Guangdong Province (No. 201001D0104726115). This work is supported by Alibaba Group through the Alibaba Innovative Research (AIR) Program. This work is partially supported by the Visiting Scholarship of the Teachers’ Professional Development Program (No. FX2018050).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, X. et al. (2019). Scalability Evaluation of Big Data Processing Services in Clouds. In: Zheng, C., Zhan, J. (eds) Benchmarking, Measuring, and Optimizing. Bench 2018. Lecture Notes in Computer Science, vol 11459. Springer, Cham. https://doi.org/10.1007/978-3-030-32813-9_8
DOI: https://doi.org/10.1007/978-3-030-32813-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32812-2
Online ISBN: 978-3-030-32813-9
eBook Packages: Computer Science, Computer Science (R0)