A Frequency Scaling Based Performance Indicator Framework for Big Data Systems

Yang, Chen; Du, Zhihui; Meng, Xiaofeng; Du, Yongjie; Duan, Zhiqiang

doi:10.1007/978-3-030-18576-3_2

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11446)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Identifying the performance bottleneck is important for big data systems. However, popular indicators such as resource utilization are often misleading and cannot be compared with one another. This paper proposes a novel indicator framework in which the impact of different indicators can be compared directly, so that the performance bottleneck can be identified and analyzed efficiently. It describes a methodology that constructs the indicator from the change in performance under CPU frequency scaling. Spark is used as an example big data system, and two typical SQL benchmarks serve as workloads to evaluate the proposed method. Experimental results show that the proposed method is more accurate than the resource utilization method and easier to implement than the white-box method. The analysis with these indicators also leads to some interesting findings and valuable performance optimization suggestions for big data systems.
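The abstract does not give the indicator's formula, so the following is only a minimal sketch of the general idea of deriving an indicator from performance change under CPU frequency scaling. It assumes, purely for illustration, that runtime follows a two-term model T(f) = a / f + b, where a / f is the part that scales with CPU frequency and b is the part that does not (for example, time bound by I/O or memory). The function names, the model, and the sample measurements are hypothetical and are not taken from the paper.

```python
def fit_frequency_model(freqs_ghz, runtimes_s):
    """Least-squares fit of the assumed model T(f) = a / f + b.

    Returns (a, b). The model itself is an illustrative assumption,
    not the formula used in the paper.
    """
    if len(freqs_ghz) != len(runtimes_s) or len(freqs_ghz) < 2:
        raise ValueError("need at least two (frequency, runtime) pairs")
    xs = [1.0 / f for f in freqs_ghz]  # regress runtime on 1/f
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(runtimes_s) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runtimes_s))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def cpu_bound_fraction(freqs_ghz, runtimes_s, f_ref_ghz):
    """Share of the modelled runtime at f_ref_ghz that scales with frequency."""
    a, b = fit_frequency_model(freqs_ghz, runtimes_s)
    scaled_part = a / f_ref_ghz
    return scaled_part / (scaled_part + b)


if __name__ == "__main__":
    # Synthetic measurements generated from T(f) = 120 / f + 50 (not real data).
    freqs = [1.2, 1.6, 2.0, 2.4]
    runtimes = [150.0, 125.0, 110.0, 100.0]
    print(round(cpu_bound_fraction(freqs, runtimes, 2.4), 3))  # 0.5 for this synthetic data
```

Under this assumed model, a fraction near 1 would suggest that the workload is mostly limited by CPU speed at the reference frequency, while a fraction near 0 would suggest that some other resource dominates. Because every such fraction is expressed as a share of runtime, values obtained for different workloads or configurations are directly comparable, which is the property the abstract emphasizes.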

Acknowledgement

This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000602, 2016YFB1000603); the Natural Science Foundation of China (No. 91646203, 61532016, 61532010, 61379050, 61762082); the Fundamental Research Funds for the Central Universities and the Research Funds of Renmin University (No. 11XNL010); and the Science and Technology Opening up Cooperation project of Henan Province (No. 172106000077).

Author information

Authors and Affiliations

School of Information, Renmin University, Beijing, China
Chen Yang, Xiaofeng Meng, Yongjie Du & Zhiqiang Duan
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Zhihui Du
School of Software, Zhengzhou University of Light Industry, Zhengzhou, China
Chen Yang

Corresponding author

Correspondence to Xiaofeng Meng.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Guoliang Li
Duke University, Durham, NC, USA
Jun Yang
University of Porto, Porto, Portugal
Joao Gama
Chiang Mai University, Chiang Mai, Thailand
Juggapong Natwichai
Beihang University, Beijing, China
Yongxin Tong

About this paper

Cite this paper

Yang, C., Du, Z., Meng, X., Du, Y., Duan, Z. (2019). A Frequency Scaling Based Performance Indicator Framework for Big Data Systems. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_2

DOI: https://doi.org/10.1007/978-3-030-18576-3_2
Published: 24 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18575-6
Online ISBN: 978-3-030-18576-3
eBook Packages: Computer Science, Computer Science (R0)
