
A Frequency Scaling Based Performance Indicator Framework for Big Data Systems

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11446)



Abstract

Identifying performance bottlenecks is important for big data systems. However, popular indicators such as resource utilization are often misleading and cannot be compared with one another. In this paper, we propose a novel indicator framework in which the impact of different indicators can be compared directly, so that performance bottlenecks can be identified and analyzed efficiently. We describe a methodology that constructs the indicators from the change in performance under CPU frequency scaling. We evaluate the proposed method using Spark as an example big data system and two typical SQL benchmarks as workloads. Experimental results show that the proposed method is more accurate than the resource-utilization method and easier to implement than the white-box method. Moreover, analysis with our indicators leads to several interesting findings and valuable performance-optimization suggestions for big data systems.
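The abstract does not say exactly how the indicator is computed from the frequency-scaling measurements. The sketch below illustrates the general idea only, under a simple two-term model that is our own assumption. In this model, the runtime T(f) at CPU frequency f splits into a part that scales as 1/f (CPU-bound work) and a part treated as frequency-insensitive (for example, waiting on disk or network). The names `ScalingFit` and `fit_frequency_model` and the sample runtimes are hypothetical. They are not the authors' method or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingFit:
    """Assumed model T(f) = a / f + b (hypothetical, for illustration only)."""
    a: float  # seconds * GHz: work whose duration scales with 1/f
    b: float  # seconds: time treated as insensitive to CPU frequency

    def predict(self, freq_ghz: float) -> float:
        return self.a / freq_ghz + self.b

    def cpu_sensitive_fraction(self, freq_ghz: float) -> float:
        """Share of the predicted runtime at freq_ghz that shrinks as the clock rises."""
        return (self.a / freq_ghz) / self.predict(freq_ghz)


def fit_frequency_model(freqs_ghz: Sequence[float],
                        runtimes_s: Sequence[float]) -> ScalingFit:
    """Ordinary least-squares fit of runtime against 1/f."""
    if len(freqs_ghz) != len(runtimes_s) or len(freqs_ghz) < 2:
        raise ValueError("need at least two (frequency, runtime) pairs")
    xs = [1.0 / f for f in freqs_ghz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(runtimes_s) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("frequencies must differ")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runtimes_s))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return ScalingFit(a=a, b=b)


if __name__ == "__main__":
    # Synthetic runtimes generated from a = 60, b = 20 (not measured data).
    freqs = [1.2, 1.8, 2.4]
    runtimes = [60 / f + 20 for f in freqs]
    fit = fit_frequency_model(freqs, runtimes)
    print(f"{fit.cpu_sensitive_fraction(2.4):.2f}")  # prints 0.56
```

Fitting against 1/f keeps the estimate a plain linear regression. A fraction close to 1 suggests the job is dominated by CPU work at that clock speed. A fraction close to 0 suggests the bottleneck lies elsewhere. Real systems may deviate from this model, for instance because memory bandwidth can itself change at reduced clock speeds.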



Acknowledgement

This research was partially supported by grants from the National Key Research and Development Program of China (No. 2016YFB1000602, 2016YFB1000603); the Natural Science Foundation of China (No. 91646203, 61532016, 61532010, 61379050, 61762082); the Fundamental Research Funds for the Central Universities, Research Funds of Renmin University (No. 11XNL010); and the Science and Technology Opening up Cooperation Project of Henan Province (No. 172106000077).

Author information

Corresponding author

Correspondence to Xiaofeng Meng.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Yang, C., Du, Z., Meng, X., Du, Y., Duan, Z. (2019). A Frequency Scaling Based Performance Indicator Framework for Big Data Systems. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11446. Springer, Cham. https://doi.org/10.1007/978-3-030-18576-3_2


  • DOI: https://doi.org/10.1007/978-3-030-18576-3_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18575-6

  • Online ISBN: 978-3-030-18576-3

  • eBook Packages: Computer Science, Computer Science (R0)
