Using Hardware Counters to Predict Vectorization

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11403)

Abstract

Vectorization is the process of transforming the scalar implementation of an algorithm into vector form. This transformation aims to exploit data parallelism by generating microprocessor vector instructions. Using abstract models and source-level information, compilers can identify opportunities for auto-vectorization. However, compilers do not always predict the runtime effects accurately, or they fail to identify vectorization opportunities altogether. In either case, the potential performance improvement goes unrealized.

This paper takes a new perspective, leveraging runtime hardware counters to predict the potential for loop vectorization. Using supervised machine learning models, we can detect instances where vectorization can be applied (but the compiler fails to apply it) with 80% validation accuracy. We also predict profitability and performance across different architectures.
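
As a rough illustration of this approach (a sketch, not the authors' exact pipeline), the classification step could look like the code below, assuming a table of per-loop hardware-counter features with a binary label marking whether the loop vectorizes; the file name, column names, and choice of a random forest are illustrative assumptions.

    # Minimal sketch: train a supervised model on hardware-counter features to
    # predict whether a loop can be vectorized. The CSV file, its columns, and
    # the random forest model are assumptions, not the paper's exact setup.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    data = pd.read_csv("loop_counters.csv")       # one row per loop (hypothetical file)
    X = data.drop(columns=["vectorizable"])       # counter-derived features
    y = data["vectorizable"]                      # 1 = vectorization applies

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"mean validation accuracy: {scores.mean():.2f}")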

We evaluate a wide range of hardware counters across different machine learning models. We show that dynamic features extracted from performance data implicitly capture useful information about the host machine and the program's runtime behavior.
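
For concreteness, dynamic features of this kind can be collected with a tool such as Linux perf. The sketch below gathers a few counters for a compiled loop kernel and normalizes them per retired instruction; the event list and binary path are assumptions rather than the paper's actual feature set.

    # Sketch: run a loop kernel under Linux perf, parse the counter values, and
    # turn raw counts into per-instruction rates usable as ML features.
    # The binary path and event names are placeholders.
    import csv
    import subprocess

    EVENTS = ["instructions", "cycles", "cache-misses", "branch-misses"]

    def collect_counters(binary):
        """Run `perf stat` in CSV mode (-x ,) and parse counts from stderr."""
        result = subprocess.run(
            ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS), binary],
            capture_output=True, text=True,
        )
        counters = {}
        for row in csv.reader(result.stderr.splitlines()):
            if len(row) >= 3 and row[2] in EVENTS:
                try:
                    counters[row[2]] = float(row[0])
                except ValueError:
                    pass  # event not counted on this run
        return counters

    def to_features(counters):
        """Normalize raw counts by the retired-instruction count."""
        instructions = counters.get("instructions", 1.0)
        return {name: count / instructions
                for name, count in counters.items()
                if name != "instructions"}

    if __name__ == "__main__":
        print(to_features(collect_counters("./loop_kernel")))  # placeholder binary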

Acknowledgements

This material is based upon work supported by the National Science Foundation under Award 1533912.

Author information

Corresponding author

Correspondence to Neftali Watkinson.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Watkinson, N., Shivam, A., Chen, Z., Veidenbaum, A., Nicolau, A., Gong, Z. (2019). Using Hardware Counters to Predict Vectorization. In: Rauchwerger, L. (eds) Languages and Compilers for Parallel Computing. LCPC 2017. Lecture Notes in Computer Science, vol. 11403. Springer, Cham. https://doi.org/10.1007/978-3-030-35225-7_1

  • DOI: https://doi.org/10.1007/978-3-030-35225-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35224-0

  • Online ISBN: 978-3-030-35225-7
