
Towards High-Performance Implementations of a Custom HPC Kernel Using Intel® Array Building Blocks

  • Alexander Heinecke
  • Michael Klemm
  • Hans Pabst
  • Dirk Pflüger
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7174)

Abstract

Today’s highly parallel machines drive a new demand for parallel programming. Fixed power envelopes, increasing problem sizes, and new algorithms pose challenging targets for developers. HPC applications must leverage SIMD units, multi-core architectures, and heterogeneous computing platforms for optimal performance. This leads to low-level, non-portable code that is difficult to write and maintain. With Intel® Array Building Blocks (Intel ArBB), programmers focus on the high-level algorithm and rely on automatic parallelization and vectorization with strong safety guarantees. Intel ArBB hides vendor-specific hardware details behind just-in-time (JIT) compilation at runtime. This case study on data mining with adaptive sparse grids shows how deterministic parallelism, safety, and runtime optimization make Intel ArBB practically applicable. Hand-tuned code is about 40% faster than the ArBB implementation, but requires about 8x more code; ArBB in turn outperforms standard semi-automatically parallelized C/C++ code by approximately 6x.
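The programming model the abstract alludes to can be made concrete with a short example. Below is a minimal sketch of a data-parallel kernel in ArBB's C++ embedded language, modeled on the published 1.0 beta samples; the header name and the exact bind/call signatures are recalled from that release and should be read as assumptions, not a verified listing (ArBB itself was later discontinued).

    // Minimal Intel ArBB sketch (assumed API, modeled on the 1.0 beta samples).
    // The kernel is written once against ArBB container types; the runtime
    // JIT-compiles it and takes care of vectorization and threading.
    #include <arbb.hpp>   // assumed header name from the ArBB beta
    #include <cstddef>
    #include <vector>

    // A data-parallel kernel: one whole-container statement, no explicit loop.
    void saxpy(arbb::f32 a,
               arbb::dense<arbb::f32> x,
               arbb::dense<arbb::f32> y,
               arbb::dense<arbb::f32>& result) {
      result = a * x + y;
    }

    int main() {
      const std::size_t n = 1 << 20;
      std::vector<float> x(n, 1.0f), y(n, 2.0f), r(n, 0.0f);

      // Bind ArBB containers to existing C++ arrays.
      arbb::dense<arbb::f32> vx, vy, vr;
      arbb::bind(vx, &x[0], n);
      arbb::bind(vy, &y[0], n);
      arbb::bind(vr, &r[0], n);

      // The first call triggers JIT compilation; subsequent calls reuse
      // the generated, vectorized and parallelized binary.
      arbb::f32 a = 2.0f;
      arbb::call(saxpy)(a, vx, vy, vr);
      return 0;
    }

Because the kernel is expressed over whole containers rather than explicit loops, the JIT can retarget the same source to different SIMD widths and core counts, which is the portability argument the abstract makes.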

Keywords

parallel languages · vector computing · high performance computing · Intel® Array Building Blocks · Intel ArBB · OpenCL


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Alexander Heinecke (1)
  • Michael Klemm (2)
  • Hans Pabst (2)
  • Dirk Pflüger (1)
  1. Technische Universität München, Garching, Germany
  2. Intel GmbH, Feldkirchen, Germany
