
Abstract

Before the advent of modern supercomputers, single processors were approaching the limit of improvement in operating frequency, and the memory wall problem had emerged: even if the computing capacity of a single processor could be increased, the memory could not supply data fast enough to match that capacity. Raising the operating frequency also caused power consumption to grow faster than performance improved. In other words, the limit of performance improvement for a single processor was becoming apparent. To overcome these problems, parallel architectures, in which many single processors are connected by a communication mechanism, have been adopted. Two points, "programming conscious of parallelism" and "programming conscious of execution performance", are essential for users, researchers, and programmers to make effective use of present-day supercomputers equipped with tens of thousands of processors.


Notes

  1. http://www.riken.jp/en/research/environment/kcomputer/.

  2. The Institute of Physical and Chemical Research (http://www.riken.jp/en/).

  3. https://www.top500.org/.

  4. FLOPS is a unit of calculation speed: one FLOPS is the execution of one floating-point operation per second. Thus, 160 MFLOPS is equivalent to 160 million floating-point operations per second.

  5. Reduced instruction set computing.

  6. Single instruction, multiple data: a class of parallel processing that applies one instruction to multiple data elements.

  7. This approach is called cache blocking.
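The cache blocking mentioned in note 7 can be sketched on matrix multiplication. The sizes below are hypothetical, and real HPC code would be written in C or Fortran, but the loop restructuring is the same: the loops are reordered into tiles so that a tile of each operand is reused while it is still resident in cache, rather than streaming through whole rows and columns.

```python
N = 8     # matrix dimension (illustrative)
NB = 4    # block (tile) size, chosen so tiles would fit in cache

def matmul_naive(a, b):
    """Plain triple loop: for large N, each pass over a column of b
    evicts data from cache before it can be reused."""
    c = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_blocked(a, b):
    """Same arithmetic, reordered into NB x NB tiles: each tile of a
    and b is reused many times while it is resident in cache."""
    c = [[0.0] * N for _ in range(N)]
    for ii in range(0, N, NB):
        for kk in range(0, N, NB):
            for jj in range(0, N, NB):
                for i in range(ii, ii + NB):
                    for k in range(kk, kk + NB):
                        aik = a[i][k]
                        for j in range(jj, jj + NB):
                            c[i][j] += aik * b[k][j]
    return c

a = [[float(i + j) for j in range(N)] for i in range(N)]
b = [[float(i * j % 5) for j in range(N)] for i in range(N)]
# For each (i, j) the k terms are accumulated in the same order in both
# versions, so the results are bitwise identical here.
assert matmul_naive(a, b) == matmul_blocked(a, b)
```

Only the iteration order changes; the number of floating-point operations is identical, which is why blocking improves performance purely through better cache reuse.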

Author information

Correspondence to Kazuo Minami.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Minami, K. (2019). Supercomputers and Application Performance. In: Geshi, M. (eds) The Art of High Performance Computing for Computational Science, Vol. 2. Springer, Singapore. https://doi.org/10.1007/978-981-13-9802-5_1

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9801-8

  • Online ISBN: 978-981-13-9802-5

  • eBook Packages: Computer Science, Computer Science (R0)
