
The Impact of Global Communication Latency at Extreme Scales on Krylov Methods

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7439)

Abstract

Krylov Subspace Methods (KSMs) are popular numerical tools for solving large linear systems of equations. We consider their role in solving sparse systems on future massively parallel distributed-memory machines by estimating the future performance of their constituent operations. To this end we construct a simple model that nevertheless accounts for topology and network acceleration, since both are important considerations. We show that, as the number of nodes of a parallel machine grows very large, the increasing latency cost of global reductions may well become a problematic bottleneck for traditional formulations of these methods. Finally, we discuss how pipelined KSMs can be used to tackle this potential problem, and which pipeline depths are appropriate.
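The abstract's argument, namely that the log-depth latency of global reductions eventually dominates the local work in each Krylov iteration, and that pipelined KSMs counter this by overlapping the reduction with work from subsequent iterations, can be illustrated with a toy calculation. The sketch below is not the model from the paper: the binary-tree reduction form and all constants (per-hop latency, local work per iteration) are hypothetical assumptions chosen only to show the scaling trend.

```python
# Illustrative sketch only: a toy model of global-reduction latency versus
# node count, and the pipeline depth needed to hide it. All constants and
# the model form are assumptions for illustration, not the paper's model.
import math

# Assumed machine parameters (hypothetical values).
PER_HOP_LATENCY = 1.0e-6    # seconds per level of the reduction tree
LOCAL_WORK_TIME = 50.0e-6   # seconds of local SpMV + vector work per iteration

def reduction_latency(num_nodes: int) -> float:
    """Latency of one reduction sweep, assuming a binary reduction tree."""
    return PER_HOP_LATENCY * math.ceil(math.log2(num_nodes))

def min_pipeline_depth(num_nodes: int) -> int:
    """Smallest pipeline depth whose accumulated local work covers the
    round trip of one allreduce (modelled as two tree traversals)."""
    round_trip = 2.0 * reduction_latency(num_nodes)
    return max(1, math.ceil(round_trip / LOCAL_WORK_TIME))

if __name__ == "__main__":
    for p in (2**10, 2**15, 2**20):  # 1K, 32K, 1M nodes
        lat = 2.0 * reduction_latency(p)
        print(f"{p:>8} nodes: allreduce ~ {lat * 1e6:6.1f} us, "
              f"fraction of iteration = {lat / (lat + LOCAL_WORK_TIME):.2f}, "
              f"pipeline depth >= {min_pipeline_depth(p)}")
```

With these made-up numbers the reduction's share of each iteration grows with log P, and the pipeline depth needed to cover it grows accordingly; the paper's actual model additionally accounts for topology and network acceleration.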






Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ashby, T.J., Ghysels, P., Heirman, W., Vanroose, W. (2012). The Impact of Global Communication Latency at Extreme Scales on Krylov Methods. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33078-0_31


  • DOI: https://doi.org/10.1007/978-3-642-33078-0_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33077-3

  • Online ISBN: 978-3-642-33078-0

