Analytical Communication Performance Models as a metric in the partitioning of data-parallel kernels on heterogeneous platforms
Data partitioning on heterogeneous HPC platforms is formulated as an optimization problem. The algorithm takes as input the communication performance models of the processes, which represent their speeds, and outputs a data tiling that minimizes the communication cost. Traditionally, communication volume is the metric used to guide the partitioning, but this metric cannot capture the complexities introduced by uneven communication channels and the variety of communication patterns in the kernel. We discuss Analytical Communication Performance Models as a new metric in partitioning algorithms. They have not been considered in the past for two reasons: prediction inaccuracy and the lack of tools to automatically build and solve formal expressions of kernel communications. We show how communication performance models fit the specific kernel and platform, and we present results that equal or even improve on previous volume-based strategies.
Keywords: Partitioning algorithms · Communication performance models · Communication optimization · Hybrid data-parallel kernels
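The difference between the two metrics named in the abstract can be illustrated with a minimal sketch. It assumes a simple latency-plus-bandwidth cost per transfer and sequential transfers, which is not necessarily the analytical model used in this work. The channel parameters, the two tilings, and all names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Hypothetical communication channel (illustrative parameters only)."""
    latency: float    # seconds per message
    bandwidth: float  # bytes per second


def transfer_time(nbytes: int, channel: Channel) -> float:
    """Assumed cost model: fixed latency plus size divided by bandwidth."""
    return channel.latency + nbytes / channel.bandwidth


def volume(transfers) -> int:
    """Volume metric: total bytes moved, regardless of the channel used."""
    return sum(nbytes for nbytes, _ in transfers)


def predicted_time(transfers) -> float:
    """Model-based metric: sum of per-transfer times (transfers assumed sequential)."""
    return sum(transfer_time(nbytes, ch) for nbytes, ch in transfers)


# Two uneven channels, e.g. intra-node versus inter-node (values are made up).
shared_memory = Channel(latency=1e-6, bandwidth=10e9)
network = Channel(latency=50e-6, bandwidth=1e9)

MB = 1_000_000

# Two hypothetical tilings that move the same number of bytes,
# but distribute them differently across the two channels.
tiling_a = [(6 * MB, network), (2 * MB, shared_memory)]
tiling_b = [(2 * MB, network), (6 * MB, shared_memory)]

if __name__ == "__main__":
    for name, tiling in (("A", tiling_a), ("B", tiling_b)):
        print(name, volume(tiling), predicted_time(tiling))
```

Under these assumptions both tilings have the same communication volume, so a volume-based partitioner cannot tell them apart, while the time-based estimate prefers the tiling that sends less data over the slower channel.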
This work was supported by the European Regional Development Fund ‘A way to achieve Europe’ (ERDF) and the Extremadura Local Government (Ref. IB16118). It was also partially supported by the computing facilities of Extremadura Research Center for Advanced Technologies (CETA-CIEMAT), funded by the European Regional Development Fund (ERDF).