Automatic Generation of Program Affinity Policies Using Machine Learning

  • Ryan W. Moore
  • Bruce R. Childers
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7791)

Abstract

Modern scientific and server programs require multisocket, multicore machines to achieve good performance. Maximizing the performance of these programs requires careful consideration of program behavior and careful management of hardware resources. In particular, a program’s affinity can have a critical performance effect. For such machines, there are many possible affinities for a multithreaded program. In this paper, we present AutoFinity, a solution to automatically generate program affinity policies that consider program behavior and the target machine. The policies are constructed with machine learning and used online to select an affinity. We implemented AutoFinity on a 4-processor, 48-core machine and evaluated it on 18 multithreaded programs with varying thread counts. Our results show that in 12 out of 15 cases where affinity impacts runtime, the policy generated by AutoFinity chose affinities that improved performance versus assignments that do not consider program and machine behavior.
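To make the idea of "many possible affinities" concrete, the following is a minimal sketch of two candidate thread-to-core mappings on a machine shaped like the one in the abstract. It is not AutoFinity's method. The even split of 12 cores per socket, the contiguous core numbering, and the function names `packed_affinity`, `spread_affinity`, and `sockets_used` are all assumptions made for illustration.

```python
from typing import List

SOCKETS = 4             # from the abstract: a 4-processor machine
CORES_PER_SOCKET = 12   # assumption: 48 cores split evenly across sockets
TOTAL_CORES = SOCKETS * CORES_PER_SOCKET


def packed_affinity(num_threads: int) -> List[int]:
    """Hypothetical 'packed' mapping: fill one socket before using the next."""
    return [t % TOTAL_CORES for t in range(num_threads)]


def spread_affinity(num_threads: int) -> List[int]:
    """Hypothetical 'spread' mapping: place threads round-robin across sockets."""
    cores = []
    for t in range(num_threads):
        socket = t % SOCKETS
        slot = (t // SOCKETS) % CORES_PER_SOCKET
        cores.append(socket * CORES_PER_SOCKET + slot)
    return cores


def sockets_used(cores: List[int]) -> List[int]:
    """Return the sorted socket indices touched by a mapping (assumes contiguous numbering)."""
    return sorted({c // CORES_PER_SOCKET for c in cores})


if __name__ == "__main__":
    for name, mapping in (("packed", packed_affinity(4)), ("spread", spread_affinity(4))):
        print(name, mapping, "sockets:", sockets_used(mapping))
```

With four threads, the packed mapping keeps all threads on one socket (shared cache, less memory bandwidth), while the spread mapping uses all four sockets. A policy such as the one the paper describes would choose among candidates like these. On Linux, a chosen mapping could then be applied with a call such as `os.sched_setaffinity`, although how AutoFinity itself applies affinities is not stated here.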

Keywords

policy generation, runtime adaptation, parallel performance

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ryan W. Moore (1)
  • Bruce R. Childers (1)

  1. University of Pittsburgh, Pittsburgh, USA
