Abstract
The rapid growth in the number of cores per chip puts increasing pressure on accesses to main memory. To avoid this bottleneck, we propose a simple producer-consumer model in which temporary results are transferred directly from one task to another. These transfers are performed within the chip, using on-chip memory, and thus avoid costly off-chip memory accesses. We implement this model on a real many-core processor, the 48-core Intel Single-chip Cloud Computer (SCC), using its on-chip memory facilities. We find that the Producer-Consumer model adapts to such architectures and allows good task and data parallelism to be achieved. To evaluate the proposed platform, we implement a graph-based application using the Producer-Consumer model. Our tests show that the model scales very well because it takes advantage of the on-chip memory. Our implementation runs up to 9 times faster than the baseline implementation, which stores the temporary results in main memory.
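To make the communication pattern concrete, the following is a minimal sketch of a two-stage producer-consumer pipeline. It is not the authors' SCC implementation: the bounded `queue.Queue` merely stands in for a small on-chip buffer, the triangle-counting workload is a hypothetical example of a graph-based task, and all names (`producer`, `consumer`, `count_triangles`, `buffer_slots`) are assumptions made for illustration. Python threads are used only to show the structure of the hand-off and say nothing about the performance reported above.

```python
import queue
import threading

SENTINEL = None  # end-of-stream marker (illustrative convention)


def producer(adjacency, out_q):
    """Stage 1: for each undirected edge (u, v) with u < v, emit the
    number of neighbours that u and v share."""
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if u < v:
                # put() blocks while the bounded buffer is full, so the
                # producer never runs far ahead of the consumer.
                out_q.put(len(neighbours & adjacency[v]))
    out_q.put(SENTINEL)


def consumer(in_q, result):
    """Stage 2: sum the partial counts as they arrive. The intermediate
    values are never collected in a shared results array."""
    total = 0
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        total += item
    # Every triangle is seen once from each of its three edges.
    result["triangles"] = total // 3


def count_triangles(adjacency, buffer_slots=4):
    """Run both stages concurrently, linked by a small bounded buffer."""
    q = queue.Queue(maxsize=buffer_slots)
    result = {}
    threads = [
        threading.Thread(target=producer, args=(adjacency, q)),
        threading.Thread(target=consumer, args=(q, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["triangles"]


if __name__ == "__main__":
    # Edges: 0-1, 0-2, 1-2, 2-3 (a single triangle plus a pendant vertex).
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(count_triangles(graph))
```

The point of the sketch is the hand-off: each temporary result passes from the producing task to the consuming task through a small fixed-size buffer, rather than being written out in full and read back later, which is the role the abstract assigns to main memory in the baseline.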
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, JL., Trancoso, P. (2013). Producer-Consumer: The Programming Model for Future Many-Core Processors. In: Kubátová, H., Hochberger, C., Daněk, M., Sick, B. (eds) Architecture of Computing Systems – ARCS 2013. ARCS 2013. Lecture Notes in Computer Science, vol 7767. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36424-2_10
DOI: https://doi.org/10.1007/978-3-642-36424-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36423-5
Online ISBN: 978-3-642-36424-2
eBook Packages: Computer Science (R0)