Abstract
Hardware-supported fine-grained synchronization has been shown to provide a significant performance improvement over coarse-grained synchronization mechanisms such as barriers. Support for fine-grained synchronization on individual data items is particularly important for efficiently exploiting the thread-level parallelism available on multithreaded and multi-core processors. Fine-grained synchronization can be achieved using full/empty tagged shared memory. We define the complete set of synchronizing memory instructions, as well as the architecture of the full/empty tagged shared memory that supports these operations. We develop a snoopy cache coherence protocol for an SMP with centralized full/empty tagged memory.
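The idea behind full/empty tagged memory can be illustrated with a small software emulation. The sketch below is not the instruction set or the hardware architecture defined in the paper. It only assumes the commonly described semantics in which each memory word carries a one-bit tag, a synchronizing write waits for the word to be empty and then marks it full, and a synchronizing read waits for the word to be full and then marks it empty. The names `FullEmptyCell`, `write_when_empty`, `read_and_empty`, and `run_producer_consumer` are hypothetical and chosen for illustration.

```python
import threading


class FullEmptyCell:
    """Software emulation of one memory word with a full/empty tag bit.

    Illustrative assumption: real full/empty memory implements the tag in
    hardware; here a condition variable stands in for the blocking behaviour.
    """

    def __init__(self):
        self._value = None
        self._full = False  # tag bit: False = empty, True = full
        self._cond = threading.Condition()

    def write_when_empty(self, value):
        """Wait until the word is empty, store the value, set the tag to full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_and_empty(self):
        """Wait until the word is full, return its value, set the tag to empty."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def run_producer_consumer(n):
    """Pass n values through a single cell from one producer to one consumer."""
    cell = FullEmptyCell()
    received = []

    def producer():
        for i in range(n):
            cell.write_when_empty(i)

    def consumer():
        for _ in range(n):
            received.append(cell.read_and_empty())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

In this emulation the producer and consumer synchronize on the individual data item rather than on a barrier: each value is handed over as soon as it is written, and neither thread can overrun the other.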
© 2007 Springer-Verlag Berlin Heidelberg
Vlassov, V., Merino, O.S., Moritz, C.A., Popov, K. (2007). Support for Fine-Grained Synchronization in Shared-Memory Multiprocessors. In: Malyshkin, V. (ed.) Parallel Computing Technologies. PaCT 2007. Lecture Notes in Computer Science, vol 4671. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73940-1_45
DOI: https://doi.org/10.1007/978-3-540-73940-1_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73939-5
Online ISBN: 978-3-540-73940-1
eBook Packages: Computer Science, Computer Science (R0)