Journal of Computer Science and Technology, Volume 16, Issue 3, pp. 231–241

Dynamic data prefetching in home-based software DSMs

  • Hu Weiwu 
  • Zhang Fuxin 
  • Liu Haiming 


A major overhead in software DSM (Distributed Shared Memory) is the cost of remote memory accesses, both those necessitated by the protocol and those induced by false sharing. This paper introduces a dynamic prefetching method, implemented in the JIAJIA software DSM, that reduces the system overhead caused by remote accesses. For each cached page, the method records the interleaved string of INV (invalidation) and GETP (getting a remote page) operations, and it analyzes the periodicity of that string whenever the page is invalidated at a lock or barrier. If the periodicity analysis indicates that GETP will be the next operation in the string, a prefetching request is issued after the lock or barrier. Multiple prefetching requests destined for the same host are merged into a single message. A performance evaluation with eight widely used benchmarks on a cluster of sixteen PowerPC workstations shows that the prefetching scheme significantly reduces page fault overhead, improving performance by 15%–20% in three benchmarks and by around 8%–10% in another three. The average extra traffic caused by useless prefetches is only 7%–13% in the evaluation.
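The abstract describes the prefetching mechanism only at a high level, so the Python sketch below shows one possible reading of it under stated assumptions. Each page keeps a bounded string of INV (`I`) and GETP (`G`) operations. When a lock or barrier invalidates a page, the smallest period of its string is found and extended by one step, and a prefetch is requested if that step is GETP. Requests are then grouped by home host so that each host receives one message. The period test, the history cap (`max_len`), the home mapping (`page % 4`) and every name here (`PrefetchPredictor`, `predict_next`, `merge_by_host`) are hypothetical illustrations, not JIAJIA's actual code or its exact periodicity analysis.

```python
from collections import defaultdict

# Hypothetical one-letter encoding of the two recorded operations.
INV, GETP = "I", "G"


def smallest_period(history, min_repeats=2):
    """Smallest p with history[i] == history[i - p] for every i >= p,
    considering only periods that fit at least min_repeats times; else None.
    (An assumed periodicity test, not necessarily the paper's.)"""
    n = len(history)
    for p in range(1, n // min_repeats + 1):
        if all(history[i] == history[i - p] for i in range(p, n)):
            return p
    return None


def predict_next(history):
    """Extend the detected period by one step to guess the next operation."""
    p = smallest_period(history)
    return None if p is None else history[len(history) - p]


class PrefetchPredictor:
    """Per-page INV/GETP history (a sketch, not JIAJIA's data structure)."""

    def __init__(self, max_len=16):
        self.max_len = max_len  # hypothetical cap on the recorded string
        self.history = defaultdict(str)

    def record(self, page, op):
        # Keep only the most recent max_len operations for this page.
        self.history[page] = (self.history[page] + op)[-self.max_len:]

    def on_invalidate(self, page):
        """Called when a lock or barrier invalidates `page`; returns True
        if the periodicity analysis predicts GETP as the next operation."""
        self.record(page, INV)
        return predict_next(self.history[page]) == GETP


def merge_by_host(pages, home_of):
    """Batch prefetch requests so each home host receives one message."""
    batches = defaultdict(list)
    for page in pages:
        batches[home_of(page)].append(page)
    return dict(batches)


if __name__ == "__main__":
    pred = PrefetchPredictor()
    for page, ops in {0: "IGIGIG", 5: "IIGIIGI", 9: "IIII", 12: "IGIGIG"}.items():
        for op in ops:
            pred.record(page, op)
    # A barrier invalidates all four pages.
    wanted = [pg for pg in (0, 5, 9, 12) if pred.on_invalidate(pg)]
    print(wanted)                                    # [0, 5, 12]
    print(merge_by_host(wanted, lambda pg: pg % 4))  # {0: [0, 12], 1: [5]}
```

In this example, page 5 follows an "invalidate twice, fetch once" cycle and is due for a fetch, while page 9 is repeatedly invalidated but never fetched, so no prefetch is issued for it. This is the kind of useless traffic the periodicity check is meant to avoid.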


Keywords: software DSM; remote access; prefetching; performance evaluation



Copyright information

© Science Press, Beijing China and Allerton Press Inc. 2001

Authors and Affiliations

  1. Institute of Computing Technology, The Chinese Academy of Sciences, Beijing, P.R. China
