The Influence of Architectural Parameters on the Performance of Parallel Logic Programming Systems
In this work we investigate how different machine settings for a hardware Distributed Shared Memory (DSM) architecture affect the performance of parallel logic programming (PLP) systems. We use execution-driven simulation of a DASH-like multiprocessor to study how the cache block size, the cache size, the network bandwidth, the write buffer size, and the coherence protocol affect the performance of Andorra-I, a PLP system capable of exploiting implicit parallelism in Prolog programs. Among other observations, we find that PLP systems favour small cache blocks regardless of the coherence protocol, while they favour large caches only under invalidate-based coherence. We conclude that the cache block size, the cache size, the network bandwidth, and the coherence protocol have a significant impact on performance, while the size of the write buffer has little effect.
Keywords: DSM architectures; performance evaluation; logic programming
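To make the cache block size parameter more concrete, the sketch below is a toy model of a write-invalidate protocol. It is not the authors' simulator and makes no claim about why Andorra-I behaves as reported. It assumes unbounded per-processor caches, so every miss after a processor's first touch of a block is caused by another processor's write. The class `InvalidateCacheModel`, the helper `count_misses`, and the access trace are all hypothetical. The model shows one well-known way block size and invalidation interact: when two processors write different words that fall into the same block, each write invalidates the other's copy.

```python
class InvalidateCacheModel:
    """Toy write-invalidate model with unbounded per-processor caches.

    Because caches never evict, every miss after a processor's first touch
    of a block is caused by another processor's write invalidating it.
    """

    def __init__(self, block_size: int, num_procs: int = 2) -> None:
        self.block_size = block_size  # bytes per cache block (illustrative)
        self.caches = [set() for _ in range(num_procs)]  # resident block ids
        self.misses = 0

    def access(self, proc: int, addr: int, is_write: bool) -> None:
        block = addr // self.block_size
        if block not in self.caches[proc]:
            self.misses += 1
            self.caches[proc].add(block)
        if is_write:
            # A write invalidates the block in every other processor's cache.
            for other, cache in enumerate(self.caches):
                if other != proc:
                    cache.discard(block)


def count_misses(trace, block_size: int) -> int:
    """Replay (proc, addr, is_write) tuples and return the total miss count."""
    model = InvalidateCacheModel(block_size)
    for proc, addr, is_write in trace:
        model.access(proc, addr, is_write)
    return model.misses


# Hypothetical trace: two processors alternately write byte addresses 0 and 8.
trace = [(i % 2, 0 if i % 2 == 0 else 8, True) for i in range(20)]
for size in (4, 8, 16):
    print(size, count_misses(trace, size))
```

With 4- or 8-byte blocks the two addresses map to different blocks, and only the two cold misses occur. With 16-byte blocks they share one block, and all 20 writes miss. Real workloads also benefit from spatial locality in larger blocks, which this toy trace deliberately omits.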