Boosting the Performance of Three-Tier Web Servers Deploying SMP Architecture

  • Pierfrancesco Foglia
  • Roberto Giorgi
  • Cosimo Antonio Prete
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2376)


This paper analyzes the effectiveness of the SMP (Symmetric Multi-Processor) architecture for implementing three-tier Web servers. In particular, we evaluate the system using a workload based on the TPC-W benchmark.

Since the major bottleneck of this system is memory access through the shared bus, we analyze the benefits of several solutions aimed at boosting the global performance of the Web server. We also aim to quantify the scalability of such a system and to suggest solutions for achieving the desired processing power. The analysis starts from a reference case and explores different architectural choices for the cache, the scheduling algorithm, and the coherence protocol, in order to increase the number of processors that can be connected through the shared bus.

Our results show that such an SMP-based server can be scaled up to 20 processors, well beyond the limits expected for this kind of architecture, provided that particular attention is paid to solving the problems related to process migration and coherence overhead.
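As a rough illustration of why the shared bus limits the processor count, the following is a minimal back-of-the-envelope sketch and not the simulation methodology used in the paper. It assumes that every processor issues bus requests at the same fixed rate, that each request holds the bus for a fixed number of cycles, that the bus and the processors share one clock, and that queueing delays are ignored. The function names and both parameters are hypothetical.

```python
def bus_busy_per_kcycle(n_procs, requests_per_kcycle, bus_cycles_per_request):
    """Bus cycles demanded per 1000 cycles by n_procs identical processors.

    Hypothetical toy model: demand simply adds up across processors.
    """
    return n_procs * requests_per_kcycle * bus_cycles_per_request


def max_procs_before_saturation(requests_per_kcycle, bus_cycles_per_request):
    """Largest processor count whose total demand still fits in 1000 bus cycles."""
    per_proc = requests_per_kcycle * bus_cycles_per_request
    if per_proc <= 0:
        raise ValueError("per-processor bus demand must be positive")
    return 1000 // per_proc


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from the paper.
    for requests in (8, 5):
        limit = max_procs_before_saturation(requests, bus_cycles_per_request=12)
        print(f"{requests} requests per 1000 cycles -> bus saturates beyond {limit} processors")
```

Under these assumptions, anything that lowers the per-processor bus traffic, for example fewer misses after process migration or fewer coherence transactions, raises the processor count at which the bus saturates.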


Keywords: Multiprocessor, Shared-Memory, Coherence Protocol, Performance Evaluation, Process Migration





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Pierfrancesco Foglia (1)
  • Roberto Giorgi (2)
  • Cosimo Antonio Prete (1)
  1. Dipartimento di Ingegneria dell’Informazione, Università di Pisa, Pisa, Italy
  2. Dipartimento di Ingegneria dell’Informazione, Università di Siena, Siena, Italy
