Experiences Understanding Performance in a Commercial Scale-Out Environment

  • Robert W. Wisniewski
  • Reza Azimi
  • Mathieu Desnoyers
  • Maged M. Michael
  • Jose Moreira
  • Doron Shiloach
  • Livio Soares
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)


Clusters of loosely connected machines are becoming an important model for commercial computing. The cost/performance ratio makes these scale-out solutions an attractive platform for a class of computational needs. The work we describe in this paper focuses on understanding performance when using a scale-out environment to run commercial workloads. We describe the novel scale-out environment we configured and the workload we ran on it. We explain the unique performance challenges faced in such an environment and the tools we applied and improved for this environment to address the challenges. We present data from the tools that proved useful in optimizing performance on our system. We discuss the lessons we learned applying and modifying existing tools to a commercial scale-out environment, and offer insights into making future performance tools effective in this environment.


Keywords: System Call · Memory Hierarchy · Page Cache · Hardware Event · Hardware Performance Counter
(These keywords were added by machine, not by the authors.)



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Robert W. Wisniewski, IBM T.J. Watson Research Center
  • Reza Azimi, University of Toronto
  • Mathieu Desnoyers, École Polytechnique de Montréal
  • Maged M. Michael, IBM T.J. Watson Research Center
  • Jose Moreira, IBM T.J. Watson Research Center
  • Doron Shiloach, IBM T.J. Watson Research Center
  • Livio Soares, University of Toronto
