Abstract
Many performance evaluation studies in computer architecture rely almost exclusively on simulation of the dynamic instruction stream from a single application. The benchmarks used, such as the SPEC suite, are often CPU intensive and rely very little on the operating system. However, a majority of computer systems are subject to a different class of workloads where these common practices may not accurately reflect all performance issues. For example, operating system activity and context switches are ignored because many popular simulators and tracing techniques do not support the additional complexity.
The main goal of the research is to understand the effects on the microarchitecture of operating system calls and context switches in a common computing environment. This work analyzes applications running in the ubiquitous Microsoft Windows environment using an x86 processor. Microarchitecture structures such as the instruction and data caches, TLB, and branch predictor are investigated in detail. The behavior of application and operating system code is studied to derive a complete picture of the execution behavior of these applications. In addition, a series of desktop and database applications are presented and compared with the SPEC CPU2000 suite. This analysis is conducted using a hardware tracer capable of tracing all activity including operating system calls and context switches.
We observe that the dynamic instruction stream of desktop and database applications contains 19% to 78% operating system activity, whereas SPEC CPU2000 applications typically involve less than 1% operating system activity. Not only are there more operating system calls, but the average number of instructions executed on each entry into the operating system is also higher for desktop and database applications. Data generated by the operating system and applications can interfere with each other. This results in more misses in the caches, more interference in the branch predictor, and worse TLB performance. We find that simulations with application code alone are not ideal for evaluating performance of microarchitecture enhancements for many programs, especially databases and desktop applications. Simulators and tracers capable of handling all system activity are essential for obtaining meaningful results for typical applications that interact with the operating system and for applications in a multiple-program environment.
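The two summary metrics quoted above (the share of the dynamic instruction stream spent in the operating system, and the average number of instructions executed per entry into the operating system) can be illustrated with a minimal sketch. It assumes a hypothetical trace format in which each retired instruction is reduced to a single flag, True for operating system code and False for application code. The names `OsActivity`, `summarize`, and `toy_trace` are illustrative only and are not taken from the chapter or from the hardware tracer it describes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OsActivity:
    """Counts gathered from one pass over a (hypothetical) mode-tagged trace."""
    total_instructions: int
    os_instructions: int
    os_entries: int

    @property
    def os_fraction(self) -> float:
        # Share of the dynamic instruction stream executed in OS code.
        if self.total_instructions == 0:
            return 0.0
        return self.os_instructions / self.total_instructions

    @property
    def mean_instructions_per_entry(self) -> float:
        # Average run length of OS code per entry into the OS.
        if self.os_entries == 0:
            return 0.0
        return self.os_instructions / self.os_entries


def summarize(trace: Iterable[bool]) -> OsActivity:
    """Summarize a trace where each item is True for an OS instruction.

    An "entry" is counted at every transition from application code to
    OS code (assumed stand-in for a system call, interrupt, or fault).
    """
    total = os_count = entries = 0
    in_os = False
    for is_os in trace:
        total += 1
        if is_os:
            os_count += 1
            if not in_os:
                entries += 1
        in_os = is_os
    return OsActivity(total, os_count, entries)


# Made-up toy trace, not measured data:
# 6 application, 3 OS, 4 application, 5 OS, 2 application instructions.
toy_trace = [False] * 6 + [True] * 3 + [False] * 4 + [True] * 5 + [False] * 2
stats = summarize(toy_trace)
print(f"OS fraction: {stats.os_fraction:.0%}")
print(f"Mean instructions per OS entry: {stats.mean_instructions_per_entry:.1f}")
```

On the toy trace, 8 of the 20 instructions are operating system instructions spread over 2 entries. A real trace would also need to attribute context switches and interrupts, which this sketch folds into the single application-to-OS transition count.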
Copyright information
© 2001 Springer Science+Business Media New York
About this chapter
Cite this chapter
Bhargava, R., Rubio, J., Kannan, S., John, L.K., Christie, D., Klaes, L. (2001). Understanding the Impact of X86/NT Computing on Microarchitecture. In: John, L.K., Maynard, A.M.G. (eds) Workload Characterization of Emerging Computer Applications. The Springer International Series in Engineering and Computer Science, vol 610. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1613-2_10
DOI: https://doi.org/10.1007/978-1-4615-1613-2_10
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5641-7
Online ISBN: 978-1-4615-1613-2
eBook Packages: Springer Book Archive