Abstract
Proper capacity/performance engineering is critical to the success of developing and deploying any complex networked application. In this chapter, we discuss the typical capacity, performance, reliability, and scalability engineering activities required to deploy a networked service platform. These activities begin at the earliest stages and span the entire platform life cycle: from architecture, design, and development, through service test and deployment, to ongoing capacity management. The goal of this chapter is not to present an exhaustive “how to” manual, but rather to highlight areas where proper capacity/performance engineering is especially critical to success. We use an ISP e-mail platform as a unifying case study to illustrate many of these tasks. This chapter covers the following topics:

- Architecture Assessment – elements, transactions, flows, and bottlenecks
- Workload Assessment – workload, requirements, budgeting, and estimation
- Availability/Reliability Assessment – modeling and failure-mode analysis
- Capacity/Performance Assessment – measurement, modeling, and overload
- Scalability Assessment – demand projections, modeling, and engineering rules
- Capacity/Performance Management – monitoring, growth, and automation
- Capacity/Performance Engineering – “best practice” principles
Notes
1. The term “capacity/performance engineering” in the chapter title and throughout this chapter broadly refers to the expansive set of activities required to assess and manage platform capacity, performance, availability, reliability, and scalability.
2. This Markovian property results from the memoryless nature of the exponential distribution, and is referred to as Poisson Arrivals See Time Averages (PASTA).
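The memoryless property behind this note can be checked numerically: for an exponential random variable X, P(X > s + t | X > s) equals P(X > t). The following sketch (with an illustrative rate and sample size, not taken from the chapter) estimates both probabilities by simulation:

```python
import random

random.seed(42)
rate = 1.0          # illustrative arrival rate (lambda)
n = 200_000
samples = [random.expovariate(rate) for _ in range(n)]

s, t = 0.5, 1.0
# Unconditional tail probability P(X > t)
p_tail = sum(x > t for x in samples) / n
# Conditional tail probability P(X > s + t | X > s)
survivors = [x for x in samples if x > s]
p_cond = sum(x > s + t for x in survivors) / len(survivors)

print(round(p_tail, 3), round(p_cond, 3))
```

With rate = 1.0 both estimates land near e⁻¹ ≈ 0.368; for a non-exponential holding-time distribution the two estimates would diverge, which is why the memoryless assumption should be validated before it is relied upon.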
3. The coefficient of variation (CV) is a normalized measure of dispersion of a distribution, defined as the ratio of the standard deviation σ to the mean μ (CV = σ ∕ μ).
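As defined above, the CV is straightforward to compute from measured data. A minimal sketch (the service times below are hypothetical, for illustration only):

```python
from statistics import mean, pstdev

# Hypothetical per-transaction service times (seconds)
service_times = [0.12, 0.10, 0.45, 0.08, 0.22, 0.15, 0.95, 0.11]

# CV = sigma / mu (population standard deviation over mean)
cv = pstdev(service_times) / mean(service_times)
print(f"CV = {cv:.2f}")
```

An exponential distribution has CV = 1; a measured CV well above 1 signals heavier-than-exponential variability, which affects which queueing approximations are appropriate.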
4. In reality, ISPs typically support multiple applications in addition to e-mail (e.g., newsgroups and web hosting). These applications typically share physical resources, either through virtualization, common transactions (e.g., authentication), or shared infrastructure (e.g., LANs). For the purpose of illustrating the C/PE tasks, we assume that all physical resources are dedicated to the single e-mail application. In the case of resource sharing/virtualization, the C/PE analysis must account for the impact of additional workload, reduced resource availability, and contention.
5. This expression results from a BoE model for delay W reviewed in Section 16.2.
6. As discussed in Section 16.2, both analytic modeling and practical experience suggest that the average delay for user-initiated jobs with common code execution is typically one-third to half of the 95th percentile delay. As part of the budgeting exercise, we can perform sensitivity analyses around this 95th-percentile-to-mean assumption.
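The one-third-to-half rule of thumb in note 6 can be sanity-checked for simple delay distributions. For an exponential delay, the 95th percentile is exactly −ln(0.05) ≈ 3.0 times the mean (so the mean is about one-third of the 95th percentile); for a lower-variability Erlang-2 delay, the ratio falls toward 2. A sketch (illustrative distributions, not the chapter's model):

```python
import math
import random

random.seed(1)

# Exponential delay: p95 / mean = -ln(0.05), independent of the rate
exp_ratio = -math.log(0.05)

# Erlang-2 delay (sum of two i.i.d. exponentials): lower CV, smaller ratio
n = 100_000
erlang2 = sorted(random.expovariate(1.0) + random.expovariate(1.0)
                 for _ in range(n))
mean2 = sum(erlang2) / n
p95_2 = erlang2[int(0.95 * n)]
erlang2_ratio = p95_2 / mean2

print(round(exp_ratio, 2), round(erlang2_ratio, 2))
```

Both ratios fall in the 2-to-3 range, consistent with budgeting the mean delay at one-third to one-half of the 95th percentile target.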
Abbreviations
- ACL: access control list
- AS/V: anti-spam/virus filtering server
- BH: busy hour
- B5M: busy 5 min.
- BoE: back-of-the-envelope
- C/PE: capacity/performance engineering
- DMoQ: direct measure of quality
- DPM: defects per million
- DSL: digital subscriber line
- DT: downtime
- FIFO: first-in-first-out
- FIT: fault insertion testing
- FMEA: failure modes and effects analysis
- FTP: File Transfer Protocol
- FTTH: fiber-to-the-home
- GW: IB SMTP Gateway server
- HT: headroom threshold
- HTTP: Hypertext Transfer Protocol
- HTTPS: Secure HTTP
- HW: hardware
- IMAP: Internet Message Access Protocol
- IB: inbound
- i.i.d.: independent identically distributed
- I/O: input/output
- ISP: Internet service provider
- LAN: local area network
- LIFO: last-in-first-out
- MIB: management information base
- MR: OB Mail Relay server
- MRA: modification request analysis
- MTTF: mean-time-to-failure
- MTTR: mean-time-to-restore
- NAS: network attached storage
- NFS: network file system
- OB: outbound
- PO: Post Office server
- POP: Post Office Protocol
- PP: POP Proxy server
- PS: processor-sharing
- RBD: reliability block diagram
- SAN: storage area network
- SLA: service-level agreement
- SLO: service-level objective
- SNMP: Simple Network Management Protocol
- SPoF: single point of failure
- SRE: software reliability engineering
- SMTP: Simple Mail Transfer Protocol
- tps: transactions per second
- VIP: virtual IP address (aka VLAN)
- WM: WebMail server
Copyright information
© 2010 Springer-Verlag London
Cite this chapter
Reeser, P. (2010). Capacity and Performance Engineering for Networked Application Servers: A Case Study in E-mail Platform Planning. In: Kalmanek, C., Misra, S., Yang, Y. (eds) Guide to Reliable Internet Services and Applications. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-828-5_16
Publisher Name: Springer, London
Print ISBN: 978-1-84882-827-8
Online ISBN: 978-1-84882-828-5