Capacity and Performance Engineering for Networked Application Servers: A Case Study in E-mail Platform Planning

Chapter in: Guide to Reliable Internet Services and Applications

Part of the book series: Computer Communications and Networks ((CCN))

Abstract

Proper capacity/performance engineering is critical to the success of developing and deploying any complex networked application. In this chapter, we discuss the typical capacity, performance, reliability, and scalability engineering activities required to deploy a networked service platform. These activities begin at the earliest stages, and span the entire platform life cycle: from architecture, design, and development, through service test and deployment, to ongoing capacity management. The goal of this chapter is not to present an exhaustive “how to” manual, but rather to highlight areas where proper capacity/performance engineering is especially critical to success. We use an ISP email platform as a unifying case study to illustrate many of these tasks. This chapter covers the following topics:

  • Architecture Assessment: elements, transactions, flows, and bottlenecks
  • Workload Assessment: workload, requirements, budgeting, and estimation
  • Availability/Reliability Assessment: modeling and failure-mode analysis
  • Capacity/Performance Assessment: measurement, modeling, and overload
  • Scalability Assessment: demand projections, modeling, and engineering rules
  • Capacity/Performance Management: monitoring, growth, and automation
  • Capacity/Performance Engineering: “best practice” principles

Notes

  1. The term “capacity/performance engineering” in the chapter title and throughout this chapter broadly refers to the expansive set of activities required to assess and manage platform capacity, performance, availability, reliability, and scalability.

  2. This Markovian property results from the memoryless nature of the exponential distribution, and is referred to as Poisson Arrivals See Time Averages (PASTA).

  3. The coefficient of variation (CV) is a normalized measure of dispersion of a distribution, defined as the ratio of the standard deviation σ to the mean μ (CV = σ/μ).

  4. In reality, ISPs typically support multiple applications in addition to e-mail (e.g., newsgroups and web hosting). These applications typically share physical resources, either through virtualization, common transactions (e.g., authentication), or shared infrastructure (e.g., LANs). For the purpose of illustrating the C/PE tasks, we assume that all physical resources are dedicated to the single e-mail application. In the case of resource sharing/virtualization, the C/PE analysis must account for the impact of additional workload, reduced resource availability, and contention.

  5. This expression results from a BoE model for delay W reviewed in Section 16.2.

  6. As discussed in Section 16.2, both analytic modeling and practical experience suggest that the average delay for user-initiated jobs with common code execution is typically one-third to one-half of the 95th percentile delay. As part of the budgeting exercise, we can perform sensitivity analyses around this 95th percentile-to-mean assumption.
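Notes 3 and 6 can be illustrated with a short numerical sketch. It assumes, purely for illustration, that delay is exponentially distributed; the chapter's own model in Section 16.2 may differ. The function names, the 1-second mean delay, and the random seed are hypothetical choices, not values from the chapter. Under this assumption the 95th percentile equals the mean times ln 20, so the mean-to-95th-percentile ratio is 1/ln 20 ≈ 0.334, which sits at the "one-third" end of the range quoted in note 6, and the CV is 1.

```python
import math
import random
import statistics


def coefficient_of_variation(samples):
    """CV = sigma / mu, using the population standard deviation (note 3)."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


def exponential_percentile(mean, p):
    """Closed-form p-th quantile of an exponential distribution with this mean."""
    return -mean * math.log(1.0 - p)


# Hypothetical mean delay in seconds (an assumption for this sketch).
mean_delay = 1.0

# Analytic 95th percentile and the mean-to-95th-percentile ratio (note 6).
p95 = exponential_percentile(mean_delay, 0.95)
ratio = mean_delay / p95  # equals 1 / ln(20)

# Empirical check of the CV on simulated exponential delays.
rng = random.Random(42)
samples = [rng.expovariate(1.0 / mean_delay) for _ in range(100_000)]
cv = coefficient_of_variation(samples)

print(f"95th percentile delay: {p95:.3f} s")
print(f"mean / 95th percentile: {ratio:.3f}")
print(f"sample CV: {cv:.3f}")
```

Rerunning the sketch with a different distribution, for example one with a CV below 1, is one simple way to carry out the sensitivity analysis around the percentile-to-mean assumption that note 6 mentions.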

Abbreviations

ACL: access control list
AS/V: anti-spam/virus filtering server
BH: busy hour
B5M: busy 5 min.
BoE: back-of-the-envelope
C/PE: capacity/performance engineering
DMoQ: direct measure of quality
DPM: defect per million
DSL: digital subscriber line
DT: downtime
FIFO: first-in-first-out
FIT: fault insertion testing
FMEA: failure modes and effects analysis
FTP: File Transfer Protocol
FTTH: fiber-to-the-home
GW: IB SMTP Gateway server
HT: headroom threshold
HTTP: Hyper-Text Transfer Protocol
HTTPS: Secure HTTP
HW: hardware
IMAP: Internet Message Access Protocol
IB: inbound
i.i.d.: independent identically distributed
I/O: input/output
ISP: Internet service provider
LAN: local area network
LIFO: last-in-first-out
MIB: management information base
MR: OB Mail Relay server
MRA: modification request analysis
MTTF: mean-time-to-failure
MTTR: mean-time-to-restore
NAS: network attached storage
NFS: network file system
OB: outbound
PO: Post Office server
POP: Post Office Protocol
PP: POP Proxy server
PS: processor-sharing
RBD: reliability block diagram
SAN: storage area network
SLA: service-level agreement
SLO: service-level objective
SNMP: Simple Network Management Protocol
SPoF: single point of failure
SRE: software reliability engineering
SMTP: Simple Mail Transfer Protocol
tps: transactions per second
VIP: virtual IP address (aka VLAN)
WM: WebMail server

Author information

Corresponding author

Correspondence to Paul Reeser.

Copyright information

© 2010 Springer-Verlag London

About this chapter

Cite this chapter

Reeser, P. (2010). Capacity and Performance Engineering for Networked Application Servers: A Case Study in E-mail Platform Planning. In: Kalmanek, C., Misra, S., Yang, Y. (eds) Guide to Reliable Internet Services and Applications. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-828-5_16

  • DOI: https://doi.org/10.1007/978-1-84882-828-5_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-827-8

  • Online ISBN: 978-1-84882-828-5

  • eBook Packages: Computer Science, Computer Science (R0)
