Abstract
The Pythagorean formula is one of the most popular ways to measure the true ability of a team. It is very easy to use, estimating a team’s winning percentage from the runs they score and allow. This data is readily available on standings pages; no computationally intensive simulations are needed. Normally accurate to within a few games per season, it allows teams to determine how much a run is worth in different situations. This determination informs some of the most important economic decisions a team faces: how much a player is worth, which players should be pursued, and how much they should be offered. We discuss the formula and these applications in detail, and provide a theoretical justification both for the formula and for simpler linear estimators of a team’s winning percentage. The calculations and modeling are discussed in detail, and when possible multiple proofs are given. We analyze the 2012 season in detail, and see that the data for that and other recent years support our modeling conjectures. We conclude with a discussion of work in progress to generalize the formula and increase its predictive power without needing expensive simulations, though at the cost of requiring play-by-play data.
References
Birnbaum, P. (2010, April 24). Sabermetric Research: Saturday. Retrieved from http://blog.philbirnbaum.com/2010/04/marginal-value-of-win-in-baseball.html.
Bishop, Y. M. M., & Fienberg, S. E. (1969). Incomplete two-dimensional contingency tables. Biometrics, 25, 119–128.
Dayaratna, K., & Miller, S. J. (2012). First order approximations of the Pythagorean won-loss formula for predicting MLB teams winning percentages. By The Numbers—The Newsletter of the SABR Statistical Analysis Committee, 22, 15–19.
Hammond, C. N. B., Johnson, W. P., & Miller, S. J. (2013). The James Function. Retrieved from http://arxiv.org/pdf/1312.7627v2.pdf (preprint).
Hundel, H. (2003). Derivation of James’ Pythagorean Formula. Retrieved from https://groups.google.com/forum/#!topic/rec.puzzles/O-DmrUljHds.
James, B. (1981). 1981 Baseball Abstract. Lawrence, KS: Self-published.
Jones, M., & Tappin, L. (2005). The Pythagorean theorem of baseball and alternative models. The UMAP Journal, 26, 2.
Luo, V., & Miller, S. J. (2014). Relieving and Readjusting Pythagoras. Retrieved from http://arxiv.org/pdf/1406.3402v2 (preprint).
Miller, S. J. (2007). A derivation of the Pythagorean won-loss formula in baseball. Chance Magazine, 20, 40–48. (An abridged version appeared in The Newsletter of the SABR Statistical Analysis Committee, 16, 17–22, (February 2006)). http://arxiv.org/pdf/math/0509698.
Silver, N. (2006). Is Alex Rodriguez overpaid. In Baseball between the numbers: Why everything you know about the game is wrong. New York: The Baseball Prospectus Team of Experts. Basic Books.
Wikipedia, Pythagorean Expectation. Retrieved from http://en.wikipedia.org/wiki/Pythagorean_expectation.
Wikipedia, Weibull. Retrieved from http://en.wikipedia.org/wiki/Weibull_distribution.
Acknowledgments
The first author was partially supported by NSF Grants DMS0970067 and DMS1265673. He thanks Chris Chiang for suggesting the title of this paper, numerous students of his at Brown University and Williams College, as well as Cameron and Kayla Miller, for many lively conversations on mathematics and sports, Michael Stone for comments on an earlier draft, and Phil Birnbaum, Kevin Dayaratna, Warren Johnson and Chris Long for many sabermetrics discussions. This paper is dedicated to his great uncle Newt Bromberg, who assured him he would live long enough to see the Red Sox win it all, and the 2004, 2007 and 2013 Red Sox who made it happen (after watching the 2013 victory his six year old son Cameron turned to him and commented that he got to see it at a much younger age!).
Appendix
1.1 Calculating the Mean of a Weibull
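Letting \(\mu_{\alpha,\beta,\gamma}\) denote the mean of \(f(x;\alpha,\beta,\gamma)\), we have
\[ \mu_{\alpha,\beta,\gamma} = \int_\beta^\infty x \cdot \frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1} e^{-\left(\frac{x-\beta}{\alpha}\right)^\gamma}\,\mathrm{d}x = \int_\beta^\infty \alpha\,\frac{x-\beta}{\alpha} \cdot \frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1} e^{-\left(\frac{x-\beta}{\alpha}\right)^\gamma}\,\mathrm{d}x + \beta, \]
where the final \(\beta\) arises because the density integrates to 1.
We change variables by setting \(u = \left(\frac{x-\beta}{\alpha}\right)^\gamma\). Then \(\mathrm{d}u = \frac{\gamma}{\alpha}\left(\frac{x-\beta}{\alpha}\right)^{\gamma-1}\mathrm{d}x\) and we have
\[ \mu_{\alpha,\beta,\gamma} = \int_0^\infty \alpha\,u^{1/\gamma} e^{-u}\,\mathrm{d}u + \beta = \alpha\,\Gamma\left(1+\gamma^{-1}\right) + \beta. \]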
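As a sanity check on this computation, the following minimal sketch compares the standard closed form for the mean of a three-parameter Weibull, \(\alpha\Gamma(1+1/\gamma)+\beta\), with a direct numerical integration of \(x f(x;\alpha,\beta,\gamma)\). The function names and the parameter values are illustrative assumptions, not taken from the chapter, and the midpoint rule is chosen only for simplicity.

```python
import math


def weibull_pdf(x, alpha, beta, gamma):
    """Three-parameter Weibull density; zero for x below the shift beta."""
    if x < beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-(z ** gamma))


def weibull_mean_closed_form(alpha, beta, gamma):
    """Closed form alpha * Gamma(1 + 1/gamma) + beta."""
    return alpha * math.gamma(1.0 + 1.0 / gamma) + beta


def weibull_mean_numeric(alpha, beta, gamma, steps=100_000, u_max=50.0):
    """Midpoint-rule estimate of the integral of x * f(x) from beta upward.

    The integral is cut off where u = ((x - beta) / alpha) ** gamma reaches
    u_max, since the integrand decays like exp(-u) beyond that point.
    """
    upper = beta + alpha * u_max ** (1.0 / gamma)
    h = (upper - beta) / steps
    total = 0.0
    for i in range(steps):
        x = beta + (i + 0.5) * h
        total += x * weibull_pdf(x, alpha, beta, gamma)
    return total * h


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not fitted to any data).
    alpha, beta, gamma = 5.0, -0.5, 1.8
    print("closed form:", weibull_mean_closed_form(alpha, beta, gamma))
    print("numeric    :", weibull_mean_numeric(alpha, beta, gamma))
```

The two values should agree to several decimal places for shape parameters \(\gamma > 1\); for \(\gamma < 1\) the density is unbounded near \(\beta\) and a more careful quadrature would be needed.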
1.2 Independence Test with Structural Zeros
We describe the iterative procedure needed to handle the structural zeros. A good reference is Bishop and Fienberg [2].
Let Bin(\(k\)) be the \(k\)th bin used in the chi-squared test for independence. For each team’s incomplete contingency table, let \(O_{r,c}\) be the observed number of games where the number of runs scored is in Bin(\(r\)) and runs allowed is in Bin(\(c\)). As games cannot end in a tie, we have \(O_{r,r} = 0\) for all \(r\).
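We construct the expected contingency table with entries \(E_{r,c}\) using an iterative process to find the maximum likelihood estimators for each entry. For \(1 \le r,c \le 12\), let
\[ E^{(0)}_{r,c} = \begin{cases} 1 & \text{if } r \ne c \\ 0 & \text{if } r = c, \end{cases} \]
and let
\[ X_{r,+} = \sum_{c} O_{r,c}, \qquad X_{+,c} = \sum_{r} O_{r,c} \]
denote the observed row and column totals. We then have that
\[ E^{(\ell)}_{r,c} = \begin{cases} E^{(\ell-1)}_{r,c}\, X_{r,+} \Big/ \sum_{c'} E^{(\ell-1)}_{r,c'} & \text{if } \ell \text{ is odd} \\ E^{(\ell-1)}_{r,c}\, X_{+,c} \Big/ \sum_{r'} E^{(\ell-1)}_{r',c} & \text{if } \ell \text{ is even.} \end{cases} \]
The values of \(E_{r,c}\) can be found by taking the limit as \(\ell \rightarrow \infty\) of \(E^{(\ell)}_{r,c}\), and typically the convergence is rapid. The statistic
\[ \sum_{r \ne c} \frac{\left(E_{r,c} - O_{r,c}\right)^2}{E_{r,c}} \]
follows a chi-square distribution with \((11-1)^2 - 11 = 89\) degrees of freedom.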
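The iterative procedure can be sketched in a few lines. The code below is a minimal illustration, assuming the standard iterative proportional fitting scheme for an incomplete table: start from a table of ones with zeros on the diagonal, then alternately rescale rows and columns to match the observed totals. The function names and the small 3-by-3 table are hypothetical; the observed table is assumed to be square with a zero diagonal.

```python
def fit_expected(observed, iterations=500):
    """Fit expected counts for a square table with structural zeros on the diagonal.

    Odd steps rescale each row to the observed row total; even steps rescale
    each column to the observed column total. Diagonal cells stay at zero.
    """
    n = len(observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[r][c] for r in range(n)) for c in range(n)]
    # Starting table: 1 off the diagonal, 0 on it.
    expected = [[0.0 if r == c else 1.0 for c in range(n)] for r in range(n)]
    for step in range(1, iterations + 1):
        if step % 2 == 1:
            for r in range(n):
                s = sum(expected[r])
                if s > 0:
                    expected[r] = [e * row_totals[r] / s for e in expected[r]]
        else:
            for c in range(n):
                s = sum(expected[r][c] for r in range(n))
                if s > 0:
                    for r in range(n):
                        expected[r][c] *= col_totals[c] / s
    return expected


def chi_square_statistic(observed, expected):
    """Sum of (E - O)^2 / E over off-diagonal cells with positive expectation."""
    n = len(observed)
    return sum(
        (expected[r][c] - observed[r][c]) ** 2 / expected[r][c]
        for r in range(n)
        for c in range(n)
        if r != c and expected[r][c] > 0
    )


def degrees_of_freedom(n_bins):
    """(n - 1)^2 - n for an n-by-n table with n structural zeros."""
    return (n_bins - 1) ** 2 - n_bins


if __name__ == "__main__":
    # Hypothetical counts, not data from the chapter.
    observed = [[0, 10, 20], [15, 0, 25], [30, 5, 0]]
    expected = fit_expected(observed)
    print("statistic:", chi_square_statistic(observed, expected))
    print("degrees of freedom for 11 bins:", degrees_of_freedom(11))
```

The resulting statistic would then be compared against a chi-square critical value with the stated degrees of freedom; that lookup is omitted here to keep the sketch dependency-free.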
1.3 Linearizing Pythagoras
Unlike the argument in Sect. 7, we do not assume knowledge of multivariable calculus and derive the linearization using just single variable methods. The calculations below are of interest in their own right, as they highlight good approximation techniques.
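We assume there is some exponent \(\gamma\) such that the winning percentage, \(\mathrm{WP}\), is
\[ \mathrm{WP} = \frac{\mathrm{RS}^\gamma}{\mathrm{RS}^\gamma + \mathrm{RA}^\gamma}, \]
with \(\mathrm{RS}\) and \(\mathrm{RA}\) the total runs scored and allowed. We multiply the right hand side by \((1/\mathrm{RS}^\gamma)/(1/\mathrm{RS}^\gamma)\) and write \(\mathrm{RA}^\gamma\) as \(\mathrm{RS}^\gamma - (\mathrm{RS}^\gamma - \mathrm{RA}^\gamma)\), and find
\[ \mathrm{WP} = \frac{1}{1 + \frac{\mathrm{RA}^\gamma}{\mathrm{RS}^\gamma}} = \frac{1}{2 - \frac{\mathrm{RS}^\gamma - \mathrm{RA}^\gamma}{\mathrm{RS}^\gamma}} = \frac{1}{2} \cdot \frac{1}{1 - \frac{\mathrm{RS}^\gamma - \mathrm{RA}^\gamma}{2\mathrm{RS}^\gamma}}; \]
notice we manipulated the algebra to pull out a 1/2, which corresponds to an average team; thus the remaining factor captures the fluctuations about average.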
We now use the geometric series formula, which says that if \(|r| < 1\) then
\[ \frac{1}{1-r} = 1 + r + r^2 + r^3 + \cdots. \]
We let \(r = (\mathrm{RS}^\gamma - \mathrm{RA}^\gamma)/2\mathrm{RS}^\gamma\); since runs scored and runs allowed should be close to each other, the difference of their \(\gamma\) powers divided by twice the \(\gamma\) power of the runs scored should be small. Thus \(r\) in our geometric expansion should be close to zero, and we find
\[ \mathrm{WP} = \frac{1}{2}\left(1 + \frac{\mathrm{RS}^\gamma - \mathrm{RA}^\gamma}{2\mathrm{RS}^\gamma} + \left(\frac{\mathrm{RS}^\gamma - \mathrm{RA}^\gamma}{2\mathrm{RS}^\gamma}\right)^2 + \cdots\right). \]
We now make some approximations. We expect \(\mathrm{RS}^\gamma - \mathrm{RA}^\gamma\) to be small relative to \(\mathrm{RS}^\gamma\), and thus \(\frac{\mathrm{RS}^\gamma - \mathrm{RA}^\gamma}{2\mathrm{RS}^\gamma}\) should be small. This means we only need to keep the constant and linear terms in the expansion. Note that if we only kept the constant term, there would be no dependence on runs scored or allowed!
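To see how quickly the geometric expansion converges in practice, the sketch below compares the exact Pythagorean value with partial sums in \(r = (\mathrm{RS}^\gamma - \mathrm{RA}^\gamma)/2\mathrm{RS}^\gamma\). The run totals and the exponent 1.83 are illustrative assumptions, and the function names are hypothetical.

```python
def pythagorean_wp(rs, ra, gamma):
    """Pythagorean winning percentage RS^g / (RS^g + RA^g)."""
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def expansion_ratio(rs, ra, gamma):
    """The small quantity r = (RS^g - RA^g) / (2 RS^g) used in the expansion."""
    return (rs ** gamma - ra ** gamma) / (2.0 * rs ** gamma)


def geometric_partial_sum(rs, ra, gamma, terms):
    """(1/2) * (1 + r + ... + r^(terms-1)), the truncated geometric expansion."""
    r = expansion_ratio(rs, ra, gamma)
    return 0.5 * sum(r ** k for k in range(terms))


if __name__ == "__main__":
    # Illustrative season totals and exponent (assumed values).
    rs, ra, gamma = 750.0, 700.0, 1.83
    print("r     :", expansion_ratio(rs, ra, gamma))
    print("exact :", pythagorean_wp(rs, ra, gamma))
    for terms in (1, 2, 3, 5):
        print(terms, "term(s):", geometric_partial_sum(rs, ra, gamma, terms))
```

With one term the estimate is always 0.5 regardless of the inputs; adding the linear term already captures most of the deviation from average when \(r\) is small.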
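We need to do a little more analysis to obtain a formula that is linear in \(\mathrm{RS} - \mathrm{RA}\). Let \(\mathrm{R}_\mathrm{total}\) denote the average number of runs scored per team in the league. We can write \(\mathrm{RS} = \mathrm{R}_\mathrm{total} + x_s\) and \(\mathrm{RA} = \mathrm{R}_\mathrm{total} + x_a\), where it is reasonable to assume \(x_s\) and \(x_a\) are small relative to \(\mathrm{R}_\mathrm{total}\). The Mean Value Theorem from calculus says that if \(f(x) = (\mathrm{R}_\mathrm{total} + x)^\gamma\), then
\[ f(x_s) - f(x_a) = f'(x_c)\,(x_s - x_a), \]
where \(x_c\) is some intermediate point between \(x_s\) and \(x_a\). As \(f'(x) = \gamma(\mathrm{R}_\mathrm{total} + x)^{\gamma-1}\), we find
\[ \mathrm{RS}^\gamma - \mathrm{RA}^\gamma = f(x_s) - f(x_a) = \gamma\,(\mathrm{R}_\mathrm{total} + x_c)^{\gamma-1}\,(\mathrm{RS} - \mathrm{RA}), \]
as \(x_s - x_a = \mathrm{RS} - \mathrm{RA}\). Substituting this into the geometric series expansion (31) and keeping only the constant and linear terms gives
\[ \mathrm{WP} \approx \frac{1}{2} + \frac{\gamma\,(\mathrm{R}_\mathrm{total} + x_c)^{\gamma-1}}{4\,\mathrm{RS}^\gamma}\,(\mathrm{RS} - \mathrm{RA}). \]
We make one final approximation. We replace \(\mathrm{R}_\mathrm{total} + x_c\) in the numerator with \(\mathrm{R}_\mathrm{total}\) and \(\mathrm{RS}^\gamma\) in the denominator with \(\mathrm{R}_\mathrm{total}^\gamma\), the corresponding league-average quantities, and reach
\[ \mathrm{WP} \approx \frac{1}{2} + \frac{\gamma\,\mathrm{R}_\mathrm{total}^{\gamma-1}}{4\,\mathrm{R}_\mathrm{total}^\gamma}\,(\mathrm{RS} - \mathrm{RA}) = 0.5 + \frac{\gamma}{4\,\mathrm{R}_\mathrm{total}}\,(\mathrm{RS} - \mathrm{RA}). \]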
Thus the simple linear approximation model reproduces the result from multivariable Taylor series, namely that the interesting coefficient \(\mathrm{B}\) should be approximately \(\gamma /(4\mathrm{R}_\mathrm{total})\).
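A short numerical comparison illustrates how close the linear estimator with slope \(\mathrm{B} \approx \gamma/(4\mathrm{R}_\mathrm{total})\) is to the full Pythagorean formula. This is a minimal sketch: the exponent 1.83, the league average of 725 runs, the 162-game season length, and the function names are all illustrative assumptions rather than values taken from the chapter.

```python
def pythagorean_wp(rs, ra, gamma):
    """Pythagorean winning percentage RS^g / (RS^g + RA^g)."""
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def linear_slope(gamma, r_total):
    """Slope B = gamma / (4 * R_total) of the linearized estimator."""
    return gamma / (4.0 * r_total)


def linear_wp(rs, ra, gamma, r_total):
    """Linear estimator 0.5 + B * (RS - RA)."""
    return 0.5 + linear_slope(gamma, r_total) * (rs - ra)


if __name__ == "__main__":
    gamma = 1.83      # assumed exponent
    r_total = 725.0   # assumed league-average runs per team
    games = 162       # assumed season length
    for rs, ra in [(750.0, 700.0), (700.0, 750.0), (800.0, 650.0)]:
        exact = pythagorean_wp(rs, ra, gamma)
        approx = linear_wp(rs, ra, gamma, r_total)
        print(rs, ra, "exact:", exact, "linear:", approx,
              "difference in wins:", games * (approx - exact))
    # Change in estimated wins per ten additional runs of run differential.
    print("wins per 10 runs:", games * linear_slope(gamma, r_total) * 10)
```

Under these assumed values, ten additional runs of run differential shift the linear estimate by roughly one win over a season, and the gap between the two estimators grows as the run differential moves further from zero.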
© 2014 Springer International Publishing Switzerland
Cite this chapter
Miller, S.J., Corcoran, T., Gossels, J., Luo, V., Porfilio, J. (2014). Pythagoras at the Bat. In: Pardalos, P., Zamaraev, V. (eds) Social Networks and the Economics of Sports. Springer, Cham. https://doi.org/10.1007/978-3-319-08440-4_6
DOI: https://doi.org/10.1007/978-3-319-08440-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08439-8
Online ISBN: 978-3-319-08440-4