Pythagoras at the Bat

Miller, Steven J.; Corcoran, Taylor; Gossels, Jennifer; Luo, Victor; Porfilio, Jaclyn

doi:10.1007/978-3-319-08440-4_6

Abstract

The Pythagorean formula is one of the most popular ways to measure the true ability of a team. It is very easy to use, estimating a team’s winning percentage from the runs they score and allow. This data is readily available on standings pages; no computationally intensive simulations are needed. Normally accurate to within a few games per season, it allows teams to determine how much a run is worth in different situations. This determination informs some of the most important economic decisions a team faces: how much a player is worth, which players should be pursued, and how much they should be offered. We discuss the formula and these applications in detail, and provide a theoretical justification for both the formula and simpler linear estimators of a team’s winning percentage. The calculations and modeling are presented step by step, and when possible multiple proofs are given. We analyze the 2012 season, and see that the data for that and other recent years support our modeling conjectures. We conclude with a discussion of work in progress to generalize the formula and increase its predictive power without needing expensive simulations, though at the cost of requiring play-by-play data.

Notes

1.
The data below is from http://www.baseball-reference.com/players/gl.cgi?id=remlimi01&t=p&year=2005 and http://scores.espn.go.com/mlb/boxscore?gameId=250816106.

References

Birnbaum, P. (2010, April 24). Sabermetric Research: Saturday. Retrieved from http://blog.philbirnbaum.com/2010/04/marginal-value-of-win-in-baseball.html.
Bishop, Y. M. M., & Fienberg, S. E. (1969). Incomplete two-dimensional contingency tables. Biometrics, 25, 119–128.
Dayaratna, K., & Miller, S. J. (2012). First order approximations of the Pythagorean won-loss formula for predicting MLB teams winning percentages. By The Numbers—The Newsletter of the SABR Statistical Analysis Committee, 22, 15–19.
Hammond, C. N. B., Johnson, W. P., & Miller, S. J. (2013). The James Function. Retrieved from http://arxiv.org/pdf/1312.7627v2.pdf (preprint).
Hundel, H. (2003). Derivation of James’ Pythagorean Formula. Retrieved from https://groups.google.com/forum/#!topic/rec.puzzles/O-DmrUljHds.
James, B. (1981). 1981 Baseball Abstract. Lawrence, KS: Self-published.
Jones, M., & Tappin, L. (2005). The Pythagorean theorem of baseball and alternative models. The UMAP Journal, 26, 2.
Luo, V., & Miller, S. J. (2014). Relieving and Readjusting Pythagoras. Retrieved from http://arxiv.org/pdf/1406.3402v2 (preprint).
Miller, S. J. (2007). A derivation of the Pythagorean won-loss formula in baseball. Chance Magazine, 20, 40–48. (An abridged version appeared in The Newsletter of the SABR Statistical Analysis Committee, 16, 17–22, (February 2006)). http://arxiv.org/pdf/math/0509698.
Silver, N. (2006). Is Alex Rodriguez overpaid. In Baseball between the numbers: Why everything you know about the game is wrong. New York: The Baseball Prospectus Team of Experts. Basic Books.
Wikipedia, Pythagorean Expectation. Retrieved from http://en.wikipedia.org/wiki/Pythagorean_expectation.
Wikipedia, Weibull. Retrieved from http://en.wikipedia.org/wiki/Weibull_distribution.

Acknowledgments

The first author was partially supported by NSF Grants DMS0970067 and DMS1265673. He thanks Chris Chiang for suggesting the title of this paper, numerous students of his at Brown University and Williams College, as well as Cameron and Kayla Miller, for many lively conversations on mathematics and sports, Michael Stone for comments on an earlier draft, and Phil Birnbaum, Kevin Dayaratna, Warren Johnson and Chris Long for many sabermetrics discussions. This paper is dedicated to his great uncle Newt Bromberg, who assured him he would live long enough to see the Red Sox win it all, and the 2004, 2007 and 2013 Red Sox who made it happen (after watching the 2013 victory his six year old son Cameron turned to him and commented that he got to see it at a much younger age!).

Author information

Authors and Affiliations

Williams College, Williamstown, MA, 01267, USA
Steven J. Miller, Victor Luo & Jaclyn Porfilio
The University of Arizona, Tucson, AZ, 85721, USA
Taylor Corcoran
Princeton University, Princeton, NJ, 08544, USA
Jennifer Gossels

Corresponding author

Correspondence to Steven J. Miller.

Editor information

Editors and Affiliations

Dept. of Industrial & Systems Engin., University of Florida, Gainesville, Florida, USA
Panos M. Pardalos
Laboratory of Algorithms & Technologies, Higher School of Economics, N.Novgorod, Russia
Victor Zamaraev

Appendix

1.1 Calculating the Mean of a Weibull

Letting $\mu _{\alpha ,\beta ,\gamma }$ denote the mean of $f(x;\alpha ,\beta ,\gamma )$, we have

$$\begin{aligned} \mu _{\alpha ,\beta ,\gamma }&= \int \limits _\beta ^\infty x \cdot \frac{\gamma }{\alpha } \left( \frac{x-\beta }{\alpha }\right) ^{\gamma -1} e^{-((x-\beta )/\alpha )^\gamma }\,\mathrm{d} x \\&= \int \limits _\beta ^\infty \alpha \frac{x-\beta }{\alpha } \cdot \frac{\gamma }{\alpha } \left( \frac{x-\beta }{\alpha }\right) ^{\gamma -1}e^{-((x-\beta )/\alpha )^\gamma }\,\mathrm{d} x\ +\ \beta . \end{aligned}$$

(22)

We change variables by setting $u = \left( \frac{x-\beta }{\alpha }\right) ^\gamma $. Then $\mathrm{d} u = \frac{\gamma }{\alpha } \left( \frac{x-\beta }{\alpha }\right) ^{\gamma -1}\mathrm{d} x$ and we have

$$\begin{aligned} \mu _{\alpha ,\beta ,\gamma }&= \int \limits _0^\infty \alpha u^{\gamma ^{-1}} \cdot e^{-u}\,\mathrm{d} u \ + \ \beta \\&= \alpha \int \limits _0^\infty e^{-u} u^{1+\gamma ^{-1}} \frac{\mathrm{d} u}{u} \ + \ \beta \\&= \alpha \varGamma (1+\gamma ^{-1}) \ + \ \beta .\end{aligned}$$

(23)
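
As a numerical sanity check on (23), the following Python sketch compares the closed form $\alpha \varGamma (1+\gamma ^{-1}) + \beta $ with a direct midpoint-rule integration of $x f(x;\alpha ,\beta ,\gamma )$. The parameter values and function names are illustrative assumptions, not fitted values from the chapter, and the simple quadrature assumes $\gamma \ge 1$ so that the integrand stays bounded near $\beta $.

```python
import math


def weibull_pdf(x, alpha, beta, gamma):
    """Three-parameter Weibull density f(x; alpha, beta, gamma); zero for x <= beta."""
    if x <= beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-(z ** gamma))


def weibull_mean_closed_form(alpha, beta, gamma):
    """Mean from (23): alpha * Gamma(1 + 1/gamma) + beta."""
    return alpha * math.gamma(1.0 + 1.0 / gamma) + beta


def weibull_mean_numeric(alpha, beta, gamma, upper_z=12.0, steps=100_000):
    """Midpoint-rule estimate of the integral of x * f(x) on [beta, beta + upper_z * alpha].

    The upper cutoff is an assumption: the tail beyond it is negligible
    for the shape parameters used in this sketch.
    """
    width = upper_z * alpha / steps
    total = 0.0
    for i in range(steps):
        x = beta + (i + 0.5) * width
        total += x * weibull_pdf(x, alpha, beta, gamma)
    return total * width


if __name__ == "__main__":
    # Illustrative (assumed) parameters, not values taken from the chapter.
    alpha, beta, gamma = 5.0, -0.5, 1.8
    print("closed form:", weibull_mean_closed_form(alpha, beta, gamma))
    print("numeric    :", weibull_mean_numeric(alpha, beta, gamma))
```

The two printed values should agree to several decimal places; a visible gap would point to an error in the change of variables rather than in the Gamma function.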

1.2 Independence Test with Structural Zeros

We describe the iterative procedure needed to handle the structural zeros. A good reference is Bishop and Fienberg [2].

Let Bin($k$) be the $k$th bin used in the chi-squared test for independence. For each team’s incomplete contingency table, let $O_{r,c}$ be the observed number of games where the number of runs scored is in Bin($r$) and runs allowed is in Bin($c$). As games cannot end in a tie, we have $O_{r,r} = 0$ for all $r$.

We construct the expected contingency table with entries $E_{r,c}$ using an iterative process to find the maximum likelihood estimators for each entry. For $1 \le r,c \le 12$, let

$$\begin{aligned} E^{(0)}_{r,c} \ =\ \left\{ \begin{array}{lr} 1 &{} \ \mathrm{if}\ r \ne c\ \\ 0 &{} \ \mathrm{if}\ r = c, \end{array} \right. \end{aligned}$$

(24)

and let

$$\begin{aligned} X_{r,+}\ =\ \sum _{c} O_{r,c}, \ \ \ X_{+,c}\ =\ \sum _{r} O_{r,c} \end{aligned}$$

(25)

be the observed row and column totals. We then have that

$$\begin{aligned} E^{(\ell )}_{r,c} \ =\ \begin{cases} E^{(\ell -1)}_{r,c}\, X_{r,+} \Big / \sum _{c'} E^{(\ell -1)}_{r,c'} &{} \mathrm{if }\ \ell \ \mathrm{is\ odd}\\ E^{(\ell -1)}_{r,c}\, X_{+,c} \Big / \sum _{r'} E^{(\ell -1)}_{r',c} &{} \mathrm{if }\ \ell \ \mathrm{is\ even.} \end{cases} \end{aligned}$$

(26)

The values of $E_{r,c}$ can be found by taking the limit as $\ell \rightarrow \infty $ of $E^{(\ell )}_{r,c}$, and typically the convergence is rapid. The statistic

$$\begin{aligned} \sum _{r, c \atop r \ne c} \frac{(E_{r,c} - O_{r,c})^2}{E_{r,c}} \end{aligned}$$

(27)

follows a chi-square distribution with $(11-1)^2 - 11 = 89$ degrees of freedom.
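
The iteration in (24) through (27) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, a fixed number of iterations stands in for the limit $\ell \rightarrow \infty $, and cells whose fitted value is zero are skipped in the statistic.

```python
def fit_expected(observed, iterations=200):
    """Fit expected counts for a square table with structural zeros on the diagonal.

    Alternates row rescaling (odd steps) and column rescaling (even steps),
    starting from the table that is 1 off the diagonal and 0 on it.
    """
    n = len(observed)
    row_totals = [sum(observed[r]) for r in range(n)]
    col_totals = [sum(observed[r][c] for r in range(n)) for c in range(n)]
    expected = [[0.0 if r == c else 1.0 for c in range(n)] for r in range(n)]
    for step in range(1, iterations + 1):
        if step % 2 == 1:
            # Odd step: match each observed row total.
            for r in range(n):
                s = sum(expected[r])
                if s > 0:
                    expected[r] = [e * row_totals[r] / s for e in expected[r]]
        else:
            # Even step: match each observed column total.
            for c in range(n):
                s = sum(expected[r][c] for r in range(n))
                if s > 0:
                    for r in range(n):
                        expected[r][c] *= col_totals[c] / s
    return expected


def chi_squared_statistic(observed, expected):
    """Sum of (E - O)^2 / E over off-diagonal cells with E > 0."""
    n = len(observed)
    return sum(
        (expected[r][c] - observed[r][c]) ** 2 / expected[r][c]
        for r in range(n)
        for c in range(n)
        if r != c and expected[r][c] > 0
    )


def degrees_of_freedom(n_bins):
    """(n - 1)^2 - n, the count used in the text for n bins."""
    return (n_bins - 1) ** 2 - n_bins


if __name__ == "__main__":
    # Made-up 3 x 3 table for illustration only; the diagonal is structurally zero.
    observed = [[0, 4, 6], [5, 0, 3], [7, 2, 0]]
    expected = fit_expected(observed)
    print(expected)
    print(chi_squared_statistic(observed, expected), degrees_of_freedom(3))
```

Because the diagonal starts at zero and every update is multiplicative, the structural zeros are preserved automatically; only the off-diagonal cells are adjusted to reproduce the observed margins.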

1.3 Linearizing Pythagoras

Unlike the argument in Sect. 7, we do not assume knowledge of multivariable calculus and derive the linearization using just single variable methods. The calculations below are of interest in their own right, as they highlight good approximation techniques.

We assume there is some exponent $\gamma $ such that the winning percentage, $\mathrm{WP}$, is

$$\begin{aligned} \mathrm{WP}\ = \ \frac{\mathrm{RS}^\gamma }{\mathrm{RS}^\gamma +\mathrm{RA}^\gamma }, \end{aligned}$$

(28)

with $\mathrm{RS}$ and $\mathrm{RA}$ the total runs scored and allowed. We multiply the right-hand side by $(1/\mathrm{RS}^\gamma )/(1/\mathrm{RS}^\gamma )$, write $\mathrm{RA}^\gamma $ as $\mathrm{RS}^\gamma - (\mathrm{RS}^\gamma - \mathrm{RA}^\gamma )$, and find

$$\begin{aligned} \mathrm{WP}&= \frac{1}{1 + \frac{\mathrm{RA}^\gamma }{\mathrm{RS}^\gamma }} \ = \ \left( 1 + \frac{\mathrm{RA}^\gamma }{\mathrm{RS}^\gamma }\right) ^{-1} \ = \ \left( 1 + \frac{\mathrm{RS}^\gamma - (\mathrm{RS}^\gamma -\mathrm{RA}^\gamma )}{\mathrm{RS}^\gamma }\right) ^{-1} \\&= \left( 1 + 1 - \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{\mathrm{RS}^\gamma }\right) ^{-1} \\&= \left( 2 \cdot \left( 1 - \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{2\mathrm{RS}^\gamma }\right) \right) ^{-1} \\&= \frac{1}{2} \left( 1 - \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{2\mathrm{RS}^\gamma }\right) ^{-1}; \end{aligned}$$

(29)

notice we manipulated the algebra to pull out a 1/2, which indicates an average team; thus the remaining factor is the fluctuations about average.

We now use the geometric series formula, which says that if $|r| < 1$ then

$$\begin{aligned} \frac{1}{1-r} \ = \ 1 + r + r^2 + r^3 + \cdots .\end{aligned}$$

(30)

We let $r = (\mathrm{RS}^\gamma -\mathrm{RA}^\gamma )/(2\mathrm{RS}^\gamma )$; since runs scored and runs allowed should be close to each other, the difference of their $\gamma $th powers divided by twice the $\gamma $th power of runs scored should be small. Thus $r$ in our geometric expansion should be close to zero, and we find

$$\begin{aligned} \mathrm{WP}&= \frac{1}{2}\left( 1 + \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{2\mathrm{RS}^\gamma } + \left( \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{2\mathrm{RS}^\gamma }\right) ^2 + \left( \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{2\mathrm{RS}^\gamma }\right) ^3 + \cdots \right) \\&\approx 0.500 + \frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{4\mathrm{RS}^\gamma }. \end{aligned}$$

(31)

The approximation in (31) is justified as follows. We expect $\mathrm{RS}^\gamma -\mathrm{RA}^\gamma $ to be small relative to $\mathrm{RS}^\gamma $, and thus $\frac{\mathrm{RS}^\gamma -\mathrm{RA}^\gamma }{2\mathrm{RS}^\gamma }$ should be small. This means we only need to keep the constant and linear terms in the expansion. Note that if we only kept the constant term, there would be no dependence on runs scored or allowed!
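
To see how quickly the geometric series settles, the Python sketch below compares the exact value from (28) with partial sums of the expansion, including the two-term truncation used in (31). The run totals and the exponent 1.8 are assumed, illustrative inputs rather than data from the chapter, and the function names are hypothetical.

```python
def pythagorean_wp(rs, ra, gamma):
    """Winning percentage from (28): RS^gamma / (RS^gamma + RA^gamma)."""
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def geometric_ratio(rs, ra, gamma):
    """The ratio r = (RS^gamma - RA^gamma) / (2 RS^gamma) used in the expansion."""
    return (rs ** gamma - ra ** gamma) / (2.0 * rs ** gamma)


def geometric_partial_sum(rs, ra, gamma, terms):
    """(1/2) * (1 + r + ... + r^(terms - 1)); terms=2 gives the truncation in (31)."""
    r = geometric_ratio(rs, ra, gamma)
    return 0.5 * sum(r ** k for k in range(terms))


if __name__ == "__main__":
    # Illustrative (assumed) season totals and exponent.
    rs, ra, gamma = 750.0, 700.0, 1.8
    print("r     :", geometric_ratio(rs, ra, gamma))
    print("exact :", pythagorean_wp(rs, ra, gamma))
    for terms in (1, 2, 3, 4):
        print(terms, "terms:", geometric_partial_sum(rs, ra, gamma, terms))
```

With one term the estimate is 0.500 regardless of the inputs, which is the observation made above; the second term is the first to carry any information about the team.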

We need to do a little more analysis to obtain a formula that is linear in $\mathrm{RS}- \mathrm{RA}$. Let $\mathrm{R}_\mathrm{total}$ denote the average number of runs scored per team in the league. We can write $\mathrm{RS}= \mathrm{R}_\mathrm{total}+ x_s$ and $\mathrm{RA}= \mathrm{R}_\mathrm{total}+ x_a$, where it is reasonable to assume $x_s$ and $x_a$ are small relative to $\mathrm{R}_\mathrm{total}$. The Mean Value Theorem from Calculus says that if $f(x) = (\mathrm{R}_\mathrm{total}+x)^\gamma $, then

$$\begin{aligned} f(x_s) - f(x_a) \ = \ f'(x_c) (x_s - x_a), \end{aligned}$$

(32)

where $x_c$ is some intermediate point between $x_s$ and $x_a$. As $f'(x) = \gamma (\mathrm{R}_\mathrm{total}+x)^{\gamma -1}$, we find

$$\begin{aligned} \mathrm{RS}^\gamma - \mathrm{RA}^\gamma \ = \ f(x_s) - f(x_a) \ = \ f'(x_c) (x_s - x_a) \ = \ \gamma (\mathrm{R}_\mathrm{total}+x_c)^{\gamma -1}(\mathrm{RS}- \mathrm{RA}), \end{aligned}$$

(33)

as $x_s - x_a = \mathrm{RS}- \mathrm{RA}$. Substituting this into (31) gives

$$\begin{aligned} \mathrm{WP}&\approx 0.500 + \frac{\gamma (\mathrm{R}_\mathrm{total}+x_c)^{\gamma -1} (\mathrm{RS}- \mathrm{RA})}{4\mathrm{RS}^\gamma } \\&= 0.500 + \frac{\gamma (\mathrm{R}_\mathrm{total}+x_c)^{\gamma -1}}{4\mathrm{RS}^\gamma } (\mathrm{RS}-\mathrm{RA}). \end{aligned}$$

(34)

We make one final approximation. We replace $\mathrm{R}_\mathrm{total}+x_c$ in the numerator and $\mathrm{RS}$ in the denominator with the league average $\mathrm{R}_\mathrm{total}$, so that the ratio $(\mathrm{R}_\mathrm{total}+x_c)^{\gamma -1}/\mathrm{RS}^\gamma $ becomes $\mathrm{R}_\mathrm{total}^{\gamma -1}/\mathrm{R}_\mathrm{total}^\gamma = 1/\mathrm{R}_\mathrm{total}$, and reach

$$\begin{aligned} \mathrm{WP}\ \approx \ 0.500 + \frac{\gamma }{4\mathrm{R}_\mathrm{total}} (\mathrm{RS}- \mathrm{RA}). \end{aligned}$$

(35)

Thus the simple linear approximation model reproduces the result from multivariable Taylor series, namely that the interesting coefficient $\mathrm{B}$ should be approximately $\gamma /(4\mathrm{R}_\mathrm{total})$.
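
A short Python sketch shows how closely the linear estimator (35) tracks the Pythagorean formula (28) for a team near league average. The run totals, the league average of 725 runs, the exponent 1.8, and the function names are all illustrative assumptions, not figures from the chapter.

```python
def pythagorean_wp(rs, ra, gamma):
    """Winning percentage from (28): RS^gamma / (RS^gamma + RA^gamma)."""
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def linear_slope(gamma, r_total):
    """Coefficient of (RS - RA) in (35): gamma / (4 * R_total)."""
    return gamma / (4.0 * r_total)


def linear_wp(rs, ra, gamma, r_total):
    """Linear estimator (35): 0.500 + gamma / (4 * R_total) * (RS - RA)."""
    return 0.5 + linear_slope(gamma, r_total) * (rs - ra)


if __name__ == "__main__":
    # Illustrative (assumed) inputs: exponent, league-average runs, and run totals.
    gamma, r_total = 1.8, 725.0
    for rs, ra in [(750.0, 700.0), (800.0, 650.0), (650.0, 800.0)]:
        exact = pythagorean_wp(rs, ra, gamma)
        approx = linear_wp(rs, ra, gamma, r_total)
        print(rs, ra, round(exact, 4), round(approx, 4), round(approx - exact, 4))
```

The gap between the two estimates grows as the run differential moves away from zero, which is consistent with having dropped the quadratic and higher terms of the geometric series and replaced $\mathrm{RS}$ by the league average.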

About this chapter

Cite this chapter

Miller, S.J., Corcoran, T., Gossels, J., Luo, V., Porfilio, J. (2014). Pythagoras at the Bat. In: Pardalos, P., Zamaraev, V. (eds) Social Networks and the Economics of Sports. Springer, Cham. https://doi.org/10.1007/978-3-319-08440-4_6

DOI: https://doi.org/10.1007/978-3-319-08440-4_6
Published: 22 August 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08439-8
Online ISBN: 978-3-319-08440-4
eBook Packages: Business and Economics, Business and Management (R0)
