Which Fit Is Fitter

Chapter in: The Art of Regression Modeling in Road Safety

Abstract

This chapter discusses how well a model fits the data. There are many single-number goodness-of-fit measures, but they all describe only the overall fit. For SPF modeling, this is insufficient: for an SPF to produce useful estimates, the fit must be good for all values of every variable. An alternative tool for describing goodness of fit, the CURE plot, is suggested and its uses are explained. The CURE plot shows at a glance how good a fit is and what concerns remain. Long upward or downward runs indicate regions of bias, which demand model improvement either by the addition of new traits or by a change of functional form; large vertical drops in the CURE plot invite the examination of outliers. The CURE plot is useful in determining whether a fit is acceptable and in judging which of two fits is better.
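
The way a CURE (cumulative residuals) plot is read can be illustrated with a minimal sketch. It assumes only that residual means observed minus fitted and that the cumulative sum is taken after sorting by the variable of interest; the function names and the six data points are hypothetical, invented for illustration, and this is not the computation in the book's spreadsheet.

```python
import numpy as np

def cure_curve(x, observed, fitted):
    """Sort by x and return (sorted x, cumulative residuals in that order)."""
    x = np.asarray(x, dtype=float)
    residuals = np.asarray(observed, dtype=float) - np.asarray(fitted, dtype=float)
    order = np.argsort(x, kind="stable")
    return x[order], np.cumsum(residuals[order])

def step_sizes(cum):
    """Recover the individual vertical steps of the cumulative curve."""
    return np.diff(np.concatenate(([0.0], cum)))

def longest_run(cum):
    """Number of consecutive steps in the same direction (up or down)."""
    best = run = 0
    prev = 0.0
    for s in np.sign(step_sizes(cum)):
        if s != 0 and s == prev:
            run += 1
        else:
            run = 1 if s != 0 else 0
        prev = s
        best = max(best, run)
    return best

def largest_jump(x_sorted, cum):
    """Location and size of the largest single vertical move (outlier candidate)."""
    steps = step_sizes(cum)
    k = int(np.argmax(np.abs(steps)))
    return float(x_sorted[k]), float(steps[k])

# Hypothetical data: one unusually large count at x = 4.
x = [3, 1, 2, 4, 6, 5]
observed = [1, 2, 0, 9, 3, 1]
fitted = [2.0, 1.5, 1.5, 2.0, 2.5, 2.5]

xs, cum = cure_curve(x, observed, fitted)
print("cumulative residuals:", cum)
print("longest run of same-sign steps:", longest_run(cum))
print("largest jump (x, size):", largest_jump(xs, cum))
```

A long run flags a range of the variable where the fit is consistently too low or too high, while a single large jump points at an observation worth examining as a possible outlier.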

Notes

  1.

    While this sentence seems unobjectionable, a comment is in order. If the aim is indeed that of getting a good fit, then why, when estimating parameters, is the objective not that of maximizing fit? The answer is that the usual focus in regression modeling is that of getting good parameter estimates, not good estimates of E{μ}. This dichotomy and choice of perspective was discussed in Sect. 1.5, and its implication for the choice of objective function will be further explored in Chap. 8. Here it is sufficient to note that in traditional regression modeling there is a tension between estimating parameters with one purpose in mind and judging the quality of the model with a different yardstick in hand.

  2.

    There is in use an annoying multiplicity of terms which all seem to have the same meaning. In curve-fitting one tends to use “difference” and “deviation”; both stand for the D in acronyms such as SD and SSD. In examining the goodness of a fit the word “residual” is usually used to mean the same. I will shun the word “deviation” and, reluctantly, use “difference” for the D in acronyms and “residual” to mean the same when speaking about the quality of a fit. There is yet another related and easily confused term in common use: “error.” While “difference” and “residual” always refer to “observed − fitted,” “error” refers to “observed − expected.”

  3.

    Other commonly used goodness-of-fit measures are the Root Mean Square Error of Approximation, Statistical Deviance, the AIC (Akaike Information Criterion), the BIC (Bayesian Information Criterion), etc. See, e.g., Miaou (1996), Miaou et al. (1996), Schermelleh-Engel et al. (2003), and Hooper and Mullen (2008).

  4.

    The question of what SPFs are for was discussed in Chap. 1. More specifically the differences between the “applications” and the “research” perspectives were described in Sect. 1.5. For this book the “applications” perspective was chosen and, as a consequence, emphasis is on how well E{μ} and σ{μ} are estimated, not on the accuracy of the parameter estimates. This shift in emphasis is reflected in modeling. Here it leads to the abandonment of single-number measures of goodness of fit.

  5.

    See, e.g., Draper and Smith (1981, p. 148).

  6.

    The striation is due to the definition residual ≡ observed − fitted, in which the observed value is always an integer.

  7.

    To show the relevant detail clearly, segment length was truncated at 3 miles, leaving off the longest 5% of segments.

  8.

    To download this spreadsheet, go to http://extras.springer.com/ and enter the ISBN of this book. Look in the “Spreadsheets” folder for “Chap. 7. CURE computations .xls or xlsx”

  9.

    In a one-dimensional random walk one follows the evolution of the sum of independent random variables. Let \( R_1, R_2, \dots, R_N \) be a sequence of N independent random variables. The sequence of points with coordinates \( \left(i,\ \sum_{j=1}^{i} R_j\right) \) can be visualized as a random walk. It is what the movement over time of a stock price would look like if the probability of a price increase were the same as the probability of an equal price drop, and if stock price changes over time were statistically independent. The random walk of interest here is one where the Rs correspond to residuals. When the residual is positive, the sum moves up; with a negative residual it moves down. Inasmuch as for every i the sum of all residuals is expected to be 0, this kind of random walk oscillates around the horizontal axis.

  10.

    Another kind of bias, the “bias-in-use,” will be introduced and discussed in Chap. 10. Bias-in-use occurs when the information about variables which is available to the user of the SPF differs from the variables in the SPF.

  11.

    The addition of this constraint will cause a slight increase of the objective function when minimized and a small decrease when maximized. Thus, for example, in Fig. 7.7 the SSD increased from 78,648.7 to 78,650.3.

  12.

    Here only the procedure for computing the limits will be described. The derivation of the expression by which the limits are computed is in Appendix I. To download this spreadsheet, go to http://extras.springer.com/ and enter the ISBN of this book. Look in the “Spreadsheets” folder for “Chap. 7. CURE computations .xls or xlsx”.

  13.

    In probability theory, the central limit theorem states that, given certain conditions, the sum of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.

  14.

    Because at its right end the CURE plot must land on the horizontal axis, the underestimation of the fitted values in the AB (or A′B′) range is always accompanied by a compensatory overestimation elsewhere.

  15.

    One such reason is that the distribution of crash counts is skewed while least-squares curve-fitting is suitable for symmetrical distributions.
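
The random walk of note 9, the central limit theorem of note 13, and the return to the horizontal axis of note 14 can be illustrated together with a small simulation. This is only a sketch under stated assumptions: the steps are hypothetical ±1 moves rather than real residuals, the seed and sample sizes are arbitrary, and subtracting the mean is merely one simple way to obtain steps that sum to zero; it is not the constraint used in the book.

```python
import numpy as np

def random_walk(steps):
    """Return the points (i, R_1 + ... + R_i) traced by the partial sums."""
    steps = np.asarray(steps, dtype=float)
    index = np.arange(1, steps.size + 1)
    return index, np.cumsum(steps)

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability

# Many independent walks of +1/-1 steps: an up move is as likely as an
# equal down move, as in the stock-price analogy of note 9.
n_walks, n_steps = 2000, 400
steps = rng.integers(0, 2, size=(n_walks, n_steps)) * 2.0 - 1.0
end_points = steps.sum(axis=1)

# Each step has variance 1, so the end point has standard deviation
# sqrt(n_steps); by the central limit theorem (note 13) it is roughly normal.
print("theoretical sd of end point:", np.sqrt(n_steps))
print("simulated sd of end point:  ", end_points.std())

# Hypothetical residuals that sum to zero: removing the mean forces the walk
# to land on the horizontal axis at its right end (note 14).
residuals = steps[0] - steps[0].mean()
index, walk = random_walk(residuals)
print("last cumulative residual:", walk[-1])
```

Because the zero-sum walk must end on the axis, any stretch in which it climbs has to be offset by a stretch in which it falls, which is the compensation described in note 14.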

References

  • Draper N, Smith H (1981) Applied regression analysis, 2nd edn. Wiley, New York

  • Hauer E, Bamfo J (1997) Two tools for finding what function links the dependent variable to the explanatory variables. Proceedings of the ICTCT 97 conference, Lund, Sweden, pp 1–19

  • Hooper DC, Mullen MR (2008) Structural equation modelling: guidelines for determining model fit. Electron J Bus Res Meth 6(1):53–60, Available online at www.ejbrm.com

  • Miaou S-P (1996) Measuring the goodness-of-fit of accident prediction models. FHWA-RD-96-040. Federal Highway Administration, Office of Safety and Traffic Operations, Washington DC

  • Miaou S-P, Lu A, Lum HS (1996) Pitfalls of using R2 to evaluate goodness of fit of accident prediction models. Transp Res Rec 1542:6–13

  • Schermelleh-Engel K, Moosbrugger H, Müller H (2003) Evaluating the fit of structural equation models: tests of significance and descriptive goodness-of-fit measures. Meth Psychol Res 8(2):23–74

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Hauer, E. (2015). Which Fit Is Fitter. In: The Art of Regression Modeling in Road Safety. Springer, Cham. https://doi.org/10.1007/978-3-319-12529-9_7

  • DOI: https://doi.org/10.1007/978-3-319-12529-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12528-2

  • Online ISBN: 978-3-319-12529-9

  • eBook Packages: Engineering, Engineering (R0)
