© 2015

Regression Modeling Strategies

With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis


Part of the Springer Series in Statistics book series (SSS)


About this book


This highly anticipated second edition features new chapters and sections, 225 new references, and comprehensive R software. In keeping with the previous edition, this book is about the art and science of data analysis and predictive modeling, which entails choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem-solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for fitting nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. The reader will gain a keen understanding of predictive accuracy and of the harm of categorizing continuous predictors or outcomes. This text deals realistically with model uncertainty and its effects on inference to achieve "safe data mining." It also presents many graphical methods for communicating complex regression models to non-statisticians.

Regression Modeling Strategies presents full-scale case studies of non-trivial datasets instead of over-simplified illustrations of each method. These case studies use freely available R functions that make the multiple imputation, model building, validation, and interpretation tasks described in the book relatively easy to do. Most of the methods in this text apply to all regression models, but special emphasis is given to multiple regression using generalized least squares for longitudinal data, the binary logistic model, models for ordinal responses, parametric survival regression models, and the Cox semiparametric survival model.  A new emphasis is given to the robust analysis of continuous dependent variables using ordinal regression.

As in the first edition, this text is intended for master's- or Ph.D.-level graduate students who have had a general introductory probability and statistics course and who are well versed in ordinary multiple regression and intermediate algebra. The book will also serve as a reference for data analysts and statistical methodologists, as it contains an up-to-date survey and bibliography of modern statistical modeling techniques. Examples used in the text mostly come from biomedical research, but the methods are applicable anywhere predictive models ("analytics") are useful, including economics, epidemiology, sociology, psychology, engineering, and marketing.


Generalized least squares, Linear models, Logistic regression, Predictive modeling, R statistical software, Regression analysis, Survival analysis, knitr reproducible documents

Authors and affiliations

  1. Department of Biostatistics, School of Medicine, Vanderbilt University, Nashville, USA

About the authors

Frank E. Harrell, Jr. is Professor of Biostatistics and Chair, Department of Biostatistics, Vanderbilt University School of Medicine, Nashville. He has developed numerous methods for predictive modeling, quantifying predictive accuracy and model validation and has published numerous predictive models and articles on applied statistics, medical research and clinical trials. He is on the editorial board for several biomedical and methodologic journals. He is a Fellow of the American Statistical Association (ASA) and a consultant to the U.S. Food and Drug Administration and to the pharmaceutical industry. He teaches a graduate course in regression modeling strategies and a course in biostatistics for medical researchers. In 2014 he was chosen to receive the WJ Dixon Award for Excellence in Statistical Consulting by the ASA. 


“The aim and scope of this edition [is] to provide graduate students and professional and early career researchers with insights, understandings and working knowledge of regression modelling. … The book is sequentially organized and well structured and many chapters are self-contained. It includes many useful topics and techniques for graduate students and researchers alike. This book can be used as a textbook and equally as a reference book.” (Technometrics, Vol. 58 (2), February, 2016)