# Introduction

Werner Vach
Part of the Lecture Notes in Statistics book series (LNS, volume 86)

## Abstract

In many scientific areas, a basic task is to assess the simultaneous influence of several factors on a quantity of interest. Regression models provide a powerful framework for this task, and the estimation of effects in such models is a well-established field of statistics. In general, this estimation is based on measurements of the factors (covariates) and of the quantity of interest (outcome variable) for a set of units. In practice, however, it is often not possible to measure all covariates for all units; that is, some units have a missing value in one or several covariates. The reasons vary widely and depend mainly on the measurement procedure for the individual covariate and on the data collection procedure. The following examples illustrate this:
• If the covariate values are collected by a questionnaire or interview, non-response is a typical source of missing values. It may be due to a genuine lack of knowledge, for example if a person is asked about certain diseases during childhood, or to an intentional refusal. Refusal is especially to be expected for embarrassing questions such as those on alcohol consumption, drug abuse, sexual activity, or income.

• In retrospective studies, covariate values are often collected from documents such as hospital records. Incomplete documents then cause missing values.

• In clinical trials, biochemical parameters are often used as covariates. Measuring these parameters often requires a certain amount of blood, urine, or tissue, which may not be available.

• In prospective clinical trials, the recruitment of patients can last several years. In the meantime, scientific progress may reveal new influential factors, which may lead to the decision to add the measurement of the corresponding covariate to the data collection procedure. For patients recruited before this decision, the value of this covariate is missing.

• If measuring a covariate is very expensive, one may restrict its measurement to a subset of all units.

• Even in a well-planned and well-conducted study, small accidents can happen: a test tube may break, a case report form may be lost in the mail, an examination may be forgotten, the inaccuracy of an instrument may be detected too late, etc. Each such accident may cause a missing value.
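To make the idea of units with missing values in one or several covariates concrete, here is a minimal sketch of how such data might be represented. It assumes hypothetical toy data in which every unit has an observed outcome and two covariates named `x1` and `x2`, with `None` marking a covariate value that could not be measured. The helper names `missing_pattern` and `is_complete` are illustrative only and are not taken from the source.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: the outcome is always observed, covariates may be missing.
# None marks a covariate value that could not be measured for that unit.
Unit = Dict[str, Optional[float]]

units: List[Unit] = [
    {"outcome": 1, "x1": 0.7, "x2": 3.0},
    {"outcome": 0, "x1": None, "x2": 1.5},  # x1 missing, e.g. non-response
    {"outcome": 1, "x1": 1.2, "x2": None},  # x2 missing, e.g. a broken test tube
    {"outcome": 0, "x1": 0.4, "x2": 2.2},
]

covariates = ["x1", "x2"]


def missing_pattern(unit: Unit, covariates: List[str]) -> List[str]:
    """Return the names of the covariates whose value is missing for one unit."""
    return [c for c in covariates if unit[c] is None]


def is_complete(unit: Unit, covariates: List[str]) -> bool:
    """A unit is complete if none of its covariate values is missing."""
    return not missing_pattern(unit, covariates)


# Units with all covariates observed, and the number of missing values per covariate.
complete_units = [u for u in units if is_complete(u, covariates)]
missing_per_covariate = {c: sum(u[c] is None for u in units) for c in covariates}

print(len(complete_units))       # 2
print(missing_per_covariate)     # {'x1': 1, 'x2': 1}
```

In this toy example only two of the four units have all covariates observed, although each covariate is missing for just one unit.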

## Keywords

Logistic Regression, Case Report Form, Prospective Clinical Trial, Data Collection Procedure, Asymptotic Variance