Abstract
Dummy variables can be incorporated into regression models just as easily as quantitative variables. In fact, a regression model may contain regressors that are all exclusively dummy, or qualitative, in nature; the results of such a model are exactly the same as those of an Analysis of Variance (ANOVA) model. The regression model used to assess the statistical significance of the relationship between a quantitative regressand and all qualitative (dummy) regressors is thus equivalent to a corresponding ANOVA model. For each qualitative regressor, the number of dummy variables introduced must be one less than the number of categories of that variable: if a qualitative variable has m categories, introduce only (m − 1) dummy variables. The category to which no dummy variable is assigned is known as the base, benchmark, control, comparison, reference, or omitted category, and all comparisons are made in relation to it. The intercept represents the mean value of the benchmark category. The coefficients attached to the dummy variables are known as differential intercept coefficients because they tell by how much the intercept of the category that receives the value 1 differs from the intercept of the benchmark category.
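As an illustration (not taken from the chapter), the following sketch fits an ordinary least-squares regression of a quantitative outcome on dummy variables for three hypothetical categories A, B, and C, with A as the benchmark. It verifies numerically that the intercept equals the benchmark category's mean and that each dummy coefficient equals the difference between that category's mean and the benchmark mean. All data and names are invented for the example.

```python
import numpy as np

# Hypothetical data: a quantitative regressand observed in three categories.
# Category A is the benchmark; B and C receive dummy variables (m = 3, so m - 1 = 2 dummies).
rng = np.random.default_rng(0)
y_A = rng.normal(10.0, 1.0, size=50)
y_B = rng.normal(12.0, 1.0, size=50)
y_C = rng.normal(15.0, 1.0, size=50)
y = np.concatenate([y_A, y_B, y_C])

# Design matrix: constant term plus the two dummies D_B and D_C.
D_B = np.r_[np.zeros(50), np.ones(50), np.zeros(50)]
D_C = np.r_[np.zeros(100), np.ones(50)]
X = np.column_stack([np.ones(len(y)), D_B, D_C])

# Ordinary least squares via the normal equations (lstsq).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, b_B, b_C = beta

# The intercept is the sample mean of the benchmark category A,
# and the differential intercept coefficients are differences of means.
assert np.isclose(intercept, y_A.mean())
assert np.isclose(b_B, y_B.mean() - y_A.mean())
assert np.isclose(b_C, y_C.mean() - y_A.mean())
```

This is exactly the ANOVA equivalence described above: the fitted values of the all-dummy regression are the category means.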
Notes
1. Three binary variables can be taken if the constant term is dropped from the regression equation.
Appendix
Linear Probability Model Versus Linear Discriminant Function
Suppose we have multivariate observations that come from one of two groups, say group 1 and group 2. The linear discriminant function (LDF) is a linear function of the variables by which we can predict whether a new observation came from group 1 or group 2. The linear probability model (LPM) is interpreted as the probability that an event will occur; we assume that if the event occurs the observation comes from group 1, and otherwise from group 2.
The LPM has a direct link with the LDF.
Let us first see how the linear discriminant function is constructed.
Let the linear function be
\(Z = \lambda_{1} x_{1} + \lambda_{2} x_{2} + \cdots + \lambda_{k} x_{k}.\)
To get the best discrimination between the two groups, we choose the λi values so that the ratio of the between-group variation of Z to its within-group variation,
\(\left( {\bar{Z}_{1} - \bar{Z}_{2} } \right)^{2} /s_{Z}^{2},\)
is maximized. Fisher suggested that we define a dummy variable
- y = n2/(n1 + n2) if the individual belongs to the first group,
- y = (−n1)/(n1 + n2) if the individual belongs to the second group,
then \({\hat{\lambda }}_{i} = \hat{\beta }_{i} /\left\{ {{\text{RSS}}/\left( {{n}_{1} + {n}_{2} - 2} \right)} \right\}\), where RSS is the residual sum of squares from the regression of y on the x values and the \(\hat{\beta }_{i}\) are the estimated coefficients.
The LPM is
- y = 1 if the individual belongs to the first group,
- y = 0 if the individual belongs to the second group.
This amounts to adding n1/(n1 + n2) to each observation of y as defined by Fisher, so only the estimate of the constant term changes.
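The LPM–LDF link above can be checked numerically. In this sketch (all data invented), the same regressors are regressed once on Fisher's coding of y and once on the 0/1 LPM coding; the slope coefficients coincide and the intercepts differ by exactly n1/(n1 + n2).

```python
import numpy as np

# A hypothetical two-group dataset with two regressors.
rng = np.random.default_rng(1)
n1, n2 = 40, 60
x_g1 = rng.normal(0.0, 1.0, size=(n1, 2)) + np.array([1.0, 0.5])  # group 1
x_g2 = rng.normal(0.0, 1.0, size=(n2, 2))                         # group 2
X = np.column_stack([np.ones(n1 + n2), np.vstack([x_g1, x_g2])])

# Fisher's coding: n2/(n1+n2) for group 1, -n1/(n1+n2) for group 2.
y_fisher = np.r_[np.full(n1, n2 / (n1 + n2)), np.full(n2, -n1 / (n1 + n2))]
# LPM coding: 1 for group 1, 0 for group 2.
y_lpm = np.r_[np.ones(n1), np.zeros(n2)]

# OLS fits for both codings.
b_fisher, *_ = np.linalg.lstsq(X, y_fisher, rcond=None)
b_lpm, *_ = np.linalg.lstsq(X, y_lpm, rcond=None)

# Slopes are identical; only the constant term shifts by n1/(n1 + n2).
assert np.allclose(b_fisher[1:], b_lpm[1:])
assert np.isclose(b_lpm[0] - b_fisher[0], n1 / (n1 + n2))
```

The equality of the slopes is exact, since the LPM coding of y is Fisher's coding plus the constant n1/(n1 + n2) at every observation, and adding a constant to the regressand changes only the intercept.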
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Pal, M., Bharati, P. (2019). The Regression Models with Dummy Explanatory Variables. In: Applications of Regression Techniques. Springer, Singapore. https://doi.org/10.1007/978-981-13-9314-3_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9313-6
Online ISBN: 978-981-13-9314-3
eBook Packages: Mathematics and Statistics (R0)