A dummy variable is a binary variable that can take only two values, 0 and 1. It is called a ``dummy'' variable because it uses numbers to stand in for information from a qualitative (categorical) variable. A dummy variable is also referred to as an indicator variable.
It can be interpreted as a ``switch'' variable that is either on ($d=1$) or off ($d=0$), indicating whether or not a condition holds.
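A minimal sketch of building a dummy variable from a categorical one. It assumes a hypothetical list of labels `sex` (the data are illustrative only). The indicator is switched on (1) when the condition "is female" holds and off (0) otherwise.

```python
# Hypothetical categorical data (illustrative only)
sex = ["female", "male", "male", "female", "male"]

# Dummy (indicator) variable: on (1) for females, off (0) for males
d = [1 if s == "female" else 0 for s in sex]

print(d)  # [1, 0, 0, 1, 0]
```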
Some examples where a dummy variable is useful include:
Dummy variables are used in regression models to analyze and estimate differences among groups.
A dummy variable is an explanatory variable. It enters the regression like any other regressor in the multiple regression framework.
If there are $n$ groups, we define $n-1$ dummy variables. If all $n$ dummy variables were included in a regression that also has an intercept, the dummies would sum to the constant term for every observation. This perfect multicollinearity (the ``dummy variable trap'') would make it impossible to estimate the regression.
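The dummy variable trap can be checked numerically. The sketch below assumes NumPy and hypothetical group labels for $n=3$ groups. It compares the rank of the design matrix when all $n$ dummies are included with its rank when one group is left out. A design matrix whose rank is below its number of columns cannot be used to obtain unique least-squares estimates.

```python
import numpy as np

# Hypothetical data: 6 observations spread over n = 3 groups (labels 0, 1, 2)
group = np.array([0, 1, 2, 0, 1, 2])
n_groups = 3

intercept = np.ones(len(group))
# One dummy per group: column j equals 1 when the observation is in group j
all_dummies = np.column_stack([(group == j).astype(float) for j in range(n_groups)])

# n dummies + intercept: the dummy columns sum to the intercept column
X_trap = np.column_stack([intercept, all_dummies])
# n - 1 dummies + intercept: group 0 is the excluded (reference) group
X_ok = np.column_stack([intercept, all_dummies[:, 1:]])

print(X_trap.shape[1], np.linalg.matrix_rank(X_trap))  # 4 3 -> rank deficient
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))      # 3 3 -> full column rank
```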
In the definition of a dummy variable, one of the two groups is called the ``included'' group and the other the ``excluded'' group. The included group is the one identified by a value of 1, and the excluded group carries a value of 0. The excluded group is also referred to as the ``control'' or ``benchmark'' group. It is the reference used for comparisons, and it is the category for which no dummy variable is included in the regression. For instance, if $d=1$ for females and $d=0$ for males and we include $d$, then the left-out group is males, which becomes the reference group. The estimated coefficient on $d$ must therefore be interpreted as a difference relative to this reference group.
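To make the reference-group interpretation concrete, here is a sketch using NumPy's least-squares solver on hypothetical salary figures, with $d=1$ for females and $d=0$ for males as in the example above. When salary is regressed on an intercept and $d$ only, the intercept equals the mean salary of the reference group (males). The coefficient on $d$ equals the female-minus-male difference in mean salary.

```python
import numpy as np

# Hypothetical salaries; d = 1 for females (included group), d = 0 for males (reference)
salary = np.array([50.0, 52.0, 54.0, 60.0, 62.0, 64.0])
d = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# Regress salary on an intercept and the dummy by ordinary least squares
X = np.column_stack([np.ones_like(d), d])
(b0, delta), *_ = np.linalg.lstsq(X, salary, rcond=None)

# b0: mean salary of the reference group (males)
# delta: female-minus-male difference in mean salary
print(round(float(b0), 6), round(float(delta), 6))  # 62.0 -10.0
```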
Suppose that the effect of $x$ on $y$ is the same for both groups, but that, at every level of $x$, there is a systematic difference in $y$ between the two groups. Graphically, this situation is depicted by two parallel lines with different intercepts.
(See images.)
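One common way to write this intercept-shift model is $y = \beta_0 + \delta_0 d + \beta_1 x + u$. In this form, the line for the $d=0$ group has intercept $\beta_0$ and the line for the $d=1$ group has intercept $\beta_0 + \delta_0$, and both lines share the slope $\beta_1$. The sketch below is a minimal illustration under stated assumptions: it uses NumPy and noiseless hypothetical data generated with $\beta_0 = 1$, $\delta_0 = 2$, $\beta_1 = 0.5$, and it shows that least squares recovers these values.

```python
import numpy as np

# Hypothetical noiseless data generated from y = 1.0 + 2.0*d + 0.5*x
# (common slope 0.5; intercept 1.0 for d = 0 and 3.0 for d = 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = 1.0 + 2.0 * d + 0.5 * x

# Model: y = b0 + delta0*d + b1*x + u (the dummy shifts the intercept only)
X = np.column_stack([np.ones_like(x), d, x])
(b0, delta0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"b0={b0:.3f} delta0={delta0:.3f} b1={b1:.3f}")  # b0=1.000 delta0=2.000 b1=0.500
```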
An interaction between a dummy variable and a quantitative variable allows the analyst to estimate differences in slope across groups. For instance, if the salary equation includes the ``product'' variable sex*experience, the coefficient on that variable indicates whether the additional salary that one more year of experience brings differs between males and females.
(See images.)
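A sketch of the slope-shift (interaction) model using NumPy on noiseless hypothetical data. Following the earlier convention, it codes the dummy as `female` = 1 for females and 0 for males. The names `exper` and `female` and all coefficient values are illustrative assumptions. In $\textit{salary} = \beta_0 + \delta_0\,\textit{female} + \beta_1\,\textit{exper} + \delta_1(\textit{female}\times\textit{exper}) + u$, the return to a year of experience is $\beta_1$ for males and $\beta_1 + \delta_1$ for females, so $\delta_1$ measures the difference in slopes.

```python
import numpy as np

# Hypothetical noiseless salary data generated from
# salary = 30 + 2*female + 1.5*exper - 0.5*(female*exper)
exper = np.array([0.0, 2.0, 4.0, 6.0, 0.0, 2.0, 4.0, 6.0])
female = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
salary = 30.0 + 2.0 * female + 1.5 * exper - 0.5 * female * exper

# Regressors: intercept, dummy, experience, and the product (interaction) term
X = np.column_stack([np.ones_like(exper), female, exper, female * exper])
(b0, delta0, b1, delta1), *_ = np.linalg.lstsq(X, salary, rcond=None)

# Additional salary from one more year of experience, by group
print(f"males:   {b1:.2f}")           # males:   1.50
print(f"females: {b1 + delta1:.2f}")  # females: 1.00
```

A negative estimate of $\delta_1$, as in these made-up numbers, would mean that females gain less additional salary per year of experience than males. The sign depends on which group is coded 1.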