## Abstract

Discrete choice models, and in particular logit-type models, play an important role in understanding and quantifying individual or household behavior in relation to transport demand. An example is the choice of travel mode for a given trip under the budget and time restrictions that the individuals of a household face. In this case an important policy parameter is the effect of income (reflecting the household budget) on the choice of travel mode. This paper deals with the consequences of measurement error in income (an explanatory variable) in discrete choice models. Since such measurement error is likely to give misleading estimates of the income effect, it is of interest to investigate the magnitude of the estimation bias and, if possible, to use estimation techniques that take the measurement error problem into account.

We use data from the Danish National Travel Survey (NTS) and merge it with administrative register data that contain very detailed information about incomes. This gives a unique opportunity to learn about the magnitude and nature of the measurement error in income reported by the respondents in the Danish NTS compared to income from the administrative register (taken as the correct measure). We find that the classical measurement error model (for the logarithm of income) is valid except in the tails of the income distribution, where those with low (high) income tend to over-report (under-report). In addition, we find that the marginal distribution of the measurement errors is symmetric and leptokurtic, and hence has a higher peak around zero and thicker tails than a normal distribution.
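As a rough illustration of the kind of check described above, the following sketch computes three diagnostics for the error in log income: the slope from regressing the error on the register value (near zero under the classical model), the skewness (near zero for a symmetric distribution), and the excess kurtosis (positive for a leptokurtic distribution). The function name, the simulated data, and the Laplace error distribution are assumptions made for illustration only; they are not the NTS data or the paper's procedure.

```python
import numpy as np

def measurement_error_diagnostics(log_reported, log_register):
    """Return (slope, skewness, excess kurtosis) of the error in log income."""
    u = log_reported - log_register            # measurement error in logs
    xc = log_register - log_register.mean()
    uc = u - u.mean()
    slope = (xc @ uc) / (xc @ xc)              # OLS slope of error on register value
    z = uc / uc.std()                          # standardized errors
    skewness = np.mean(z**3)
    excess_kurtosis = np.mean(z**4) - 3.0      # zero for a normal distribution
    return slope, skewness, excess_kurtosis

# Simulated stand-in data (assumption): normal log income, Laplace errors.
rng = np.random.default_rng(0)
log_register = rng.normal(12.5, 0.6, 200_000)
log_reported = log_register + rng.laplace(0.0, 0.2, 200_000)

slope, skewness, excess_kurtosis = measurement_error_diagnostics(
    log_reported, log_register
)
print(f"slope={slope:.3f} skewness={skewness:.3f} excess kurtosis={excess_kurtosis:.2f}")
```

With real data, a clearly negative slope would point to the pattern of over-reporting at low incomes and under-reporting at high incomes; here the simulated errors are classical by construction.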

In a linear regression model where the explanatory variable is measured with error, it is well known that this gives a downward bias in the absolute value of the corresponding regression parameter (attenuation); see Friedman (1957). In non-linear models it is more difficult to obtain an expression for the bias, as it depends on the distribution of the true underlying variable as well as the error distribution. Chesher (1991) gives approximations for very general non-linear models, and Stefanski & Carroll (1985) do so for the logistic regression model. Using these results we find that the bias in logit models can be substantial with the magnitude of measurement error found in income from the NTS.
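The attenuation effect can be sketched with a small simulation. In the linear case with classical error, the OLS slope converges to the true slope times the reliability ratio σ_x² / (σ_x² + σ_u²); in the logit case no such closed form is used here, and the slope is simply estimated with and without error. All parameter values, the normal distributions, and the function names are assumptions chosen for illustration, not estimates from the NTS.

```python
import numpy as np

def fit_logit(x, y, iters=25):
    """Logit maximum likelihood (intercept and one slope) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        b = b + np.linalg.solve(hess, grad)
    return b

def attenuation_demo(n=200_000, beta=1.0, sigma_x=1.0, sigma_u=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x_true = rng.normal(0.0, sigma_x, n)              # true regressor (assumed normal)
    x_obs = x_true + rng.normal(0.0, sigma_u, n)      # observed with classical error

    # Linear model: OLS slope on the error-ridden regressor.
    y_lin = beta * x_true + rng.normal(0.0, 1.0, n)
    xc = x_obs - x_obs.mean()
    ols_slope = (xc @ (y_lin - y_lin.mean())) / (xc @ xc)
    ols_limit = beta * sigma_x**2 / (sigma_x**2 + sigma_u**2)

    # Logit model: slope estimated with the true and with the observed regressor.
    p = 1.0 / (1.0 + np.exp(-beta * x_true))
    y_bin = (rng.random(n) < p).astype(float)
    logit_true = fit_logit(x_true, y_bin)[1]
    logit_obs = fit_logit(x_obs, y_bin)[1]
    return ols_slope, ols_limit, logit_true, logit_obs

ols_slope, ols_limit, logit_true, logit_obs = attenuation_demo()
print(f"OLS slope {ols_slope:.3f} (theoretical limit {ols_limit:.3f})")
print(f"logit slope: true regressor {logit_true:.3f}, observed regressor {logit_obs:.3f}")
```

Under these assumed settings the reliability ratio is 0.8, and the logit slope estimated from the observed regressor falls below the one estimated from the true regressor.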

One way to address the measurement error problem is to use instruments. These are additional variables (not already in the model) that are uncorrelated with the measurement error part of income and correlated with the underlying true income. If the data contain information about the expenditure on consumption items such as housing or other consumption goods (that do not affect the discrete choice of interest directly), and we find it plausible that the possible measurement errors in these expenditures are uncorrelated with the measurement error in income, these will be valid instruments. However, in the Danish NTS there is only limited information about such expenditures (we know whether the household owns or rents its home), so finding a good instrument will be difficult. Another possibility is to use technical instruments (understood as functions of the variables already in the model), exploiting the properties of the measurement errors and the model for measured income. Lewbel (1997) shows that if the distribution of the measurement errors is symmetric and the distribution of the underlying true income is skewed, then there are valid technical instruments. We investigate how this IV estimation approach works in theory and illustrate it by simulation studies using the findings about the measurement error model for income from the NTS.
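The idea of a technical instrument can be sketched in a simple linear setting. If the measurement error is symmetric and the true regressor is skewed, the squared deviation of the observed regressor from its mean is uncorrelated with the composite error but correlated with the regressor, so it can serve as an instrument. The sketch below is a linear illustration under assumed distributions (exponential true regressor, Laplace measurement error) with hypothetical names and parameter values; it is not the estimator used for the discrete choice model in the paper.

```python
import numpy as np

def technical_iv_demo(n=200_000, beta=1.0, seed=1):
    rng = np.random.default_rng(seed)
    x_true = rng.exponential(1.0, n)           # skewed true regressor (assumption)
    v = rng.laplace(0.0, 0.4, n)               # symmetric, leptokurtic error (assumption)
    x_obs = x_true + v
    y = 0.5 + beta * x_true + rng.normal(0.0, 1.0, n)

    xc = x_obs - x_obs.mean()
    yc = y - y.mean()

    # OLS on the error-ridden regressor: attenuated toward zero.
    b_ols = (xc @ yc) / (xc @ xc)

    # Technical instrument: squared deviation of the observed regressor.
    q = xc**2
    qc = q - q.mean()
    b_iv = (qc @ yc) / (qc @ xc)               # simple IV (ratio of covariances)
    return b_ols, b_iv

b_ols, b_iv = technical_iv_demo()
print(f"OLS slope {b_ols:.3f}, IV slope with technical instrument {b_iv:.3f}")
```

The instrument's strength depends on the third central moment of the true regressor, so with a nearly symmetric regressor this estimator would be expected to perform poorly.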

Original language | English |
---|---|
Publication date | 2012 |
Publication status | Published - 2012 |
Event | Kuhmo Nectar Conference and Summer School on Transportation Economics 2012 - Berlin, Germany; Duration: 18 Jun 2012 → 22 Jun 2012 |

### Conference

Conference | Kuhmo Nectar Conference and Summer School on Transportation Economics 2012 |
---|---|
Country | Germany |
City | Berlin |
Period | 18/06/2012 → 22/06/2012 |

### Bibliographical note

References:

Chesher, A. 1991. The effect of measurement error. Biometrika, 78, 451–462.

Friedman, M. 1957. A Theory of the Consumption Function. Princeton University Press.

Lewbel, A. 1997. Constructing instruments for regressions with measurement error when no additional data is available, with an application to R&D. Econometrica, 65, 1201–1213.

Stefanski, L.A. and Carroll, R.J. 1985. Covariate measurement error in logistic regression. Annals of Statistics, 13, 1335–1351.

## Keywords

- National travel survey
- Nonclassical measurement error
- Nonlinear models