The data set HC
from mlogit
contains data in R
format on the choice of heating and central cooling system for 250 single-family, newly built houses in California.
The alternatives are:
gcc
,ecc
,erc
,hpc
,gc
,ec
,er
.Heat pumps necessarily provide both heating and cooling such that heat pump without cooling is not an alternative.
The variables are:
depvar
gives the name of the chosen alternative,ich.alt
are the installation cost for the heating portion of the system,icca
is the installation cost for coolingoch.alt
are the operating cost for the heating portion of the systemocca
is the operating cost for coolingincome
is the annual income of the householdNote that the full installation cost of alternative gcc
is ich.gcc+icca
, and similarly for the operating cost and for the other alternatives with cooling.
gcc},
ecc}, erc},
hpc}) in one nest and the non-cooling alternatives (gc},
ec}, `er}) in another nest.library("mlogit")
data("HC", package = "mlogit")
HC <- dfidx(HC, varying = c(2:8, 10:16), choice = "depvar")
cooling.modes <- idx(HC, 2) %in% c('gcc', 'ecc', 'erc', 'hpc')
room.modes <- idx(HC, 2) %in% c('erc', 'er')
# installation / operating costs for cooling are constants,
# only relevant for mixed systems
HC$icca[! cooling.modes] <- 0
HC$occa[! cooling.modes] <- 0
# create income variables for two sets cooling and rooms
HC$inc.cooling <- HC$inc.room <- 0
HC$inc.cooling[cooling.modes] <- HC$income[cooling.modes]
HC$inc.room[room.modes] <- HC$income[room.modes]
# create an intercet for cooling modes
HC$int.cooling <- as.numeric(cooling.modes)
# estimate the model with only one nest elasticity
nl <- mlogit(depvar ~ ich + och +icca + occa + inc.room + inc.cooling + int.cooling | 0, HC,
nests = list(cooling = c('gcc','ecc','erc','hpc'),
other = c('gc', 'ec', 'er')), un.nest.el = TRUE)
summary(nl)
##
## Call:
## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room +
## inc.cooling + int.cooling | 0, data = HC, nests = list(cooling = c("gcc",
## "ecc", "erc", "hpc"), other = c("gc", "ec", "er")), un.nest.el = TRUE)
##
## Frequencies of alternatives:choice
## ec ecc er erc gc gcc hpc
## 0.004 0.016 0.032 0.004 0.096 0.744 0.104
##
## bfgs method
## 11 iterations, 0h:0m:0s
## g'(-H)^-1g = 7.26E-06
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## ich -0.554878 0.144205 -3.8478 0.0001192 ***
## och -0.857886 0.255313 -3.3601 0.0007791 ***
## icca -0.225079 0.144423 -1.5585 0.1191212
## occa -1.089458 1.219821 -0.8931 0.3717882
## inc.room -0.378971 0.099631 -3.8038 0.0001425 ***
## inc.cooling 0.249575 0.059213 4.2149 2.499e-05 ***
## int.cooling -6.000415 5.562423 -1.0787 0.2807030
## iv 0.585922 0.179708 3.2604 0.0011125 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -178.12
The correlation is approximately \(1-0.59=0.41\). It's a moderate correlation.
We can use a t-test of the hypothesis that the log-sum coefficient equal to 1. The t-statistic is :
(coef(nl)['iv'] - 1) / sqrt(vcov(nl)['iv', 'iv'])
## iv
## -2.304171
The critical value of t for 95% confidence is 1.96. So we can reject the hypothesis at 95% confidence.
We can also use a likelihood ratio test because the multinomial logit is a special case of the nested model.
# First estimate the multinomial logit model
ml <- update(nl, nests = NULL)
lrtest(nl, ml)
## Likelihood ratio test
##
## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -178.12
## 2 7 -180.29 -1 4.3234 0.03759 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that the hypothesis is rejected at 95% confidence, but not at 99% confidence.
nl2 <- update(nl, nests = list(central = c('ec', 'ecc', 'gc', 'gcc', 'hpc'),
room = c('er', 'erc')))
summary(nl2)
##
## Call:
## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room +
## inc.cooling + int.cooling | 0, data = HC, nests = list(central = c("ec",
## "ecc", "gc", "gcc", "hpc"), room = c("er", "erc")), un.nest.el = TRUE)
##
## Frequencies of alternatives:choice
## ec ecc er erc gc gcc hpc
## 0.004 0.016 0.032 0.004 0.096 0.744 0.104
##
## bfgs method
## 10 iterations, 0h:0m:0s
## g'(-H)^-1g = 5.87E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## ich -1.13818 0.54216 -2.0993 0.03579 *
## och -1.82532 0.93228 -1.9579 0.05024 .
## icca -0.33746 0.26934 -1.2529 0.21024
## occa -2.06328 1.89726 -1.0875 0.27681
## inc.room -0.75722 0.34292 -2.2081 0.02723 *
## inc.cooling 0.41689 0.20742 2.0099 0.04444 *
## int.cooling -13.82487 7.94031 -1.7411 0.08167 .
## iv 1.36201 0.65393 2.0828 0.03727 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -180.02
The log-sum coefficient is over 1. This implies that there is more substitution across nests than within nests. I don't think this is very reasonable, but people can differ on their concepts of what's reasonable.
\begin{answer}[5] The t-statistic is :
(coef(nl2)['iv'] - 1) / sqrt(vcov(nl2)['iv', 'iv'])
## iv
## 0.5535849
lrtest(nl2, ml)
## Likelihood ratio test
##
## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -180.02
## 2 7 -180.29 -1 0.5268 0.468
We cannot reject the hypothesis at standard confidence levels.
logLik(nl)
## 'log Lik.' -178.1247 (df=8)
logLik(nl2)
## 'log Lik.' -180.0231 (df=8)
The \(\ln L\) is worse (more negative.) All in all, this seems like a less appropriate nesting structure.
nl3 <- update(nl, un.nest.el = FALSE)
The correlation in the cooling nest is around 1-0.60 = 0.4 and that for the non-cooling nest is around 1-0.45 = 0.55. So the correlation is higher in the non-cooling nest. Perhaps more variation in comfort when there is no cooling. This variation in comfort is the same for all the non-cooling alternatives.
We can use a likelihood ratio tests with models
nl
andnl3
.
lrtest(nl, nl3)
## Likelihood ratio test
##
## Model 1: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## Model 2: depvar ~ ich + och + icca + occa + inc.room + inc.cooling + int.cooling |
## 0
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 8 -178.12
## 2 9 -178.04 1 0.1758 0.675
The restricted model is the one from exercise 1 that has one log-sum coefficient. The unrestricted model is the one we just estimated. The test statistics is 0.6299. The critical value of chi-squared with 1 degree of freedom is 3.8 at the 95% confidence level. We therefore cannot reject the hypothesis that the two nests have the same log-sum coefficient.
gcc
, ecc
and erc
in a nest, hpc
in a nest alone, and alternatives gc
, ec
and er
in a nest. Does this model seem better or worse than the model in exercise 1, which puts alternative hpc
in the same nest as alternatives gcc
, ecc
and erc
?nl4 <- update(nl, nests=list(n1 = c('gcc', 'ecc', 'erc'), n2 = c('hpc'),
n3 = c('gc', 'ec', 'er')))
summary(nl4)
##
## Call:
## mlogit(formula = depvar ~ ich + och + icca + occa + inc.room +
## inc.cooling + int.cooling | 0, data = HC, nests = list(n1 = c("gcc",
## "ecc", "erc"), n2 = c("hpc"), n3 = c("gc", "ec", "er")),
## un.nest.el = TRUE)
##
## Frequencies of alternatives:choice
## ec ecc er erc gc gcc hpc
## 0.004 0.016 0.032 0.004 0.096 0.744 0.104
##
## bfgs method
## 8 iterations, 0h:0m:0s
## g'(-H)^-1g = 3.71E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## ich -0.838394 0.100546 -8.3384 < 2.2e-16 ***
## och -1.331598 0.252069 -5.2827 1.273e-07 ***
## icca -0.256131 0.145564 -1.7596 0.07848 .
## occa -1.405656 1.207281 -1.1643 0.24430
## inc.room -0.571352 0.077950 -7.3297 2.307e-13 ***
## inc.cooling 0.311355 0.056357 5.5247 3.301e-08 ***
## int.cooling -10.413384 5.612445 -1.8554 0.06354 .
## iv 0.956544 0.180722 5.2929 1.204e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -180.26
The \(\ln L\) for this model is \(-180.26\), which is lower (more negative) than for the model with two nests, which got \(-178.12\).