We have data on the mode choice of 453 commuters. Four modes are available: (1) bus, (2) car alone, (3) carpool, and (4) rail. We have data for each commuter on the cost and time on each mode and the chosen mode.
mlogit estimates the multinomial probit model when the probit argument is TRUE, using the GHK simulation procedure. The full covariance matrix is estimated, subject to normalization.
More precisely, utility differences are computed with respect to the reference level of the response (by default the bus alternative) and the 3 \(\times\) 3 covariance matrix of these differences is estimated. As the scale of utility is unobserved, the first element of the matrix is further set to 1. The Choleski factor of the covariance matrix is:
\[ L = \left( \begin{array}{ccc} 1 & 0 & 0 \\ \theta_{32} & \theta_{33} & 0 \\ \theta_{42} & \theta_{43} & \theta_{44} \end{array} \right) \]
that is, five covariance parameters are estimated. The covariance matrix of the utility differences is then \(\Omega = L L^{\top}\). Estimating the Choleski factor in this form imposes the normalization constraints automatically and guarantees that the covariance matrix is positive semi-definite (with the covariance matrix of any error differences being positive-definite).
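As a quick check on this parameterization, one can fill a lower-triangular matrix of this form with arbitrary values and confirm that the implied \(\Omega\) has a unit first element and is positive-definite. A minimal sketch (the theta values below are arbitrary illustrative numbers, not estimates):

```r
# Arbitrary illustrative theta values (not estimates from the model below)
theta <- c(0.3, 1.2, 0.7, -0.5, 0.9)  # theta_32, theta_33, theta_42, theta_43, theta_44

L <- matrix(0, 3, 3)
L[1, 1] <- 1              # scale normalization: first element fixed at 1
L[2, 1:2] <- theta[1:2]
L[3, 1:3] <- theta[3:5]

Omega <- L %*% t(L)       # implied covariance matrix of the utility differences
Omega[1, 1]               # equals 1 by construction
eigen(Omega)$values       # all positive: Omega is positive-definite
```

Any choice of the five theta values (with nonzero diagonal entries) yields a valid covariance matrix, which is what lets the optimizer search over them freely.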
library("mlogit")
data("Mode", package="mlogit")
Mo <- dfidx(Mode, choice = "choice", varying = 2:9)
p1 <- mlogit(choice ~ cost + time, Mo, seed = 20,
R = 100, probit = TRUE)
summary(p1)
##
## Call:
## mlogit(formula = choice ~ cost + time, data = Mo, probit = TRUE,
## R = 100, seed = 20)
##
## Frequencies of alternatives:choice
## bus car carpool rail
## 0.17881 0.48124 0.07064 0.26932
##
## bfgs method
## 20 iterations, 0h:0m:18s
## g'(-H)^-1g = 7.71E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):car 1.8308660 0.2506434 7.3047 2.780e-13 ***
## (Intercept):carpool -1.2816819 0.5677813 -2.2574 0.0239861 *
## (Intercept):rail 0.3093510 0.1151701 2.6860 0.0072305 **
## cost -0.4134401 0.0731593 -5.6512 1.593e-08 ***
## time -0.0466552 0.0068263 -6.8347 8.220e-12 ***
## car.carpool 0.2599724 0.3850288 0.6752 0.4995472
## car.rail 0.7364869 0.2145744 3.4323 0.0005985 ***
## carpool.carpool 1.3078947 0.3916729 3.3393 0.0008400 ***
## carpool.rail -0.7981842 0.3463735 -2.3044 0.0212000 *
## rail.rail 0.4301303 0.4874624 0.8824 0.3775677
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -347.92
## McFadden R^2: 0.36012
## Likelihood ratio test : chisq = 391.62 (p.value = < 2.22e-16)
The estimated Choleski factor \(L_1\) is:
L1 <- matrix(0, 3, 3)
L1[! upper.tri(L1)] <- c(1, coef(p1)[6:10])
Multiplying L1 by its transpose gives \(\Omega_1\):
L1 %*% t(L1)
## [,1] [,2] [,3]
## [1,] 1.0000000 0.2599724 0.7364869
## [2,] 0.2599724 1.7781743 -0.8524746
## [3,] 0.7364869 -0.8524746 1.3645231
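The covariances are easier to read as correlations. Hard-coding the \(\Omega_1\) values printed above and applying base R's cov2cor:

```r
# Omega_1 as printed above; rows/columns are utility differences against bus
Omega1 <- matrix(c(1.0000000,  0.2599724,  0.7364869,
                   0.2599724,  1.7781743, -0.8524746,
                   0.7364869, -0.8524746,  1.3645231),
                 nrow = 3, byrow = TRUE,
                 dimnames = rep(list(c("car-bus", "carpool-bus", "rail-bus")), 2))
round(cov2cor(Omega1), 3)  # covariance matrix rescaled to a correlation matrix
```

The car and rail error differences are positively correlated (about 0.63), while the carpool and rail differences are strongly negatively correlated (about -0.55).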
With iid errors, \(\Omega_1\) would be:
\[ \left( \begin{array}{ccc} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \\ \end{array} \right) \]
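This benchmark follows directly from the differencing: with four iid unit-variance errors, each difference against bus has variance 2, and any two differences share the bus error, giving covariance 1; normalizing the first diagonal element to 1 divides everything by 2. A quick base-R check:

```r
# D maps the four iid errors (bus, car, carpool, rail) to differences against bus
D <- rbind(c(-1, 1, 0, 0),   # car     - bus
           c(-1, 0, 1, 0),   # carpool - bus
           c(-1, 0, 0, 1))   # rail    - bus

Omega_iid <- D %*% diag(4) %*% t(D)  # covariance of differences: 2 on diag, 1 off
Omega_iid / Omega_iid[1, 1]          # normalized: 1 on diag, 0.5 off
```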
I find it hard to tell anything from the estimated covariance terms.
I agree: it is hard -- if not impossible -- to meaningfully interpret the covariance parameters when all free parameters are estimated. However, the substitution patterns that the estimates imply can be observed by forecasting with the model; we do this in exercise 4 below. Also, when structure is placed on the covariance matrix, the estimates are usually easier to interpret; this is explored in exercise 6.
p2 <- mlogit(choice ~ cost + time, Mo, seed = 21,
R = 100, probit = TRUE)
coef(p2)
## (Intercept):car (Intercept):carpool (Intercept):rail
## 1.87149948 -1.28893595 0.31455318
## cost time car.carpool
## -0.43068703 -0.04752315 0.22888163
## car.rail carpool.carpool carpool.rail
## 0.69781113 1.33071717 -0.56802431
## rail.rail
## 0.71060138
The estimates seem to change more for parameters with larger standard errors, though this is by no means uniformly the case. One would expect larger sampling variance (which arises from a flatter \(\ln L\) near the maximum) to translate into greater simulation variance (which arises when the location of the maximum changes with different draws).
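One way to look at this is to compare the shift in each coefficient between the two runs to its standard error from the first run, hard-coding the estimates and standard errors printed above:

```r
# Coefficients from the first run (p1, seed 20) and second run (p2, seed 21)
b1 <- c(1.8308660, -1.2816819, 0.3093510, -0.4134401, -0.0466552,
        0.2599724, 0.7364869, 1.3078947, -0.7981842, 0.4301303)
b2 <- c(1.87149948, -1.28893595, 0.31455318, -0.43068703, -0.04752315,
        0.22888163, 0.69781113, 1.33071717, -0.56802431, 0.71060138)
# Standard errors from the first run's summary
se1 <- c(0.2506434, 0.5677813, 0.1151701, 0.0731593, 0.0068263,
         0.3850288, 0.2145744, 0.3916729, 0.3463735, 0.4874624)
nm <- c("(Intercept):car", "(Intercept):carpool", "(Intercept):rail",
        "cost", "time", "car.carpool", "car.rail",
        "carpool.carpool", "carpool.rail", "rail.rail")
round(setNames(abs(b2 - b1) / se1, nm), 3)  # shift in units of standard errors
```

The covariance parameters carpool.rail and rail.rail move by more than half a standard error between the two seeds, while most of the other parameters move far less.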
The actual shares are:
actShares <- tapply(Mo$choice, Mo$id2, mean)
The predicted shares are:
predShares <- apply(fitted(p1, outcome = FALSE), 2, mean)
rbind(predShares, actShares)
## bus car carpool rail
## predShares 0.1759623 0.4823958 0.06923727 0.2720977
## actShares 0.1788079 0.4812362 0.07064018 0.2693157
sum(predShares)
## [1] 0.9996931
The correspondence is very close but not exact.
Note: Simulated GHK probabilities do not necessarily sum to one over alternatives. This summing-up error at the individual level tends to cancel out when the probabilities are averaged over the sample: the forecasted shares (i.e., the average probabilities) sum to 0.9996931, which is only slightly different from 1.
Mo2 <- Mo
Mo2[idx(Mo2, 2) == 'car', 'cost'] <- Mo2[idx(Mo2, 2) == 'car', 'cost'] * 2
newShares <- apply(predict(p1, newdata = Mo2), 2, mean)
cbind(original = actShares, new = newShares,
change = round((newShares - actShares) / actShares * 100))
## original new change
## bus 0.17880795 0.2517897 41
## car 0.48123620 0.1689331 -65
## carpool 0.07064018 0.1432554 103
## rail 0.26931567 0.4356097 62
Substitution is not proportional, as it would be under a logit model: carpool gets the largest percentage increase, while bus gets the smallest.
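For contrast, a logit model (with its IIA property) would shift shares proportionally: every non-car mode would rise by the same percentage. A toy sketch of that proportional benchmark, using the actual shares and the forecasted car share from the table above:

```r
shares  <- c(bus = 0.17881, car = 0.48124, carpool = 0.07064, rail = 0.26932)
new_car <- 0.1689   # car's forecasted share after the cost doubling, from the table

# Under IIA, the other modes absorb car's lost share in proportion to their size
fac  <- (1 - new_car) / (1 - shares["car"])   # common scaling factor for non-car modes
prop <- c(shares[-2] * fac, car = new_car)
pc   <- round(100 * (prop[names(shares)] / shares - 1))
pc   # every non-car mode changes by the same percentage
```

The probit forecasts above (41, 103, and 62 percent for bus, carpool, and rail) depart markedly from this uniform pattern, which is precisely the flexibility the probit covariance structure buys.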