Paired combinatorial logit model

Koppelman and Wen (2000) proposed the paired combinatorial logit (PCL) model, a nested logit model in which every pair of alternatives forms a nest. This model is obtained from the following \(G\) function:

\[ G(y_1, y_2, \ldots, y_J)=\sum_{k=1}^{J-1}\sum_{l=k+1}^J\left(y_k^{1/\lambda_{kl}}+y_l^{1/\lambda_{kl}} \right)^{\lambda_{kl}} \]
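As a numerical check of where the probabilities below come from, this base R sketch evaluates \(G\) and applies the standard GEV result \(P_i = y_i\,\partial G/\partial y_i \,/\, G\) with \(y_i = e^{V_i}\), using a central finite-difference derivative. The utilities, the nest elasticities and the helper names `pcl_G()` and `gev_prob()` are hypothetical and chosen only for illustration.

```r
# PCL generating function: sum over all pairs (k, l), k < l.
# lambda is a symmetric J x J matrix of nest elasticities (diagonal unused).
pcl_G <- function(y, lambda) {
  J <- length(y)
  total <- 0
  for (k in 1:(J - 1)) {
    for (l in (k + 1):J) {
      lam <- lambda[k, l]
      total <- total + (y[k]^(1 / lam) + y[l]^(1 / lam))^lam
    }
  }
  total
}

# Generic GEV probability P_i = y_i * dG/dy_i / G, derivative by
# central finite differences with step h
gev_prob <- function(V, lambda, h = 1e-6) {
  y <- exp(V)
  G <- pcl_G(y, lambda)
  sapply(seq_along(y), function(i) {
    up <- y
    up[i] <- y[i] + h
    dn <- y
    dn[i] <- y[i] - h
    dG <- (pcl_G(up, lambda) - pcl_G(dn, lambda)) / (2 * h)
    y[i] * dG / G
  })
}

V <- c(0.5, -0.2, 0.1)          # hypothetical utilities of 3 alternatives
lambda <- matrix(0.6, 3, 3)     # hypothetical nest elasticities
lambda[2, 3] <- 1               # one pair nest with elasticity set to one
lambda[3, 2] <- 1
p <- gev_prob(V, lambda)
print(p)
```

Because \(G\) is homogeneous of degree one, these probabilities sum to one; with every \(\lambda_{kl}=1\), \(G\) is proportional to \(\sum_j y_j\) and the multinomial logit probabilities are recovered.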

The PCL model is consistent with random utility maximisation if \(0<\lambda_{kl}\leq 1\), and it reduces to the multinomial logit model if \(\lambda_{kl}=1 \;\forall (k,l)\). The nest elasticities are symmetric (\(\lambda_{lk}=\lambda_{kl}\)), and the resulting probabilities are:

\[ P_l = \frac{\sum_{k\neq l}e^{V_l/\lambda_{lk}}\left(e^{V_k/\lambda_{lk}} + e^{V_l/\lambda_{lk}}\right)^{\lambda_{lk}-1}} {\sum_{m=1}^{J-1}\sum_{n=m+1}^{J}\left(e^{V_m/\lambda_{mn}} + e^{V_n/\lambda_{mn}}\right)^{\lambda_{mn}}} \]

which can be written as a sum of \(J-1\) products of the conditional probability of choosing alternative \(l\) within nest \(lk\) and the marginal probability of choosing that nest:

\[ P_l=\sum_{k\neq l}P_{l\mid lk} P_{lk} \]

with:

\[ P_{l \mid lk} = \frac{e^{V_l/\lambda_{lk}}}{e^{V_k/\lambda_{lk}} + e^{V_l/\lambda_{lk}}} \] \[ P_{lk}= \frac{\left(e^{V_k/\lambda_{lk}} + e^{V_l/\lambda_{lk}}\right)^{\lambda_{lk}}}{\sum_{m=1}^{J-1}\sum_{n=m+1}^{J}\left(e^{V_m/\lambda_{mn}} + e^{V_n/\lambda_{mn}}\right)^{\lambda_{mn}}} \]
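The two expressions of the PCL probabilities can be checked against each other. This base R sketch computes them once directly from the closed form and once as \(\sum_{k\neq l} P_{l\mid lk}P_{lk}\); the function names `pcl_prob()` and `pcl_decomposed()`, the utilities and the nest elasticities are hypothetical, and this is not how mlogit computes the model internally.

```r
# Closed-form PCL probabilities: the numerator sums over the J - 1 pair
# nests containing l, the denominator over all J(J - 1)/2 pairs.
pcl_prob <- function(V, lambda) {
  J <- length(V)
  denom <- 0
  for (m in 1:(J - 1)) {
    for (n in (m + 1):J) {
      lam <- lambda[m, n]
      denom <- denom + (exp(V[m] / lam) + exp(V[n] / lam))^lam
    }
  }
  num <- sapply(1:J, function(l) {
    s <- 0
    for (k in setdiff(1:J, l)) {
      lam <- lambda[l, k]
      s <- s + exp(V[l] / lam) * (exp(V[k] / lam) + exp(V[l] / lam))^(lam - 1)
    }
    s
  })
  num / denom
}

# Same probabilities built as sum over nests of P(l | lk) * P(lk)
pcl_decomposed <- function(V, lambda) {
  J <- length(V)
  pairs <- combn(J, 2)                          # one column per nest (k, l)
  lam <- lambda[cbind(pairs[1, ], pairs[2, ])]  # elasticity of each nest
  e1 <- exp(V[pairs[1, ]] / lam)
  e2 <- exp(V[pairs[2, ]] / lam)
  P_nest <- (e1 + e2)^lam / sum((e1 + e2)^lam)  # marginal nest probabilities
  P <- numeric(J)
  for (j in seq_len(ncol(pairs))) {
    k <- pairs[1, j]
    l <- pairs[2, j]
    # conditional probability within the nest times the nest probability
    P[k] <- P[k] + e1[j] / (e1[j] + e2[j]) * P_nest[j]
    P[l] <- P[l] + e2[j] / (e1[j] + e2[j]) * P_nest[j]
  }
  list(P = P, P_nest = P_nest)
}

V <- c(0.5, -0.2, 0.1)          # hypothetical utilities
lambda <- matrix(0.6, 3, 3)     # hypothetical symmetric nest elasticities
lambda[2, 3] <- 1
lambda[3, 2] <- 1
pcl_prob(V, lambda)
pcl_decomposed(V, lambda)$P
```

Both functions should return the same vector, and setting every elasticity to one in either of them gives the multinomial logit probabilities.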

We reproduce the example of Koppelman and Wen (2000), which uses the same subset of the ModeCanada data set as Bhat (1995). Three modes are considered, so there are three nests. The nest elasticity of the train-air nest is set to one. To estimate this model, the nests argument is set to "pcl"; all the nests of two alternatives are then created automatically. The restriction on the nest elasticity of the train-air nest is imposed with the constPar argument.

library("mlogit")
data("ModeCanada", package = "mlogit")
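# individuals who chose the bus
busUsers <- with(ModeCanada, case[choice == 1 & alt == 'bus'])
# Bhat's (1995) subset: drop the bus users and the bus alternative, and keep
# only the individuals who had all four modes available
Bhat <- subset(ModeCanada, ! case %in% busUsers & alt != 'bus' & noalt == 4)
# drop the now unused 'bus' level of the alternative factor
Bhat$alt <- Bhat$alt[drop = TRUE]
Bhat <- dfidx(Bhat, idx = c("case", "alt"), choice = "choice", idnames = c("chid", "alt"))
# nests = 'pcl' creates one nest per pair of alternatives; the train-air
# nest elasticity is kept fixed at one through constPar
pcl <- mlogit(choice ~ freq + cost + ivt + ovt, Bhat, reflevel = 'car',
              nests = 'pcl', constPar=c('iv:train.air'))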
summary(pcl)

## 
## Call:
## mlogit(formula = choice ~ freq + cost + ivt + ovt, data = Bhat, 
##     reflevel = "car", nests = "pcl", constPar = c("iv:train.air"))
## 
## Frequencies of alternatives:choice
##     car   train     air 
## 0.45757 0.16721 0.37523 
## 
## bfgs method
## 16 iterations, 0h:0m:1s 
## g'(-H)^-1g = 2.08E-07 
## gradient close to zero 
## 
## Coefficients :
##                      Estimate  Std. Error  z-value  Pr(>|z|)    
## (Intercept):train  1.30439316  0.16544227   7.8843 3.109e-15 ***
## (Intercept):air    1.99012922  0.35570613   5.5949 2.208e-08 ***
## freq               0.06537827  0.00435688  15.0057 < 2.2e-16 ***
## cost              -0.02448565  0.00316570  -7.7347 1.044e-14 ***
## ivt               -0.00761538  0.00067374 -11.3032 < 2.2e-16 ***
## ovt               -0.03223993  0.00237097 -13.5978 < 2.2e-16 ***
## iv:car.train       0.42129039  0.08613435   4.8911 1.003e-06 ***
## iv:car.air         0.27123320  0.09061319   2.9933   0.00276 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1903
## McFadden R^2:  0.32927 
## Likelihood ratio test : chisq = 1868.3 (p.value = < 2.22e-16)
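The three nest elasticity labels in this output (iv:car.train, iv:car.air and the fixed iv:train.air) correspond to the \(J(J-1)/2 = 3\) pairs of modes. The sketch below only enumerates those pairs with base R's combn() to make the nest structure explicit; it is an illustration, not mlogit's internal code, and the object names are hypothetical.

```r
# alternatives in the order shown above, car being the reference level
alts <- c("car", "train", "air")
# every pair of alternatives forms a nest in the PCL model
nests <- combn(alts, 2, simplify = FALSE)
names(nests) <- vapply(nests, paste, character(1), collapse = ".")
print(names(nests))
# labels of the corresponding nest elasticities
paste0("iv:", names(nests))
```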

The rank-ordered logit model

In stated-preference surveys, respondents are sometimes asked to rank all the alternatives, and not only to indicate their preferred one. The relevant model for this kind of data is the rank-ordered logit model, which can be estimated as a standard multinomial logit model once the data are reshaped correctly.

The ranking can be decomposed into a sequence of choices of the best alternative within a shrinking set of available alternatives. For example, with 4 alternatives, the probability of the ranking 3-1-4-2 can be written as follows:

alternative 3 is in first position; the probability is \(\frac{e^{\beta^{\top}x_3}}{e^{\beta^{\top}x_1}+e^{\beta^{\top}x_2}+e^{\beta^{\top}x_3}+e^{\beta^{\top}x_4}}\),
alternative 1 is in second position; the relevant probability is the logit probability that 1 is chosen in the set of alternatives (1-2-4): \(\frac{e^{\beta^{\top}x_1}}{e^{\beta^{\top}x_1}+e^{\beta^{\top}x_2}+e^{\beta^{\top}x_4}}\),
alternative 4 is in third position; the relevant probability is the logit probability that 4 is chosen in the set of alternatives (2-4): \(\frac{e^{\beta^{\top}x_4}}{e^{\beta^{\top}x_2}+e^{\beta^{\top}x_4}}\),
the probability of the full ranking is then simply the product of these 3 probabilities (alternative 2, the only one left, is ranked last with probability one).
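The decomposition above translates directly into code. In this base R sketch, the values of \(\beta^{\top}x_j\) are hypothetical and `rank_prob()` is an illustrative helper: it multiplies, position by position, the logit probability of the ranked alternative among those not yet ranked.

```r
# Probability of a full ranking under the rank-ordered logit model.
# V[j] plays the role of beta'x_j; ranking lists alternatives best first.
rank_prob <- function(V, ranking) {
  remaining <- seq_along(V)
  p <- 1
  # the last alternative is chosen with probability one, so it is skipped
  for (alt in ranking[-length(ranking)]) {
    p <- p * exp(V[alt]) / sum(exp(V[remaining]))
    remaining <- setdiff(remaining, alt)
  }
  p
}

V <- c(0.4, -0.3, 1.1, 0.2)   # hypothetical beta'x_j for j = 1, ..., 4
rank_prob(V, c(3, 1, 4, 2))
```

Summed over the \(4! = 24\) possible rankings, these probabilities add up to one, as expected for a proper distribution over rankings.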

This model can therefore be fitted as a multinomial logit model: the ranking of \(J\) alternatives by one individual is written as \(J-1\) choices among \(J, J-1, \ldots, 2\) alternatives.
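The reshaping can be sketched in a few lines of base R. The hypothetical helper `explode_ranking()` below turns the ranks of one individual into \(J-1\) choice situations with one row per available alternative; it illustrates the idea and is not dfidx's implementation. The example ranks are those of the first respondent of the Game data shown further below (PlayStation first, then Xbox, PSPortable, PC, GameCube and GameBoy).

```r
# Explode one individual's ranking into J - 1 choice situations.
# ranks: named vector giving the rank of each alternative (1 = preferred).
explode_ranking <- function(ranks) {
  alts <- names(ranks)
  J <- length(ranks)
  out <- vector("list", J - 1)
  for (s in 1:(J - 1)) {
    avail <- alts[ranks >= s]          # alternatives not yet ranked
    out[[s]] <- data.frame(set = s, alt = avail,
                           chosen = unname(ranks[avail] == s))
  }
  do.call(rbind, out)
}

r1 <- c(GameBoy = 6, GameCube = 5, PC = 4, PlayStation = 1,
        PSPortable = 3, Xbox = 2)
ex <- explode_ranking(r1)
table(ex$set)   # 6, 5, 4, 3 and 2 alternatives in the successive sets
nrow(ex)        # 20 rows for this individual
```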

The estimation of the rank-ordered logit model is illustrated using the Game data set (Fok, Paap, and Van Dijk 2012). Respondents are asked to rank 6 gaming platforms. The covariates are a dummy own, which indicates whether a given platform is currently owned, the age of the respondent (age) and the number of hours spent gaming per week (hours). The data set is available in wide (Game) and long (Game2) format. In wide format, the ranking is stored in \(J\) columns (ch.Xbox to ch.PC) which give the rank of each alternative.

data("Game", package = "mlogit")
data("Game2", package = "mlogit")
head(Game,2)

##   ch.Xbox ch.PlayStation ch.PSPortable ch.GameCube ch.GameBoy
## 1       2              1             3           5          6
## 2       4              2             3           5          6
##   ch.PC own.Xbox own.PlayStation own.PSPortable own.GameCube
## 1     4        0               1              0            0
## 2     1        0               1              0            0
##   own.GameBoy own.PC age hours
## 1           0      1  33  2.00
## 2           0      1  19  3.25

head(Game2, 7)

##   age hours    platform ch own chid
## 1  33  2.00     GameBoy  6   0    1
## 2  33  2.00    GameCube  5   0    1
## 3  33  2.00          PC  4   1    1
## 4  33  2.00 PlayStation  1   1    1
## 5  33  2.00  PSPortable  3   0    1
## 6  33  2.00        Xbox  2   0    1
## 7  19  3.25     GameBoy  6   0    2

nrow(Game)

## [1] 91

nrow(Game2)

## [1] 546

Note that Game contains 91 rows (one per individual) and that Game2 contains 546 rows (\(91\) individuals \(\times\) 6 alternatives).

When using dfidx, the ranked argument should be set to TRUE:

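# from the wide-format data: columns 1 to 12 are the ch.* and own.* variables
G <- dfidx(Game, varying = 1:12, choice = "ch", ranked = TRUE, idnames = c("chid", "alt"))
# or from the long-format data
G <- dfidx(Game2, choice = "ch", ranked = TRUE, idx = c("chid", "platform"),
           idnames = c("chid", "alt"))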
head(G)

## ~~~~~~~
##  first 10 observations out of 1820 
## ~~~~~~~
##    age hours    ch own    idx
## 1   33     2 FALSE   0 1:eBoy
## 2   33     2 FALSE   0 1:Cube
## 3   33     2 FALSE   1   1:PC
## 4   33     2 FALSE   0 1:able
## 5   33     2  TRUE   1 1:tion
## 6   33     2 FALSE   0 1:Xbox
## 7   33     2 FALSE   0 2:eBoy
## 8   33     2 FALSE   0 2:Cube
## 9   33     2 FALSE   1   2:PC
## 10  33     2 FALSE   0 2:able
## 
## ~~~ indexes ~~~~
##    idx1 chid         alt
## 1     1    1     GameBoy
## 2     1    1    GameCube
## 3     1    1          PC
## 4     1    1  PSPortable
## 5     1    1 PlayStation
## 6     1    1        Xbox
## 7     2    1     GameBoy
## 8     2    1    GameCube
## 9     2    1          PC
## 10    2    1  PSPortable
## indexes:  1, 1, 2

nrow(G)

## [1] 1820

Note that the choice variable is now logical and that the number of rows is now 1820: each of the 91 individuals contributes \(6+5+4+3+2 = 20\) rows, one per available alternative in each of the 5 choice situations.

Using PC as the reference level, we can then reproduce the results of Fok, Paap, and Van Dijk (2012):

summary(mlogit(ch ~ own | hours + age, G, reflevel = "PC"))

## 
## Call:
## mlogit(formula = ch ~ own | hours + age, data = G, reflevel = "PC", 
##     method = "nr")
## 
## Frequencies of alternatives:choice
##          PC     GameBoy    GameCube  PSPortable PlayStation 
##     0.17363     0.13846     0.13407     0.17363     0.18462 
##        Xbox 
##     0.19560 
## 
## nr method
## 5 iterations, 0h:0m:0s 
## g'(-H)^-1g = 6.74E-06 
## successive function values within tolerance limits 
## 
## Coefficients :
##                          Estimate Std. Error z-value  Pr(>|z|)
## (Intercept):GameBoy      1.570379   1.600251  0.9813 0.3264288
## (Intercept):GameCube     1.404095   1.603483  0.8757 0.3812185
## (Intercept):PSPortable   2.583563   1.620778  1.5940 0.1109302
## (Intercept):PlayStation  2.278506   1.606986  1.4179 0.1562270
## (Intercept):Xbox         2.733774   1.536098  1.7797 0.0751272
## own                      0.963367   0.190396  5.0598 4.197e-07
## hours:GameBoy           -0.235611   0.052130 -4.5197 6.193e-06
## hours:GameCube          -0.187070   0.051021 -3.6665 0.0002459
## hours:PSPortable        -0.233688   0.049412 -4.7294 2.252e-06
## hours:PlayStation       -0.129196   0.044682 -2.8915 0.0038345
## hours:Xbox              -0.173006   0.045698 -3.7858 0.0001532
## age:GameBoy             -0.073587   0.078630 -0.9359 0.3493442
## age:GameCube            -0.067574   0.077631 -0.8704 0.3840547
## age:PSPortable          -0.088669   0.079421 -1.1164 0.2642304
## age:PlayStation         -0.067006   0.079365 -0.8443 0.3985154
## age:Xbox                -0.066659   0.075205 -0.8864 0.3754227
##                            
## (Intercept):GameBoy        
## (Intercept):GameCube       
## (Intercept):PSPortable     
## (Intercept):PlayStation    
## (Intercept):Xbox        .  
## own                     ***
## hours:GameBoy           ***
## hours:GameCube          ***
## hours:PSPortable        ***
## hours:PlayStation       ** 
## hours:Xbox              ***
## age:GameBoy                
## age:GameCube               
## age:PSPortable             
## age:PlayStation            
## age:Xbox                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -516.55
## McFadden R^2:  0.36299 
## Likelihood ratio test : chisq = 588.7 (p.value = < 2.22e-16)
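To show how these estimates translate into probabilities, the sketch below copies the printed coefficients by hand and takes the first respondent of Game (age 33, 2 hours of gaming per week, owns a PC and a PlayStation). It computes the logit probability that each platform is ranked first and the probability of the ranking this respondent reported. It is plain base R rather than a call to mlogit's own prediction tools, and the rounded coefficients make the result approximate.

```r
# coefficients copied from the summary above (PC is the reference level)
alts  <- c("PC", "GameBoy", "GameCube", "PSPortable", "PlayStation", "Xbox")
asc   <- c(0, 1.570379, 1.404095, 2.583563, 2.278506, 2.733774)
b_hrs <- c(0, -0.235611, -0.187070, -0.233688, -0.129196, -0.173006)
b_age <- c(0, -0.073587, -0.067574, -0.088669, -0.067006, -0.066659)
b_own <- 0.963367
names(asc) <- alts
names(b_hrs) <- alts
names(b_age) <- alts

# respondent 1: age 33, 2 hours per week, owns the PC and the PlayStation
own <- c(PC = 1, GameBoy = 0, GameCube = 0, PSPortable = 0,
         PlayStation = 1, Xbox = 0)
V <- asc + b_own * own[alts] + b_hrs * 2 + b_age * 33

# logit probability of each platform being ranked first
p_first <- exp(V) / sum(exp(V))
round(p_first, 3)

# probability of the reported ranking, product of successive logit choices
ranking <- c("PlayStation", "Xbox", "PSPortable", "PC", "GameCube", "GameBoy")
p_rank <- 1
remaining <- alts
for (a in ranking[-length(ranking)]) {
  p_rank <- p_rank * exp(V[a]) / sum(exp(V[remaining]))
  remaining <- setdiff(remaining, a)
}
unname(p_rank)
```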


Bibliography