This notebook provides a detailed overview of the plasso package and its two main functions, plasso and cv.plasso, which were developed in the course of Knaus (2022). The package is strongly oriented towards the glmnet package and builds on its standard glmnet function at its core. The related theory and algorithms are described in Friedman, Hastie, and Tibshirani (2010).
The very latest version of the package can be installed from its GitHub page; for this installation you will need the devtools package. The latest 'official' version can be installed from CRAN using install.packages(). We recommend the latter. General dependencies are: glmnet, Matrix, methods, parallel, doParallel, foreach and iterators.
library(devtools)
devtools::install_github("stefan-1997/plasso")

install.packages("plasso")
Load plasso using library().
library(plasso)
The package generally provides two functions, plasso and cv.plasso, which are both built on top of the glmnet functionality. Specifically, a glmnet object lives within both functions and also in their outputs (list item lasso_full).
The term plasso refers to a Post-Lasso model, which estimates a least squares regression using only the active (i.e. non-zero) coefficients of a previously estimated Lasso model. This follows the idea that we want to do selection but without shrinkage.
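To make this two-step logic concrete, here is a minimal sketch of the Post-Lasso idea using plain glmnet and lm on simulated data. This is only a conceptual illustration, not the plasso implementation itself, and all object names are made up for the example.

library(glmnet)
set.seed(1)
X_demo <- matrix(rnorm(100 * 5), 100, 5)
y_demo <- X_demo[, 1] - 0.5 * X_demo[, 2] + rnorm(100)
# Step 1: Lasso for variable selection at an example lambda
fit <- glmnet(X_demo, y_demo)
active <- which(as.vector(coef(fit, s = 0.1))[-1] != 0)
# Step 2: unpenalized OLS re-fit on the active set (selection without shrinkage)
post <- lm(y_demo ~ X_demo[, active, drop = FALSE])
coef(post)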
The package comes with some simulated data representing the following DGP:
The covariate matrix \(X\) consists of 10 variables whose effect sizes on the target \(Y\) are defined by the vector \(\boldsymbol{\pi} = [1, -0.83, 0.67, -0.5, 0.33, -0.17, 0, ..., 0]'\), where the first six effect sizes decrease continuously in absolute terms from 1 to 0 and alternate in sign. The true causal effect of all other covariates is 0. The variables in \(X\) follow a normal distribution with mean zero, while their covariance matrix is a Toeplitz matrix, which is characterized by having constant diagonals: \[ \boldsymbol{\Sigma} = \begin{bmatrix} 1 & 0.7 & 0.7^2 & ... & 0.7^{9} \\ 0.7 & 1 & 0.7 & ... & 0.7^{8} \\ 0.7^2 & 0.7 & 1 & ... & 0.7^{7} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0.7^{9} & 0.7^{8} & 0.7^{7} & ... & 1 \end{bmatrix} \]
The target \(\boldsymbol{y}\) is then a linear transformation of \(\boldsymbol{X}\) plus a vector of normally distributed error terms. Each element of \(\boldsymbol{y}\) is given by: \[ y_i = \boldsymbol{X}_i \boldsymbol{\pi} + \varepsilon_i \] where \(\varepsilon_i \sim \mathcal{N}(0,4)\).
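For illustration, such a DGP could be simulated along the following lines (a sketch using MASS::mvrnorm; this is an assumption about the generating process, not necessarily the code that produced the shipped data set):

library(MASS)
set.seed(123)
n <- 1000; p <- 10
Sigma <- 0.7^abs(outer(1:p, 1:p, "-"))          # Toeplitz covariance with constant diagonals
pi_vec <- c(1, -0.83, 0.67, -0.5, 0.33, -0.17, rep(0, p - 6))
X_sim <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
y_sim <- X_sim %*% pi_vec + rnorm(n, sd = 2)    # error variance of 4 corresponds to sd = 2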
data(toeplitz)
y = as.matrix(toeplitz[,1])
X = toeplitz[,-1]
plasso returns least squares estimates for all lambda values of a standard glmnet object for both a simple Lasso and a Post-Lasso model.
p = plasso::plasso(X,y)
You can plot the coefficient paths for both the Post-Lasso model and the underlying 'original' Lasso model. This nicely illustrates the difference between the two models: the Post-Lasso coefficient paths are characterized by jumps every time a new variable enters the active set.
plot(p, lasso=FALSE, xvar="lambda")
plot(p, lasso=TRUE, xvar="lambda")
We can also have a look at which coefficients are active for a chosen lambda value. Here, the difference between Post-Lasso and Lasso becomes clearly visible: the Lasso model performs not only feature selection but also shrinkage, which results in its active coefficients being smaller in absolute terms than those of the Post-Lasso model:
coef_p = coef(p, s=0.01)
as.vector(coef_p$plasso)
## [1] 0.1438137 1.0187628 -0.6214926 0.4673645 -0.2300834 -0.3575276
## [7] 0.2180390 0.1180676 -0.2138268 0.1975462 -0.1047983
as.vector(coef_p$lasso)
## [1] 0.14498611 0.98729386 -0.56374511 0.40656768 -0.20023679 -0.33156564
## [7] 0.18985685 0.08930237 -0.16087044 0.13798825 -0.06639638
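To see the shrinkage directly, one can compare the two coefficient vectors element-wise; the Lasso estimates are pulled towards zero relative to the unpenalized Post-Lasso re-fit (a small follow-up check using the objects defined above):

# Element-wise difference between Post-Lasso and Lasso coefficients at s = 0.01
as.vector(coef_p$plasso) - as.vector(coef_p$lasso)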
The cv.plasso function uses cross-validation to determine the performance of different values of the lambda penalty term for both models (Post-Lasso and Lasso). The returned output of class cv.plasso includes the mean squared errors.
Applying the summary method with the default parameter set to FALSE gives some informative output regarding the optimal choice of lambda.
p.cv = plasso::cv.plasso(X,y,kf=5)
summary(p.cv, default=FALSE)
##
## Call:
## plasso::cv.plasso(x = X, y = y, kf = 5)
##
## Lasso:
## Minimum CV MSE Lasso: 15.22
## Lambda at minimum: 0.01858
## Active variables at minimum: (Intercept) X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## Post-Lasso:
## Minimum CV MSE Post-Lasso: 15.2
## Lambda at minimum: 0.2087
## Active variables at minimum: (Intercept) X1 X5
The plot method extends the basic glmnet visualization by adding the cross-validated MSEs for the Post-Lasso model.
plot(p.cv, legend_pos="left", legend_size=0.5)
We can use the following code to get the optimal lambda value (for the Post-Lasso model here) and the associated coefficients at that value of \(\lambda\).
p.cv$lambda_min_pl
## [1] 0.2087288
coef_pcv = coef(p.cv, S="optimal")
as.vector(coef_pcv$plasso)
## [1] 0.1410181 0.7663423 0.0000000 0.0000000 0.0000000 -0.3000942
## [7] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000