ktweedie-vignette

Introduction

ktweedie is a package that fits nonparametric Tweedie compound Poisson gamma models in the reproducing kernel Hilbert space. The package is based on two algorithms, the ktweedie for kernel-based Tweedie model and the sktweedie for sparse kernel-based Tweedie model. The ktweedie supports a wide range of kernel functions implemented in the R package kernlab and the optimization algorithm is a Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm with bisection line search. The package includes cross-validation functions for one-dimensional tuning of the kernel regularization parameter \(\lambda\) and for two-dimensional joint tuning of one kernel parameter and \(\lambda\). The sktweedie uses variable weights to achieve variable selection. It is a meta-algorithm that alternatively updates the kernel parameters and a set of variable weights.

The ktweedie solves the problem \[ \min_{\boldsymbol{\alpha}}\left\{ -\sum_{i=1}^{n}\frac{1}{\phi}\left(\frac{y_{i}e^{(1-\rho)\mathbf{K}_{i}^{\top}\boldsymbol{\alpha}}}{1-\rho}-\frac{e^{(2-\rho)\mathbf{K}_{i}^{\top}\boldsymbol{\alpha}}}{2-\rho}\right)+\lambda\boldsymbol{\alpha}^{\top}\mathbf{K}\boldsymbol{\alpha}\right\} , \]where \(\rho\in(1,2)\) is the index parameter, \(\phi>0\) is the dispersion parameter, \(\mathbf{K}\) is an \(n\times n\) kernel matrix computed according to the user-specified kernel function \(K(\cdot ,\cdot)\), whose entries are \(K_{ij}=K(\mathbf{x}_i, \mathbf{x}_j)\) are calculated based on the \(p\)-dimensional predictors from subjects \(i,j=1,\ldots,n\). In the kernel-based Tweedie model, the mean parameter \(\mu_i\) for the \(i\)-th observation is modelled by \[\mu_i=\log(\mathbb{E}(Y_i|\mathbf{x_i}))=\sum_{j=1}^n \alpha_j K(\mathbf x_{i}, \mathbf x_j).\]

The sktweedie solves\[ \begin{aligned} &\min_{\boldsymbol{\alpha}, \mathbf{w}}\left\{ -\sum_{i=1}^{n}\frac{1}{\phi}\left(\frac{y_{i}e^{(1-\rho)\mathbf{K(w)}_{i}^{\top}\boldsymbol{\alpha}}}{1-\rho}-\frac{e^{(2-\rho)\mathbf{K(w)}_{i}^{\top}\boldsymbol{\alpha}}}{2-\rho}\right)+\lambda_1\boldsymbol{\alpha}^{\top}\mathbf{K(w)}\boldsymbol{\alpha} +\lambda_2 \mathbf{1}^\top \mathbf{w} \right \}\\ & \qquad \qquad \mathrm{s.t.\ \ \ } w_j\in [0,1],\ j=1,\ldots,p, \end{aligned}\]

where \(K(\mathbf{w})_{ij}=K(\mathbf{w \odot x}_i, \mathbf{w \odot x}_j)\) involves variable weights \(\mathbf w\).

Installation

From the CRAN.

install.packages("ktweedie")

From the Github.

devtools::install_github("ly129/ktweedie")

Quick Start

First we load the ktweedie package:

library(ktweedie)

The package includes a toy data for demonstration purpose. The \(30\times5\) predictor matrix x is generated from standard normal distribution and y is generated according to\[y\sim \mathrm{Tweedie}(\mu=\sin(x\beta), \rho=1.5,\phi=0.5),\]where \(\beta=(6, -4, 0, 0, 0)\). That said, only the first two predictors are associated with the response.

data(dat)
x <- dat$x
y <- dat$y

An input matrix x and an output vector y are now loaded. The ktd_estimate() function can be used to fit a basic ktweedie model. The regularization parameter lam1 can be a vector, which will be solved in a decreasing order with warm start.

fit.ktd <- ktd_estimate(x = x,
                        y = y,
                        kern = rbfdot(sigma = 0.1),
                        lam1 = c(0.01, 0.1, 1))
str(fit.ktd$estimates)
#> List of 3
#>  $ lambda 1   :List of 3
#>   ..$ fn         : num 110
#>   ..$ coefficient: num [1:30, 1] 0.5558 -0.062 -0.0381 0.0523 -0.0251 ...
#>   ..$ convergence: int 0
#>  $ lambda 0.1 :List of 3
#>   ..$ fn         : num 51
#>   ..$ coefficient: num [1:30, 1] 1.662 -0.235 -0.177 0.867 -0.143 ...
#>   ..$ convergence: int 0
#>  $ lambda 0.01:List of 3
#>   ..$ fn         : num 39.2
#>   ..$ coefficient: num [1:30, 1] 7.692 -0.49 -0.841 4.624 -0.696 ...
#>   ..$ convergence: int 0

fit.ktd$estimates stores the estimated coefficients and the final objective function value. The estimated kernel-based model coefficients for the \(l\)-th lam1 can be accessed by the index l: fit.ktd$estimates[[l]]$coefficient.

The function can also be used to fit the sktweedie model by setting the argument sparsity to TRUE, and specifying the regularization coefficient \(\lambda_2\) in the argument lam2.

fit.sktd <- ktd_estimate(x = x,
                         y = y,
                         kern = rbfdot(sigma = 0.1),
                         lam1 = 5,
                         sparsity = TRUE,
                         lam2 = 1)

And we can access the fitted coefficients in a similar manner to the fit.ktd. Additionally, the fitted variable weights \(\mathbf w\) can be accessed by

fit.sktd$estimates[[1]]$weight
#>           [,1]
#> [1,] 1.0000000
#> [2,] 0.4462078
#> [3,] 0.0000000
#> [4,] 0.0000000
#> [5,] 0.0000000

Variables with weights close to 0 can be viewed as noise variables.