Introduction
lvimp is a package that computes nonparametric estimates
of summaries of a nonparametric variable importance trajectory over
time, and provides inference on the true summaries of the variable
importance trajectory. The package depends heavily on the vimp
package for estimating and doing inference on the cross-sectional
variable importance at each timepoint in the trajectory.
Installation
A development version of the package may be downloaded and installed
from GitHub using the pak package:
pak::pkg_install("bdwilliamson/lvimp")
Quick Start
This section should serve as a quick guide to using the
lvimp package. We will cover the three main functions for
estimating summaries of the longitudinal variable importance trajectory
using simulated data.
First, load the lvimp package:
library("lvimp")
Next, create some longitudinal data:
set.seed(4747)
p <- 2
n <- 5e4
T <- 3
timepoints <- seq_len(T) - 1
indices <- timepoints + 1
beta_01 <- rep(1, T)
beta_02 <- 1 + timepoints / 4
beta_0 <- lapply(as.list(seq_len(T)), function(t) {
  matrix(c(beta_01[t], beta_02[t]))
})
# generate 2 covariates
x <- lapply(as.list(1:T), function(t) as.data.frame(replicate(p, stats::rnorm(n, 0, 1))))
# apply the function to the x's
y <- lapply(as.list(1:T), function(t) as.matrix(x[[t]]) %*% beta_0[[t]] + rnorm(n, 0, 1))
In this scenario, there are three timepoints at which data are
collected. The above code block creates a list x containing
three data frames, each with 2 columns and n rows, and a list
y containing three single-column matrices with n rows. Here,
x contains the covariates of interest and y
contains the outcomes of interest.
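Because the two covariates are independent standard normals and the noise has unit variance, the population importance of variable 1 can be worked out by hand for this simulation, which gives a benchmark for the estimates that follow. The sketch below assumes that importance is measured as the drop in population R-squared when variable 1 is removed from the model; the names var_y and true_vim are illustrative and are not part of lvimp or vimp.

```r
# Coefficients used in the simulation above
timepoints <- 0:2
beta_01 <- rep(1, 3)
beta_02 <- 1 + timepoints / 4

# Var(y) = beta_01^2 + beta_02^2 + 1
# (independent N(0, 1) covariates plus unit-variance noise)
var_y <- beta_01^2 + beta_02^2 + 1

# R-squared of the full model minus R-squared of the model without variable 1
# (assumed definition of importance; `true_vim` is a hypothetical name)
true_vim <- (beta_01^2 + beta_02^2) / var_y - beta_02^2 / var_y
round(true_vim, 3)
```

Under these assumptions the true importance of variable 1 falls over time (about 0.333, 0.281, and 0.235), because the coefficient on variable 2 grows while the coefficient on variable 1 stays fixed.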
Next, we use the vimp package to estimate the importance
of variable 1 relative to variable 2 for predicting
the outcome at each timepoint:
library("vimp")
#> vimp version 2.3.6: Perform Inference on Algorithm-Agnostic Variable Importance
library("SuperLearner")
#> Loading required package: nnls
#> Loading required package: gam
#> Loading required package: splines
#> Loading required package: foreach
#> Loaded gam 1.22-6
#> Super Learner
#> Version: 2.0-29
#> Package created on 2024-02-06
set.seed(1234)
# in this case, glm is correctly specified (so only use one learner to speed things up)
vim_list_1 <- lapply(as.list(1:T), function(t) {
  vimp::cv_vim(Y = y[[t]], X = x[[t]], indx = 1, V = 10, type = "r_squared",
               SL.library = c("SL.glm"))
})
Finally, there are three available summaries in lvimp:
* The average variable importance over a contiguous subset of the time
  series (lvim_average)
* The linear trend in variable importance over a contiguous subset of
  the time series (lvim_trend)
* The area under the variable importance trajectory curve over a
  contiguous subset of the time series (lvim_autc)
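To make the three summaries concrete, the following base-R sketch computes them by hand from a vector of per-timepoint importance values. It assumes that the trend is an ordinary least squares line through the trajectory and that the area under the curve is the trapezoid-rule area under a piecewise linear trajectory. The names vim_est and tp are illustrative, and the input values are the rounded point estimates printed in the output further below. The sketch gives point estimates only, without the standard errors and intervals that lvimp reports.

```r
# Rounded per-timepoint estimates (illustrative input) and their timepoints
vim_est <- c(0.345, 0.284, 0.238)
tp <- 1:3

# Average importance over the window
avg <- mean(vim_est)

# Linear trend: least squares intercept and slope of importance on time
trend <- unname(coef(lm(vim_est ~ tp)))

# Area under the trajectory: trapezoid rule on the piecewise linear curve
autc <- sum(diff(tp) * (head(vim_est, -1) + tail(vim_est, -1)) / 2)

c(average = avg, intercept = trend[1], slope = trend[2], autc = autc)
```

Up to rounding of the inputs, these hand computations agree with the average, trend, and AUTC rows of the printed output, which suggests the assumptions are a reasonable reading of what the three functions summarize.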
We now estimate and do inference on these three summary measures:
# set up an lvim object
lvim_obj <- lvim(vim_list_1, timepoints = 1:3)
# obtain the average
est_lvim <- lvim_average(lvim_obj, indices = 1:3)
# add on the linear trend
est_lvim <- lvim_trend(est_lvim, indices = 1:3)
# add on the AUTC based on a piecewise linear trajectory
est_lvim <- lvim_autc(est_lvim, indices = 1:3)
# inspect the estimates
est_lvim
#> Variable importance estimates:
#> Timepoint Estimate SE 95% CI VIMP > 0
#> s = 1 1 0.345 0.00598 [0.334, 0.357] TRUE
#> s = 1 2 0.284 0.00561 [0.273, 0.295] TRUE
#> s = 1 3 0.238 0.00507 [0.228, 0.248] TRUE
#> Average 0.289 0.00321 [0.283, 0.295] <NA>
#> Linear trend: intercept 0.3961 0.00886 [ 0.3788, 0.4135] <NA>
#> Linear trend: slope -0.0536 0.00392 [-0.0612, -0.0459] <NA>
#> AUTC 0.575 0.00685 [0.562, 0.589] <NA>
#> p-value
#> s = 1 0
#> s = 1 0
#> s = 1 0
#> 0
#> <NA>
#> 1.39e-42
#> 0
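The printed intervals appear consistent with Wald-type intervals, that is, the estimate plus or minus a normal quantile times the standard error. The sketch below checks this for the Average row using the rounded values from the output; this is a reading of the printed numbers, not a statement about how lvimp computes intervals internally, and est, se, and ci are illustrative names.

```r
# Rounded estimate and standard error from the "Average" row above
est <- 0.289
se <- 0.00321

# 95% Wald-type interval: estimate +/- z_{0.975} * SE
ci <- est + c(-1, 1) * qnorm(0.975) * se
round(ci, 3)
```

The same arithmetic can be applied to any other row of the table, keeping in mind that the printed estimates and standard errors are themselves rounded.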