Average the output from multiple calls to vimp_regression, for different independent groups, into a single estimate with a corresponding standard error and confidence interval.

average_vim(..., weights = rep(1/length(list(...)), length(list(...))))

Arguments

...

an arbitrary number of vim objects.

weights

the weights used to average the vims together; they must sum to 1. Defaults to 1/(number of vims) for each vim, which corresponds to the arithmetic mean.
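
As a sketch of what such a weighted average involves, write \(\hat\psi_k\) for the importance estimate from the \(k\)th vim, \(\mathrm{se}_k\) for its standard error, and \(w_k\) for its weight, with \(K\) vims in total. These symbols are introduced here for illustration only. If the estimates come from independent groups, the standard result for a weighted sum of independent estimators gives

\[
\hat\psi_{\mathrm{avg}} = \sum_{k=1}^{K} w_k \hat\psi_k,
\qquad
\mathrm{se}_{\mathrm{avg}} = \sqrt{\sum_{k=1}^{K} w_k^2 \, \mathrm{se}_k^2},
\qquad
\sum_{k=1}^{K} w_k = 1.
\]

With the default \(w_k = 1/K\), this reduces to the arithmetic mean of the estimates with standard error \(\frac{1}{K}\sqrt{\sum_{k} \mathrm{se}_k^2}\). This page does not state the exact formula the function uses internally, so treat the expression for the standard error as an assumption that holds under independence.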

Value

an object of class vim containing the (weighted) average of the individual importance estimates, as well as the appropriate standard error and confidence interval. The object is a list containing:

  • s - a list of the column(s) to calculate variable importance for

  • SL.library - a list of the libraries of learners passed to SuperLearner

  • full_fit - a list of the fitted values of the chosen method fit to the full data

  • red_fit - a list of the fitted values of the chosen method fit to the reduced data

  • est - a vector with the corrected estimates

  • naive - a vector with the naive estimates

  • update - a list with the influence curve-based updates

  • mat - a matrix with the estimated variable importance, the standard error, and the \((1-\alpha) \times 100\)% confidence interval

  • full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)

  • red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)

  • alpha - the level, for confidence interval calculation

  • y - a list of the outcomes

Examples

# generate the data
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))

# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2

# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)

# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
#> Loading required package: nnls
#> Loading required package: gam
#> Loading required package: splines
#> Loading required package: foreach
#> Loaded gam 1.22-3
#> Super Learner
#> Version: 2.0-28.1
#> Package created on 2021-05-04
learners <- c("SL.glm", "SL.mean")

# get estimates on independent splits of the data
samp <- sample(seq_len(n), n/2, replace = FALSE)

# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))
#> Warning: vimp_anova now performs all functionality of vimp_regression; please update any code to reflect this change!
#> Hypothesis testing is not available for type = 'anova'. If you want an R-squared-based hypothesis test, please enter type = 'r_squared'.
#> Warning: Original estimate < 0; returning zero.

est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
           run_regression = TRUE, alpha = 0.05,
           SL.library = learners, cvControl = list(V = 2))
#> Warning: vimp_anova now performs all functionality of vimp_regression; please update any code to reflect this change!
#> Hypothesis testing is not available for type = 'anova'. If you want an R-squared-based hypothesis test, please enter type = 'r_squared'.

ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))
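
To make the averaging step concrete, the following base-R sketch combines two estimates by hand. The numbers are hypothetical and are not output from vimp_regression. The variable names (est, se, n_group, w) are chosen for illustration. The standard error formula assumes the two groups are independent, and the Wald-type interval is an assumption about the interval's form, since this page does not spell out how average_vim constructs it.

```r
# Hypothetical estimates and standard errors from two independent groups
# (illustrative numbers only, not produced by vimp_regression)
est <- c(0.12, 0.08)
se <- c(0.05, 0.04)
n_group <- c(50, 50)

# Weights proportional to group size; they must sum to 1
w <- n_group / sum(n_group)
stopifnot(isTRUE(all.equal(sum(w), 1)))

# Weighted average and its standard error, assuming independent groups
est_avg <- sum(w * est)
se_avg <- sqrt(sum(w^2 * se^2))

# Wald-type (1 - alpha) x 100% confidence interval (assumed form)
alpha <- 0.05
ci <- est_avg + c(-1, 1) * stats::qnorm(1 - alpha / 2) * se_avg

print(c(est = est_avg, se = se_avg, cil = ci[1], ciu = ci[2]))
```

With equal group sizes the weights are c(1/2, 1/2), matching the call above. Unequal group sizes would give weights that favour the larger group, as long as they still sum to 1.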