Average the output from multiple calls to vimp_regression, for different independent groups, into a single estimate with a corresponding standard error and confidence interval.
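For intuition, the combination is a weighted average of the individual estimates. The sketch below uses illustrative values only (est1, est2, se1, se2, and w are placeholders, not the package's exact internal computation) to show how a weighted average, its standard error under independence, and a Wald-type confidence interval might be formed:

# illustrative sketch only: est1/est2 and se1/se2 stand in for the individual
# point estimates and standard errors, w for weights summing to 1
est1 <- 0.10; se1 <- 0.03
est2 <- 0.14; se2 <- 0.04
w <- c(1/2, 1/2)
est_avg <- sum(w * c(est1, est2))
# for independent groups, variances combine with squared weights
se_avg <- sqrt(sum(w^2 * c(se1, se2)^2))
# Wald-type (1 - alpha) x 100% confidence interval, here with alpha = 0.05
ci <- est_avg + c(-1, 1) * stats::qnorm(1 - 0.05 / 2) * se_avg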
... - an arbitrary number of vim objects.
weights - how to average the vims together; must sum to 1. Defaults to 1/(number of vims) for each vim, corresponding to the arithmetic mean.
an object of class vim containing the (weighted) average of the individual importance estimates, as well as the appropriate standard error and confidence interval.
This results in a list containing:
s - a list of the column(s) to calculate variable importance for
SL.library - a list of the libraries of learners passed to SuperLearner
full_fit - a list of the fitted values of the chosen method fit to the full data
red_fit - a list of the fitted values of the chosen method fit to the reduced data
est - a vector with the corrected estimates
naive - a vector with the naive estimates
update - a list with the influence curve-based updates
mat - a matrix with the estimated variable importance, the standard error, and the \((1-\alpha) \times 100\)% confidence interval
full_mod - a list of the objects returned by the estimation procedure for the full data regression (if applicable)
red_mod - a list of the objects returned by the estimation procedure for the reduced data regression (if applicable)
alpha - the level used for confidence interval calculation
y - a list of the outcomes
# generate the data
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -5, 5)))
# apply the function to the x's
smooth <- (x[,1]/5)^2*(x[,1]+7)/5 + (x[,2]/3)^2
# generate Y ~ Normal (smooth, 1)
y <- smooth + stats::rnorm(n, 0, 1)
# set up a library for SuperLearner; note simple library for speed
library("SuperLearner")
#> Loading required package: nnls
#> Loading required package: gam
#> Loading required package: splines
#> Loading required package: foreach
#> Loaded gam 1.22-3
#> Super Learner
#> Version: 2.0-28.1
#> Package created on 2021-05-04
learners <- c("SL.glm", "SL.mean")
# get estimates on independent splits of the data
samp <- sample(1:n, n/2, replace = FALSE)
# using Super Learner (with a small number of folds, for illustration only)
est_2 <- vimp_regression(Y = y[samp], X = x[samp, ], indx = 2, V = 2,
run_regression = TRUE, alpha = 0.05,
SL.library = learners, cvControl = list(V = 2))
#> Warning: vimp_anova now performs all functionality of vimp_regression; please update any code to reflect this change!
#> Hypothesis testing is not available for type = 'anova'. If you want an R-squared-based hypothesis test, please enter type = 'r_squared'.
#> Warning: Original estimate < 0; returning zero.
est_1 <- vimp_regression(Y = y[-samp], X = x[-samp, ], indx = 2, V = 2,
run_regression = TRUE, alpha = 0.05,
SL.library = learners, cvControl = list(V = 2))
#> Warning: vimp_anova now performs all functionality of vimp_regression; please update any code to reflect this change!
#> Hypothesis testing is not available for type = 'anova'. If you want an R-squared-based hypothesis test, please enter type = 'r_squared'.
ests <- average_vim(est_1, est_2, weights = c(1/2, 1/2))
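The components of the returned object can then be inspected directly; for example (assuming the list structure described above):

# the averaged estimate, standard error, and confidence interval
ests$mat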