Compute estimates of and confidence intervals for nonparametric intrinsic variable importance based on the population-level contrast between the oracle predictiveness using the feature(s) of interest versus not.

```
vim(
  Y = NULL,
  X = NULL,
  f1 = NULL,
  f2 = NULL,
  indx = 1,
  type = "r_squared",
  run_regression = TRUE,
  SL.library = c("SL.glmnet", "SL.xgboost", "SL.mean"),
  alpha = 0.05,
  delta = 0,
  scale = "identity",
  na.rm = FALSE,
  sample_splitting = TRUE,
  sample_splitting_folds = NULL,
  final_point_estimate = "split",
  stratified = FALSE,
  C = rep(1, length(Y)),
  Z = NULL,
  ipc_scale = "identity",
  ipc_weights = rep(1, length(Y)),
  ipc_est_type = "aipw",
  scale_est = TRUE,
  nuisance_estimators_full = NULL,
  nuisance_estimators_reduced = NULL,
  exposure_name = NULL,
  bootstrap = FALSE,
  b = 1000,
  boot_interval_type = "perc",
  clustered = FALSE,
  cluster_id = rep(NA, length(Y)),
  ...
)
```

- Y
the outcome.

- X
the covariates. If `type = "average_value"`, then the exposure variable should be part of `X`, with its name provided in `exposure_name`.

- f1
the fitted values from a flexible estimation technique regressing Y on X. A vector of the same length as `Y`; if sample-splitting is desired, then the value of `f1` at each position should be the result of predicting from a model trained without that observation.

- f2
the fitted values from a flexible estimation technique regressing either (a) `f1` or (b) Y on X withholding the columns in `indx`. A vector of the same length as `Y`; if sample-splitting is desired, then the value of `f2` at each position should be the result of predicting from a model trained without that observation.

- indx
the indices of the covariate(s) to calculate variable importance for; defaults to 1.

- type
the type of importance to compute; defaults to `r_squared`, but other supported options are `auc`, `accuracy`, `deviance`, `anova`, and `average_value`.

- run_regression
if outcome Y and covariates X are passed to `vim`, and `run_regression` is `TRUE`, then Super Learner will be used; otherwise, variable importance will be computed using the inputted fitted values.

- SL.library
a character vector of learners to pass to `SuperLearner`, if `Y` and `X` are supplied and `run_regression = TRUE`. Defaults to `SL.glmnet`, `SL.xgboost`, and `SL.mean`.

- alpha
the level to compute the confidence interval at. Defaults to 0.05, corresponding to a 95% confidence interval.

- delta
the value of the \(\delta\)-null (i.e., testing if importance < \(\delta\)); defaults to 0.

- scale
should CIs be computed on original ("identity") or another scale? (options are "log" and "logit")

- na.rm
should we remove NAs in the outcome and fitted values in computation? Defaults to `FALSE`.

- sample_splitting
should we use sample-splitting to estimate the full and reduced predictiveness? Defaults to `TRUE`, since inferences made using `sample_splitting = FALSE` will be invalid for variables with truly zero importance.

- sample_splitting_folds
the folds used for sample-splitting; these identify the observations that should be used to evaluate predictiveness based on the full and reduced sets of covariates, respectively. Only used if `run_regression = FALSE`.

- final_point_estimate
if sample splitting is used, should the final point estimates be based on only the sample-split folds used for inference (`"split"`, the default), or should they instead be based on the full dataset (`"full"`) or the average across the point estimates from each sample split (`"average"`)? All three options result in valid point estimates; sample-splitting is only required for valid inference.

- stratified
if `run_regression = TRUE`, should the generated folds be stratified based on the outcome? Stratification helps to ensure class balance across cross-validation folds.

- C
the indicator of coarsening (1 denotes observed, 0 denotes unobserved).

- Z
either (i) NULL (the default, in which case the argument `C` above must be all ones), or (ii) a character vector specifying the variable(s) among Y and X that are thought to play a role in the coarsening mechanism. To specify the outcome, use `"Y"`; to specify covariates, use a character number corresponding to the desired position in X (e.g., `"1"`).

- ipc_scale
the scale on which the inverse probability weight correction should be applied, if any. Defaults to `"identity"`; other options are `"log"` and `"logit"`.

- ipc_weights
weights for the computed influence curve (i.e., inverse probability weights for coarsened-at-random settings). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]).

- ipc_est_type
the type of procedure used for coarsened-at-random settings; options are `"ipw"` (for inverse probability weighting) or `"aipw"` (for augmented inverse probability weighting). Only used if `C` is not all equal to 1.

- scale_est
should the point estimate be scaled to be greater than or equal to 0? Defaults to `TRUE`.

- nuisance_estimators_full
(only used if `type = "average_value"`) a list of nuisance function estimators on the observed data (may be within a specified fold, for cross-fitted estimates), based on the full set of covariates. Specifically: an estimator of the optimal treatment rule; an estimator of the propensity score under the estimated optimal treatment rule; and an estimator of the outcome regression when treatment is assigned according to the estimated optimal rule.

- nuisance_estimators_reduced
(only used if `type = "average_value"`) a list of nuisance function estimators on the observed data (may be within a specified fold, for cross-fitted estimates), based on the reduced set of covariates. Specifically: an estimator of the optimal treatment rule; an estimator of the propensity score under the estimated optimal treatment rule; and an estimator of the outcome regression when treatment is assigned according to the estimated optimal rule.

- exposure_name
(only used if `type = "average_value"`) the name of the exposure of interest; binary, with 1 indicating presence of the exposure and 0 indicating absence of the exposure.

- bootstrap
should bootstrap-based standard error estimates be computed? Defaults to `FALSE` (and currently may only be used if `sample_splitting = FALSE`).

- b
the number of bootstrap replicates (only used if `bootstrap = TRUE` and `sample_splitting = FALSE`); defaults to 1000.

- boot_interval_type
the type of bootstrap interval (one of `"norm"`, `"basic"`, `"stud"`, `"perc"`, or `"bca"`, as in `boot::boot.ci`) if requested. Defaults to `"perc"`.

- clustered
should the bootstrap resamples be performed on clusters rather than individual observations? Defaults to `FALSE`.

- cluster_id
vector of the same length as `Y` giving the cluster IDs used for the clustered bootstrap, if `clustered` is `TRUE`.

- ...
other arguments to the estimation tool; see "See also".
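The `f1` and `f2` descriptions ask for fitted values in which each position is predicted by a model trained without that observation. A minimal base-R sketch of producing such cross-fitted values follows. It assumes ordinary least squares (`lm()`) as a stand-in for a flexible learner such as Super Learner; the helper `cross_fit()` and the simulated data are hypothetical and for illustration only. Here `f2` follows option (b), regressing Y on X withholding column 2.

```r
# Hypothetical helper: cross-fitted predictions, so that each observation's
# fitted value comes from a model trained on the other folds only.
# lm() is a stand-in for a flexible learner such as Super Learner.
cross_fit <- function(y, x, folds) {
  preds <- numeric(length(y))
  for (k in unique(folds)) {
    train <- folds != k
    dat_train <- data.frame(y = y[train], x[train, , drop = FALSE])
    fit <- lm(y ~ ., data = dat_train)
    preds[!train] <- predict(fit, newdata = x[!train, , drop = FALSE])
  }
  preds
}

# simulated data (illustrative only)
set.seed(4747)
n <- 200
x <- data.frame(X1 = runif(n, -1, 1), X2 = runif(n, -1, 1))
y <- 0.5 + 0.3 * x$X1 + 0.2 * x$X2 + rnorm(n, sd = 0.1)

# two folds of equal size, assigned at random
folds <- sample(rep(1:2, length.out = n))

# f1: all covariates; f2: option (b), Y on X withholding column 2 (indx = 2)
f1 <- cross_fit(y, x, folds)
f2 <- cross_fit(y, x[, -2, drop = FALSE], folds)
```

Vectors built this way are the kind of input that `f1` and `f2` describe when `run_regression = FALSE`; replacing `lm()` with a flexible learner does not change the cross-fitting logic.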

An object of classes `vim` and the type of risk-based measure. See Details for more information.

We define the population variable importance measure (VIM) for the group of features (or single feature) \(s\) with respect to the predictiveness measure \(V\) by $$\psi_{0,s} := V(f_0, P_0) - V(f_{0,s}, P_0),$$ where \(f_0\) is the population predictiveness maximizing function, \(f_{0,s}\) is the population predictiveness maximizing function that is only allowed to access the features with index not in \(s\), and \(P_0\) is the true data-generating distribution. VIM estimates are computed by first constructing estimators \(f_n\) and \(f_{n,s}\) of \(f_0\) and \(f_{0,s}\), respectively, and an estimator \(P_n\) of \(P_0\); the estimate is then \(\psi_{n,s} := V(f_n, P_n) - V(f_{n,s}, P_n)\).
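To make the plug-in definition concrete, the following base-R sketch computes \(\psi_{n,s}\) for \(s = \{2\}\), assuming that \(V\) is R-squared, \(V(f, P) = 1 - E_P\{Y - f(X)\}^2 / \mathrm{Var}_P(Y)\), with linear regressions (`lm()`) standing in for \(f_n\) and \(f_{n,s}\). The helper `r_squared()` and the simulated data are hypothetical. For simplicity both fits are evaluated in-sample, without the sample-splitting that `vim` uses by default for valid inference.

```r
# Hypothetical helper: plug-in R-squared predictiveness V(f, P_n),
# i.e., 1 - (mean squared error) / (empirical variance of Y)
r_squared <- function(fitted, y) {
  1 - mean((y - fitted)^2) / mean((y - mean(y))^2)
}

# simulated data (illustrative only): Y = X1 + 0.5 * X2 + noise
set.seed(4747)
n <- 1000
x <- data.frame(X1 = runif(n, -1, 1), X2 = runif(n, -1, 1))
y <- x$X1 + 0.5 * x$X2 + rnorm(n, sd = 0.5)

# f_n uses all features; f_{n,s} withholds s = {2}
f_n  <- fitted(lm(y ~ X1 + X2, data = x))
f_ns <- fitted(lm(y ~ X1, data = x))

# psi_{n,s} = V(f_n, P_n) - V(f_{n,s}, P_n)
psi_n <- r_squared(f_n, y) - r_squared(f_ns, y)
psi_n
```

In this simulated design, \(\mathrm{Var}(Y) = 2/3\) and \(X_2\) alone contributes variance \(1/12\), so \(\psi_{0,s} = (1/12)/(2/3) = 0.125\); `psi_n` should land near that value.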

In the interest of transparency, we return most of the calculations within the `vim` object. This results in a list including:

- s
the column(s) to calculate variable importance for

- SL.library
the library of learners passed to `SuperLearner`

- type
the type of risk-based variable importance measured

- full_fit
the fitted values of the chosen method fit to the full data

- red_fit
the fitted values of the chosen method fit to the reduced data

- est
the estimated variable importance

- naive
the naive estimator of variable importance (only used if `type = "anova"`)

- eif
the estimated efficient influence function

- eif_full
the estimated efficient influence function for the full regression

- eif_reduced
the estimated efficient influence function for the reduced regression

- se
the standard error for the estimated variable importance

- ci
the \((1-\alpha) \times 100\)% confidence interval for the variable importance estimate

- test
a decision to either reject (TRUE) or not reject (FALSE) the null hypothesis, based on a conservative test

- p_value
a p-value based on the same test as `test`

- full_mod
the object returned by the estimation procedure for the full data regression (if applicable)

- red_mod
the object returned by the estimation procedure for the reduced data regression (if applicable)

- alpha
the level, for confidence interval calculation

- sample_splitting_folds
the folds used for sample-splitting (used for hypothesis testing)

- y
the outcome

- ipc_weights
the inverse probability weights supplied via `ipc_weights`

- cluster_id
the cluster IDs

- mat
a tibble with the estimate, SE, CI, hypothesis testing decision, and p-value
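The `ci` component is described as a \((1-\alpha) \times 100\)% interval. As a hedged sketch only, assuming a Wald-type interval on the identity scale (`scale = "identity"`), an interval of this kind could be formed from an estimate and its standard error as below. The helper `wald_ci()` is hypothetical, the inputs are illustrative numbers rather than output from `vim`, and the package's own computation (for example on the `"log"` or `"logit"` scale) may differ.

```r
# Hypothetical helper: Wald-type (1 - alpha) * 100% interval on the identity scale
wald_ci <- function(est, se, alpha = 0.05) {
  z <- stats::qnorm(1 - alpha / 2)  # about 1.96 when alpha = 0.05
  c(lower = est - z * se, upper = est + z * se)
}

# illustrative inputs, not output from vim()
ci <- wald_ci(est = 0.125, se = 0.014, alpha = 0.05)
ci
```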

See also `SuperLearner` for specific usage of the `SuperLearner` function and package.

```
# load the packages used below; vim() and get_cv_sl_folds() come from vimp
library("vimp")
library("SuperLearner")
# generate the data
# generate X
p <- 2
n <- 100
x <- data.frame(replicate(p, stats::runif(n, -1, 1)))
# apply the function to the x's
f <- function(x) 0.5 + 0.3*x[1] + 0.2*x[2]
smooth <- apply(x, 1, function(z) f(z))
# generate Y ~ Bernoulli(smooth)
y <- matrix(rbinom(n, size = 1, prob = smooth))
# set up a library for SuperLearner; note simple library for speed
learners <- c("SL.glm")
# using Y and X; use class-balanced folds
est_1 <- vim(y, x, indx = 2, type = "accuracy",
             alpha = 0.05, run_regression = TRUE,
             SL.library = learners, cvControl = list(V = 2),
             stratified = TRUE)
# using pre-computed fitted values
set.seed(4747)
V <- 2
full_fit <- SuperLearner::CV.SuperLearner(Y = y, X = x,
                                          SL.library = learners,
                                          cvControl = list(V = 2),
                                          innerCvControl = list(list(V = V)))
#> Warning: Only a single innerCvControl is given, will be replicated across all cross-validation split calls to SuperLearner
full_fitted <- SuperLearner::predict.SuperLearner(full_fit)$pred
# fit the data with only X1
reduced_fit <- SuperLearner::CV.SuperLearner(Y = full_fitted,
                                             X = x[, -2, drop = FALSE],
                                             SL.library = learners,
                                             cvControl = list(V = 2, validRows = full_fit$folds),
                                             innerCvControl = list(list(V = V)))
#> Warning: Only a single innerCvControl is given, will be replicated across all cross-validation split calls to SuperLearner
reduced_fitted <- SuperLearner::predict.SuperLearner(reduced_fit)$pred
est_2 <- vim(Y = y, f1 = full_fitted, f2 = reduced_fitted,
             indx = 2, run_regression = FALSE, alpha = 0.05,
             stratified = TRUE, type = "accuracy",
             sample_splitting_folds = get_cv_sl_folds(full_fit$folds))
```