Compute nonparametric estimates of the chosen measure of predictiveness.

est_predictiveness_cv(
  fitted_values,
  y,
  full_y = NULL,
  folds,
  type = "r_squared",
  C = rep(1, length(y)),
  Z = NULL,
  folds_Z = folds,
  ipc_weights = rep(1, length(C)),
  ipc_fit_type = "external",
  ipc_eif_preds = rep(1, length(C)),
  ipc_est_type = "aipw",
  scale = "identity",
  na.rm = FALSE,
  ...
)

Arguments

fitted_values

fitted values from a regression function using the observed data; either a list of length V, where each element is the set of predictions on the corresponding validation fold, or a vector of the same length as y.

y

the observed outcome.

full_y

the observed outcome (from the entire dataset, for cross-fitted estimates).

folds

the cross-validation folds for the observed data.

type

the predictiveness measure to estimate (defaults to "r_squared", for R-squared-based variable importance).

C

the indicator of coarsening (1 denotes observed, 0 denotes unobserved).

Z

either NULL (if no coarsening) or a matrix-like object containing the fully observed data.

folds_Z

either the cross-validation folds for the observed data (no coarsening) or a vector of folds for the fully observed data Z.

ipc_weights

weights for inverse probability of coarsening (IPC) weighted estimation (e.g., inverse weights from a two-phase sample). Assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]).

ipc_fit_type

if "external", then use ipc_eif_preds; if "SL", fit a SuperLearner to determine the correction to the efficient influence function.

ipc_eif_preds

if ipc_fit_type = "external", the fitted values from a regression of the full-data EIF on the fully observed covariates/outcome; otherwise, not used.

ipc_est_type

IPC correction, either "ipw" (for classical inverse probability weighting) or "aipw" (for augmented inverse probability weighting; the default).

scale

if doing an IPC correction, then the scale that the correction should be computed on (e.g., "identity"; or "logit" to logit-transform, apply the correction, and back-transform).

na.rm

logical; should NAs be removed in computation? (defaults to FALSE)

...

other arguments to SuperLearner, if ipc_fit_type = "SL".
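As a rough illustration of the shapes these arguments take, the following base-R sketch builds a fold vector, a list of cross-fitted predictions of length V, and inverted IPC weights. The simulated data, the linear model, the sampling probabilities `p_obs`, and the hand-rolled R-squared are all assumptions made for the example; the hand-rolled computation is not necessarily what the package does internally.

```r
set.seed(4747)
n <- 200
V <- 5

# Simulated data (hypothetical): outcome depends on x1 only
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + rnorm(n)
y <- dat$y

# folds: one fold label per observation
folds <- sample(rep(seq_len(V), length.out = n))

# fitted_values: a list of length V; element v holds predictions on fold v
# from a model trained on the remaining V - 1 folds
fitted_values <- lapply(seq_len(V), function(v) {
  fit <- lm(y ~ x1 + x2, data = dat[folds != v, ])
  unname(predict(fit, newdata = dat[folds == v, ]))
})

# Hand-rolled cross-validated R-squared (illustration only):
# fold-specific MSE scaled by the full-sample outcome variance, then averaged
r2_by_fold <- vapply(seq_len(V), function(v) {
  y_v <- y[folds == v]
  1 - mean((y_v - fitted_values[[v]])^2) / mean((y - mean(y))^2)
}, numeric(1))
cv_r2 <- mean(r2_by_fold)

# Coarsening under a hypothetical two-phase design with known
# phase-two sampling probabilities p_obs
p_obs <- ifelse(y > 1, 0.8, 0.4)
C <- rbinom(n, 1, p_obs)   # 1 = observed, 0 = unobserved
ipc_weights <- 1 / p_obs   # already inverted, as the argument expects

# Not run: a call using the documented signature with no coarsening
# est <- vimp::est_predictiveness_cv(
#   fitted_values = fitted_values, y = y, full_y = y,
#   folds = folds, type = "r_squared"
# )
```

With no coarsening, C and ipc_weights can be left at their defaults (all ones), in which case only fitted_values, y, and folds need to be supplied.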

Value

The estimated measure of predictiveness.

Details

See the paper by Williamson, Gilbert, Simon, and Carone for the mathematics behind this function and the definition of the parameter of interest.

Sample-splitting is recommended, because inferences then remain valid even if the variable has zero true importance. If sample-splitting is also requested, the prediction functions are trained as if \(2K\)-fold cross-validation were run, but each is evaluated on only \(K\) sets; the evaluation sets are independent between the full and reduced nuisance regressions.
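The sample-splitting scheme described here can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: the data are simulated, `lm` stands in for the nuisance regressions, and the names `cf_folds`, `full_ids`, `reduced_ids`, and `cv_r2` are hypothetical.

```r
set.seed(1234)
n <- 400
K <- 2

# Simulated data (hypothetical): x1 matters a lot, x2 a little
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 + 0.5 * dat$x2 + rnorm(n)

# Assign observations to 2K folds, as if running 2K-fold cross-validation
cf_folds <- sample(rep(seq_len(2 * K), length.out = n))

# Split the 2K folds into two disjoint groups of K:
# one to evaluate the full regression, one to evaluate the reduced regression
full_ids <- sort(sample(seq_len(2 * K), K))
reduced_ids <- setdiff(seq_len(2 * K), full_ids)

# Train on all folds except v (2K - 1 folds), evaluate on fold v,
# but only for the K folds listed in eval_ids
cv_r2 <- function(formula, eval_ids) {
  vals <- vapply(eval_ids, function(v) {
    fit <- lm(formula, data = dat[cf_folds != v, ])
    test <- dat[cf_folds == v, ]
    mse <- mean((test$y - predict(fit, newdata = test))^2)
    1 - mse / mean((dat$y - mean(dat$y))^2)
  }, numeric(1))
  mean(vals)
}

r2_full <- cv_r2(y ~ x1 + x2, full_ids)    # full regression, K folds
r2_reduced <- cv_r2(y ~ x2, reduced_ids)   # reduced regression (x1 removed), other K folds
vim_x1 <- r2_full - r2_reduced             # estimated importance of x1
```

Because the two predictiveness estimates are evaluated on disjoint sets of folds, they are computed from independent evaluation data, which is the property the sample-splitting recommendation relies on.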