Source: R/est_predictiveness_cv.R
Documentation: est_predictiveness_cv.Rd
Compute nonparametric estimates of the chosen measure of predictiveness.
Arguments:

- fitted values from a regression function using the observed data: a list of length V, where each element is the set of predictions on the corresponding validation fold, or a vector of the same length as y (see the first sketch following this list).
- the observed outcome.
- the observed outcome from the entire dataset (for cross-fitted estimates).
- the cross-validation folds for the observed data.
- which parameter are you estimating (defaults to r_squared, for R-squared-based variable importance)?
- the indicator of coarsening (1 denotes observed, 0 denotes unobserved).
- either NULL (if no coarsening) or a matrix-like object containing the fully observed data.
- either the cross-validation folds for the observed data (if there is no coarsening) or a vector of folds for the fully observed data Z.
- weights for inverse-probability-of-coarsening (IPC) weighted estimation (e.g., inverse weights from a two-phase sample), assumed to be already inverted (i.e., ipc_weights = 1 / [estimated probability weights]).
- if "external", use ipc_eif_preds; if "SL", fit a SuperLearner to determine the correction to the efficient influence function.
- if ipc_fit_type = "external", the fitted values from a regression of the full-data efficient influence function (EIF) on the fully observed covariates/outcome; otherwise, not used.
- the IPC correction to use, either "ipw" (for classical inverse probability weighting) or "aipw" (for augmented inverse probability weighting; the default).
- if doing an IPC correction, the scale on which the correction should be computed: "identity", or "logit" to logit-transform the estimate, apply the correction, and back-transform (see the second sketch following this list).
- logical; should NAs be removed before computation? (defaults to FALSE)
- other arguments to pass to SuperLearner, if ipc_fit_type = "SL".
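The descriptions above assume a particular layout for the cross-fitted predictions, the folds, and the IPC weights. Below is a minimal sketch of how such inputs might be built; it is not taken from the package, and the simple linear working model (and the names x, y, folds, fitted_values, C, ipc_weights) are used purely for illustration.

    set.seed(1234)
    n <- 1000
    x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
    y <- 1 + 0.5 * x$x1 + rnorm(n)

    V <- 5
    ## cross-validation folds for the observed data: a vector of fold labels 1, ..., V
    folds <- sample(rep(seq_len(V), length.out = n))

    ## fitted values as a list of length V: element v holds the predictions on
    ## validation fold v from a model trained on the remaining folds
    fitted_values <- lapply(seq_len(V), function(v) {
      train <- folds != v
      fit_v <- lm(y ~ x1 + x2, data = data.frame(y = y, x)[train, ])
      predict(fit_v, newdata = x[folds == v, , drop = FALSE])
    })

    ## coarsening indicator and IPC weights: here everything is fully observed,
    ## so the (already-inverted) weights are all 1 / 1 = 1
    C <- rep(1, n)
    ipc_weights <- 1 / rep(1, n)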
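To make the scale option concrete, here is a minimal sketch contrasting an identity-scale with a logit-scale correction; the estimate and correction values are invented for illustration only.

    est <- 0.35          ## a predictiveness estimate on the natural (identity) scale
    correction <- -0.02  ## a hypothetical IPC correction term

    logit <- function(p) log(p / (1 - p))
    expit <- function(l) exp(l) / (1 + exp(l))

    ## "identity" scale: apply the correction directly
    est_identity <- est + correction

    ## "logit" scale: logit-transform, apply the correction, back-transform
    ## (keeps a (0, 1)-valued measure such as R-squared inside its range)
    est_logit <- expit(logit(est) + correction)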
Value: The estimated measure of predictiveness.
Details: See the paper by Williamson, Gilbert, Simon, and Carone for more details on the mathematics behind this function and the definition of the parameter of interest. If sample-splitting is also requested (recommended, since in this case inferences will be valid even if the variable has zero true importance), then the prediction functions are trained as if 2K-fold cross-validation were run, but are evaluated on only K sets (which are independent between the full and reduced nuisance regressions); a fold layout in this spirit is sketched below.
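As a rough illustration of this sample-splitting layout (taking K = 5), the sketch below partitions 2K cross-fitting folds into two disjoint sets of K folds, one set used to evaluate the full regression and the other the reduced regression. The commented-out call at the end is hypothetical: the argument names shown (fitted_values, y, folds, type) and the object fitted_values_full are assumptions for illustration, so consult the package source for the exact interface.

    K <- 5
    n <- 1000
    ## folds for 2K-fold cross-fitting
    all_folds <- sample(rep(seq_len(2 * K), length.out = n))

    ## disjoint halves: the full regression is evaluated on one set of K folds and
    ## the reduced regression on the other, so the two estimates are independent
    full_eval_folds    <- seq_len(K)
    reduced_eval_folds <- seq(K + 1, 2 * K)
    in_full <- all_folds %in% full_eval_folds

    ## hypothetical call for the full regression's predictiveness, restricted to
    ## its K evaluation folds (y and the fitted values as in the first sketch above)
    ## est_full <- est_predictiveness_cv(
    ##   fitted_values = fitted_values_full,  ## list of length K of validation-fold predictions
    ##   y = y[in_full],
    ##   folds = all_folds[in_full],
    ##   type = "r_squared"
    ## )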