NEWS.md
* Predictiveness measures are now constructed using an S3 class, which makes internal code cleaner and facilitates simpler addition of new predictiveness measures.
* The output of `extract_sampled_split_predictions` is a vector, not a list. This facilitates proper use in the new version of the package.
* Updated the handling of `Z` in coarsened-data settings: allow case-insensitive specification of covariate names/positions when creating `Z`.
* `V` now defaults to 5 if no cross-fitting folds are specified externally.
* Updated `cross_fitted_f1` and `cross_fitted_f2` in `cv_vim`.
* Added `cross_fitted_se` to `cv_vim` and `sp_vim`; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions (see the usage sketch below).
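  A minimal usage sketch, assuming the `cv_vim` interface described in this entry (the simulated data and Super Learner library are illustrative only):

  ```r
  library(vimp)

  # simulate data where only x1 matters
  set.seed(1234)
  n <- 1000
  x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  y <- x$x1 + rnorm(n)

  # cross-fitted R^2-based importance of x1, with a cross-fitted standard error
  est <- cv_vim(
    Y = y, X = x, indx = 1, V = 5, type = "r_squared",
    run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
    cross_fitted_se = TRUE
  )
  ```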
* Added an option to `vim` and `cv_vim`; currently, this option is only available for non-sample-split calls (i.e., with `sample_splitting = FALSE`).
* Variable importance point estimates from `vim` are now based on the entire dataset, while the full and reduced predictiveness (`predictiveness_full` and `predictiveness_reduced`, along with the corresponding confidence intervals) are evaluated using separate portions of the data for the full and reduced regressions.
* Added `sample_splitting` to `vim`, `cv_vim`, and `sp_vim`; if `FALSE`, sample-splitting is not used to estimate predictiveness. Note that we recommend using the default, `TRUE`, in all cases, since inference using `sample_splitting = FALSE` will be invalid for variables with truly null variable importance.
* Updated the behavior when `sample_splitting = TRUE` to match more closely with theoretical results (and improve power!). In this case, we first split the data into 2K cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each k ∈ {1, …, K} we set aside the data in sample-splitting fold 1 and cross-fitting fold k (this comprises 1/(2K) of the data). We train using the remaining observations not in this testing fold (comprising (2K − 1)/(2K) of the data), and we test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold the data in sample-splitting fold 2. This update affects both `cv_vim` and `sp_vim`; see the fold-construction sketch below. If `sample_splitting = FALSE`, then we use standard cross-fitting.
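  A minimal sketch of this fold construction (an illustration of the scheme described above, not the package's internal code), with K = 5:

  ```r
  set.seed(4747)
  n <- 1000
  K <- 5
  # 2K cross-fitting folds
  cross_fit_fold <- sample(rep(seq_len(2 * K), length.out = n))
  # split the folds equally into two sample-splitting folds:
  # odd-numbered folds for the full regression, even-numbered for the reduced
  sample_split_fold <- ifelse(cross_fit_fold %% 2 == 1, 1, 2)

  # for the full-covariate regression, the k-th test set is the data in
  # sample-splitting fold 1 and the k-th of its cross-fitting folds,
  # i.e., 1/(2K) of the data
  k <- 1
  test_idx <- which(sample_split_fold == 1 & cross_fit_fold == 2 * k - 1)
  train_idx <- setdiff(seq_len(n), test_idx)  # (2K - 1)/(2K) of the data
  ```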
* Guess the `family` if it isn’t specified: use `stats::binomial()` if there are only two unique outcome values, and otherwise use `stats::gaussian()` (see the sketch below).
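  A sketch of this guessing rule (`guess_family` is a hypothetical name, not the package's internal function):

  ```r
  guess_family <- function(y) {
    if (length(unique(y)) == 2) stats::binomial() else stats::gaussian()
  }
  guess_family(c(0, 1, 1, 0))$family  # "binomial"
  guess_family(rnorm(10))$family      # "gaussian"
  ```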
* Compute AUC estimates using `cvAUC`.
* Added `ipc_est_type` (available in `vim`, `cv_vim`, and `sp_vim`; also in the corresponding wrapper functions for each VIM and the corresponding internal estimation functions).
* Updated tests in `testthat/` to use `glm` rather than `xgboost` (increases speed).
* Use `glm` rather than `xgboost` or `ranger` (increases speed, even though the regression is now misspecified for the truth).
* Removed `forcats` from the vignette.
* Renamed measures to `measure_accuracy` and `measure_auc` for project-wide consistency.
* Updated tests in `testthat/` to not explicitly load `xgboost`.
* Use `stats::qlogis` and `stats::plogis` rather than bespoke functions.
* Clarified in the documentation that users can provide their own regression estimates; `vimp` will handle the rest.
* Renamed the vignette to “Introduction to `vimp`”.
* Updated examples to use `run_regression = TRUE` for simplicity.
* Added `verbose` to `sp_vim`; if `TRUE`, messages are printed throughout fitting that display progress, and `verbose` is passed to `SuperLearner`.
* Renamed `cv_predictiveness_point_est` and `predictiveness_point_est` to `est_predictiveness_cv` and `est_predictiveness`, respectively.
* Removed `cv_predictiveness_update`, `cv_vimp_point_est`, `cv_vimp_update`, `predictiveness_update`, `vimp_point_est`, and `vimp_update`; this functionality is now in `est_predictiveness_cv` and `est_predictiveness` (for the `*update*` functions) or directly in `vim` or `cv_vim` (for the `*vimp*` functions).
* Removed `predictiveness_se` and `predictiveness_ci` (functionality is now in `vimp_se` and `vimp_ci`, respectively).
* Renamed the `weights` argument to `ipc_weights`, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights; see the schematic call below.
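  A schematic coarsened-data call under the renamed argument (the data, coarsening mechanism, and library here are illustrative, not a worked analysis):

  ```r
  library(vimp)

  set.seed(2021)
  n <- 800
  x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
  y <- x$x1 + rnorm(n)
  prob_obs <- stats::plogis(0.5 + 0.5 * abs(y))  # probability of full observation
  cc <- stats::rbinom(n, 1, prob_obs)            # 1 = fully observed, 0 = coarsened
  x[cc == 0, ] <- NA                             # covariates missing when coarsened

  est <- vim(
    Y = y, X = x, indx = 1, type = "r_squared",
    run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
    C = cc, Z = "Y", ipc_weights = 1 / prob_obs
  )
  ```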
* Added `sp_vim` and helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM). A usage sketch follows.
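  A minimal `sp_vim` usage sketch (the simulated data, library, and argument values are illustrative only):

  ```r
  library(vimp)

  set.seed(5678)
  n <- 500
  x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
  y <- x$x1 + 0.5 * x$x2 + rnorm(n)

  # SPVIM estimates for all covariates, with V-fold cross-fitting
  est <- sp_vim(
    Y = y, X = x, V = 5, type = "r_squared",
    SL.library = c("SL.glm", "SL.mean"), gamma = 1
  )
  ```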
* `cv_vim` and `vim` now use an outer layer of sample splitting for hypothesis testing.
* Added new variable importance functions `vimp_auc`, `vimp_accuracy`, `vimp_deviance`, and `vimp_rsquared`.
* `vimp_regression` is now deprecated; use `vimp_anova` instead.
* Added `vim`; each variable importance function is now a wrapper function around `vim` with the `type` argument filled in (see the schematic below).
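  A schematic of the wrapper pattern (abridged and hypothetical in its argument list; the real wrappers pass many more arguments through to `vim`):

  ```r
  # e.g., the AUC-based wrapper simply fixes type = "auc"
  vimp_auc_schematic <- function(Y, X, ...) {
    vim(Y = Y, X = X, type = "auc", ...)
  }
  ```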
* `cv_vim_nodonsker` is now deprecated; use `cv_vim` instead.
* Hypothesis tests of the null of zero importance are now provided (except for `vimp_anova`).
* Use the specified `family` for the top-level SuperLearner only if `run_regression = TRUE`; in all cases, the second-stage SuperLearner uses a `gaussian` family.
* If the SuperLearner selects `SL.mean` as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values.
* Added `two_validation_set_cv`, which sets up folds for V-fold cross-validation with two validation sets per fold (see the sketch below).
* Changed the functionality of `cv_vim`: now, the cross-validated naive estimator is computed on the first validation set, while the update for the corrected estimator is computed using the second validation set (both created from `two_validation_set_cv`); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher R^2 on the training data).
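  A minimal sketch of folds with two validation sets (an illustrative reconstruction of the scheme, not the package's internal code):

  ```r
  # for each of V folds, the held-out data are split into two validation sets:
  # set 1 is used for the CV naive estimator, set 2 for the one-step update
  set.seed(1122)
  n <- 100
  V <- 5
  fold <- sample(rep(seq_len(V), length.out = n))
  val_set <- sample(rep(1:2, length.out = n))
  # validation set 1 of fold v is which(fold == v & val_set == 1)
  ```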
* Changed the functionality of `cv_vim`: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator.
* Removed `vim`; replaced with individual-parameter functions.
* Renamed the regression-based importance function to `vimp_regression`, to match the Python package.
* `cv_vim` can now compute regression estimators.
* Added `vimp_ci`, `vimp_se`, `vimp_update`, and `onestep_based_estimator`.