R/intrinsic_selection.R
intrinsic_selection.Rd
Perform variable selection based on estimated Shapley population variable importance measure (SPVIM) values, using the specified error-controlling method.
intrinsic_selection(
  spvim_ests = NULL,
  sample_size = NULL,
  feature_names = "",
  alpha = 0.05,
  control = list(quantity = "gFWER", base_method = "Holm", fdr_method = NULL,
    q = NULL, k = NULL)
)
spvim_ests
the estimated SPVIM values (an object of class vim, resulting from a call to vimp::sp_vim). Can also be a list of estimated SPVIMs, if multiple imputation was used to handle missing data; in this case, Rubin's rules will be used to combine the estimated SPVIMs, and then selection will be based on the combined SPVIMs.
sample_size
the number of independent observations used to estimate the SPVIM values.
feature_names
the names of the features (a character vector of length p, the total number of features); only used if the fitted Super Learner ensemble was fit on a matrix rather than on a data.frame, tibble, etc.
alpha
the nominal level at which to control the generalized family-wise error rate, proportion of false positives, or false discovery rate (e.g., 0.05).
control
a list of parameters to control the variable selection process. Parameters include quantity, base_method, q, and k. See intrinsic_control for details.
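For the multiple-imputation case described under spvim_ests, the following is a minimal sketch of Rubin's rules for a single scalar estimate. The helper name pool_rubin and the numbers are made up for illustration; this is not the code used inside the package, which may differ in detail.

```r
# Illustrative helper (hypothetical name; not part of flevr or vimp).
# Pools one scalar estimate across m imputed datasets using Rubin's rules.
pool_rubin <- function(ests, vars) {
  m <- length(ests)                  # number of imputed datasets
  pooled_est <- mean(ests)           # pooled point estimate
  within_var <- mean(vars)           # average within-imputation variance
  between_var <- stats::var(ests)    # between-imputation variance
  total_var <- within_var + (1 + 1 / m) * between_var
  list(est = pooled_est, se = sqrt(total_var))
}

# made-up estimates and variances from three imputed datasets
pooled <- pool_rubin(ests = c(0.10, 0.12, 0.08),
                     vars = c(0.0004, 0.0005, 0.0003))
pooled$est
pooled$se
```

The total variance adds the between-imputation spread, inflated by (1 + 1/m), to the average within-imputation variance, so the pooled standard error reflects the uncertainty due to imputation.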
a tibble with the estimated intrinsic variable importance, the corresponding variable importance ranks, and the selected variables.
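To show how adjusted p-values and a selected indicator can relate under quantity = "gFWER" with base_method = "Holm", here is a rough sketch of an augmentation-style procedure: Holm-adjust the p-values, select those at or below alpha, then add the k most significant remaining features. The helper select_gfwer_holm is hypothetical, and it is an assumption that the package follows these steps; consult intrinsic_control for the authoritative description.

```r
# Illustrative helper (hypothetical name; assumed procedure, not the flevr source).
select_gfwer_holm <- function(p_values, alpha = 0.05, k = 0) {
  # step 1: Holm adjustment controls the family-wise error rate
  adjusted <- stats::p.adjust(p_values, method = "holm")
  selected <- adjusted <= alpha
  if (k > 0) {
    # step 2: augment with the k smallest p-values not yet selected
    candidates <- which(!selected)
    n_extra <- min(k, length(candidates))
    extra <- candidates[order(p_values[candidates])][seq_len(n_extra)]
    selected[extra] <- TRUE
  }
  data.frame(p_value = p_values, adjusted_p_value = adjusted,
             selected = selected)
}

# five illustrative p-values, with alpha = 0.2 and k = 1
res <- select_gfwer_holm(c(0.500, 0.267, 0.369, 0.443, 0.470),
                         alpha = 0.2, k = 1)
res
```

Under this sketch, a feature can be selected even when its adjusted p-value equals 1, because the augmentation step always admits up to k additional features.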
See sp_vim for specific usage of the sp_vim function and the vimp package for estimating intrinsic variable importance.
# \donttest{
data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# estimate SPVIMs (using simple library and V = 2 for illustration only)
set.seed(20231129)
library("SuperLearner")
est <- vimp::sp_vim(Y = y, X = x, V = 2, type = "auc", SL.library = "SL.glm",
                    cvControl = list(V = 2))
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: One or more original estimates < 0; returning zero for these indices.
# do intrinsic selection
intrinsic_set <- intrinsic_selection(
  spvim_ests = est, sample_size = nrow(dat_cc), alpha = 0.2,
  feature_names = feature_nms,
  control = list(quantity = "gFWER", base_method = "Holm", k = 1)
)
intrinsic_set
#> # A tibble: 22 × 6
#>    feature                     est p_value adjusted_p_value  rank selected
#>    <chr>                     <dbl>   <dbl>            <dbl> <dbl> <lgl>
#>  1 institution             0         0.500                1    16 FALSE
#>  2 lab1_actb               0.102     0.267                1     1 TRUE
#>  3 lab1_molecules_score    0.0547    0.369                1     3 FALSE
#>  4 lab1_telomerase_score   0.0242    0.443                1     5 FALSE
#>  5 lab2_fluorescence_score 0.0120    0.470                1     7 FALSE
#>  6 lab3_muc3ac_score       0.00767   0.480                1     9 FALSE
#>  7 lab3_muc5ac_score       0         0.500                1    16 FALSE
#>  8 lab4_areg_score         0         0.500                1    16 FALSE
#>  9 lab4_glucose_score      0         0.500                1    16 FALSE
#> 10 lab5_mucinous_call      0         0.500                1    16 FALSE
#> # ℹ 12 more rows
# }
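A typical next step is to pull out the selected feature names. The sketch below works on a small data frame built from three rows of the printed output above (not the full 22-row result); the object name intrinsic_subset is made up for illustration, and the same indexing should apply to the tibble returned by intrinsic_selection.

```r
# a few rows copied from the printed output above, for illustration only
intrinsic_subset <- data.frame(
  feature = c("institution", "lab1_actb", "lab1_molecules_score"),
  est = c(0, 0.102, 0.0547),
  rank = c(16, 1, 3),
  selected = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# names of the features flagged as selected
selected_features <- intrinsic_subset$feature[intrinsic_subset$selected]
selected_features

# features ordered from most to least important by rank
intrinsic_subset[order(intrinsic_subset$rank), c("feature", "est", "rank")]
```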