Perform intrinsic, ensemble-based variable selection

Based on estimated SPVIM (Shapley population variable importance measure) values, perform variable selection using the specified error-controlling method.

intrinsic_selection(
  spvim_ests = NULL,
  sample_size = NULL,
  feature_names = "",
  alpha = 0.05,
  control = list(quantity = "gFWER", base_method = "Holm", fdr_method = NULL,
                 q = NULL, k = NULL)
)

Arguments

spvim_ests: the estimated SPVIM values (an object of class vim, resulting from a call to vimp::sp_vim). Can also be a list of estimated SPVIMs, if multiple imputation was used to handle missing data; in this case, Rubin's rules will be used to combine the estimated SPVIMs, and then selection will be based on the combined SPVIMs.
sample_size: the number of independent observations used to estimate the SPVIM values.
feature_names: the names of the features, as a character vector of length p, where p is the total number of features. Only used if the Super Learner ensemble was fit on a matrix rather than on a data.frame, tibble, etc.
alpha: the nominal generalized family-wise error rate, proportion of false positives, or false discovery rate level to control at (e.g., 0.05).
control: a list of parameters to control the variable selection process. Parameters include quantity, base_method, fdr_method, q, and k. See intrinsic_control for details.
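
The description of spvim_ests mentions Rubin's rules for combining estimates across multiply imputed datasets. As a rough illustration of the general technique only (not necessarily how this package implements it), the following base R sketch pools hypothetical point estimates and variances for a single feature. The numbers and all variable names are made up for illustration.

```r
# Hypothetical SPVIM estimates and their estimated variances for ONE feature,
# obtained from M = 3 imputed datasets (values are illustrative only).
ests <- c(0.10, 0.12, 0.08)
vars <- c(0.004, 0.005, 0.003)
M <- length(ests)

# Rubin's rules: the pooled estimate is the average across imputations.
pooled_est <- mean(ests)

# Total variance = within-imputation variance
#                  + (1 + 1/M) * between-imputation variance.
within_var <- mean(vars)
between_var <- stats::var(ests)
total_var <- within_var + (1 + 1 / M) * between_var

c(estimate = pooled_est, variance = total_var)
```

The between-imputation term inflates the variance to account for uncertainty due to the missing data, so the pooled variance is never smaller than the average within-imputation variance.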

Value

a tibble with the estimated intrinsic variable importance, the corresponding p-values and adjusted p-values, the variable importance ranks, and an indicator of whether each variable was selected.
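
A common next step is to pull out the selected features or order the features by rank. The sketch below uses a small hand-built data.frame as a stand-in for the returned tibble; the column names (feature, est, rank, selected) and the three rows are taken from the printed output in the Examples section, and the object names res, selected_features, and ranked are hypothetical.

```r
# Stand-in for the object returned by intrinsic_selection(); a plain
# data.frame is used here so the sketch runs without extra packages.
res <- data.frame(
  feature = c("institution", "lab1_actb", "lab1_molecules_score"),
  est = c(0, 0.102, 0.0547),
  rank = c(16, 1, 3),
  selected = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Names of the features flagged as selected.
selected_features <- res$feature[res$selected]

# Features ordered from most to least important (rank 1 = most important).
ranked <- res[order(res$rank), c("feature", "est", "rank")]

selected_features
ranked
```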

Examples

# \donttest{
data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# estimate SPVIMs (using simple library and V = 2 for illustration only)
set.seed(20231129)
library("SuperLearner")
est <- vimp::sp_vim(Y = y, X = x, V = 2, type = "auc", SL.library = "SL.glm", 
                    cvControl = list(V = 2))
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: One or more original estimates < 0; returning zero for these indices.
# do intrinsic selection
intrinsic_set <- intrinsic_selection(spvim_ests = est, sample_size = nrow(dat_cc), alpha = 0.2, 
                                     feature_names = feature_nms, 
                                     control = list(quantity = "gFWER", base_method = "Holm", 
                                                    k = 1))
intrinsic_set
#> # A tibble: 22 × 6
#>    feature                     est p_value adjusted_p_value  rank selected
#>    <chr>                     <dbl>   <dbl>            <dbl> <dbl> <lgl>   
#>  1 institution             0         0.500                1    16 FALSE   
#>  2 lab1_actb               0.102     0.267                1     1 TRUE    
#>  3 lab1_molecules_score    0.0547    0.369                1     3 FALSE   
#>  4 lab1_telomerase_score   0.0242    0.443                1     5 FALSE   
#>  5 lab2_fluorescence_score 0.0120    0.470                1     7 FALSE   
#>  6 lab3_muc3ac_score       0.00767   0.480                1     9 FALSE   
#>  7 lab3_muc5ac_score       0         0.500                1    16 FALSE   
#>  8 lab4_areg_score         0         0.500                1    16 FALSE   
#>  9 lab4_glucose_score      0         0.500                1    16 FALSE   
#> 10 lab5_mucinous_call      0         0.500                1    16 FALSE   
#> # ℹ 12 more rows
# }
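
In the output above, every adjusted p-value equals 1. The sketch below shows why that is expected under a standard Holm step-down adjustment, assuming base_method = "Holm" corresponds to the usual procedure as implemented in stats::p.adjust (an assumption; the package may compute it differently). It uses three of the unadjusted p-values printed above and treats them as part of a family of 22 tests. That lab1_actb is nonetheless selected would be consistent with the gFWER setting with k = 1, which tolerates one false positive beyond the base procedure; see intrinsic_control for the actual behavior.

```r
# Three unadjusted p-values copied from the printed output above.
p_values <- c(
  lab1_actb = 0.267,
  lab1_molecules_score = 0.369,
  lab1_telomerase_score = 0.443
)

# Holm adjustment, treating these as part of a family of 22 tests
# (one per feature); `n` tells p.adjust the total number of comparisons.
holm_adj <- stats::p.adjust(p_values, method = "holm", n = 22)

# Base Holm step at alpha = 0.2: nothing is rejected, because even the
# smallest p-value times 22 exceeds 1 and is capped at 1.
base_selected <- holm_adj <= 0.2

holm_adj
base_selected
```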
