Perform variable selection based on estimated SPVIM values, using the specified error-controlling method.

Usage

intrinsic_selection(
  spvim_ests = NULL,
  sample_size = NULL,
  feature_names = "",
  alpha = 0.05,
  control = list(quantity = "gFWER", base_method = "Holm", fdr_method = NULL, q = NULL,
    k = NULL)
)



Arguments

spvim_ests

the estimated SPVIM values: an object of class vim, as returned by vimp::sp_vim. This can also be a list of estimated SPVIMs if multiple imputation was used to handle missing data; in that case, Rubin's rules are used to combine the estimated SPVIMs, and selection is based on the combined SPVIMs.
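For reference, Rubin's rules pool M estimates of the same quantity by averaging the point estimates, then adding the average within-imputation variance to (1 + 1/M) times the between-imputation variance. The sketch below applies these standard formulas to a single feature. pool_rubin is a hypothetical helper, not a flevr or vimp function, and the inputs are made-up illustrative numbers, not estimates from any analysis.

```r
# Hypothetical helper (not part of flevr or vimp): pool M estimates of one
# quantity, each with its own variance, using Rubin's rules.
pool_rubin <- function(ests, vars) {
  m <- length(ests)
  pooled_est <- mean(ests)            # combined point estimate
  within_var <- mean(vars)            # average within-imputation variance
  between_var <- stats::var(ests)     # between-imputation variance
  total_var <- within_var + (1 + 1 / m) * between_var
  list(est = pooled_est, var = total_var)
}

# three imputations of one feature's SPVIM (illustrative values only)
pooled <- pool_rubin(ests = c(0.10, 0.12, 0.08), vars = c(0.001, 0.0012, 0.0009))
```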


sample_size

the number of independent observations used to estimate the SPVIM values.


feature_names

the names of the features: a character vector of length p, where p is the total number of features. Only used if the Super Learner ensemble was fit on a matrix rather than on a data.frame, tibble, etc.


alpha

the nominal level (e.g., 0.05) at which to control the generalized family-wise error rate, the proportion of false positives, or the false discovery rate, depending on the quantity chosen in control.


control

a list of parameters to control the variable selection process. Parameters include quantity, base_method, fdr_method, q, and k. See intrinsic_control for details.
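When quantity = "gFWER", one standard way to control the generalized family-wise error rate (the probability of more than k false positives) is augmentation: take the rejection set of an FWER-controlling base method such as Holm's, then add the k remaining features with the smallest p-values. If the base set contains no false positives with probability at least 1 - alpha, the augmented set contains at most k false positives with at least that probability. The sketch below illustrates this idea with base R's p.adjust. gfwer_augment is a hypothetical helper, not flevr's implementation, and flevr's internal choices (for example, how ties are broken or how added features are ordered) may differ.

```r
# Hypothetical helper (not part of flevr): gFWER control by augmentation.
# Step 1: reject with Holm's FWER-controlling procedure at level alpha.
# Step 2: add the k non-rejected hypotheses with the smallest p-values.
gfwer_augment <- function(p, alpha = 0.05, k = 1) {
  adj <- stats::p.adjust(p, method = "holm")  # Holm-adjusted p-values
  selected <- adj <= alpha                     # FWER-controlling rejection set
  remaining <- which(!selected)
  n_add <- min(k, length(remaining))
  if (n_add > 0) {
    # order the non-rejected hypotheses by raw p-value and take the first n_add
    add <- remaining[order(p[remaining])][seq_len(n_add)]
    selected[add] <- TRUE
  }
  selected
}

# p-values for the first four features in the example output
p <- c(institution = 0.500, lab1_actb = 0.267,
       lab1_molecules_score = 0.369, lab1_telomerase_score = 0.443)
sel <- gfwer_augment(p, alpha = 0.2, k = 1)
names(p)[sel]  # returns "lab1_actb"
```

With these p-values and alpha = 0.2, Holm rejects nothing, so augmentation adds only the feature with the smallest p-value. This is consistent with the single selected feature in the example output.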


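Value

a tibble with one row per feature, containing the estimated intrinsic variable importance (est), the p-value and adjusted p-value, the variable importance rank, and a logical indicator (selected) of whether the feature was selected.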

See also

sp_vim for details on using the sp_vim function, and the vimp package for estimating intrinsic variable importance.


Examples

# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# estimate SPVIMs (using simple library and V = 2 for illustration only)
est <- vimp::sp_vim(Y = y, X = x, V = 2, type = "auc", SL.library = "SL.glm", 
                    cvControl = list(V = 2))
# do intrinsic selection
intrinsic_set <- intrinsic_selection(spvim_ests = est, sample_size = nrow(dat_cc), alpha = 0.2,
                                     feature_names = feature_nms,
                                     control = list(quantity = "gFWER", base_method = "Holm",
                                                    k = 1))
# print the selection results
intrinsic_set
#> # A tibble: 22 × 6
#>    feature                     est p_value adjusted_p_value  rank selected
#>    <chr>                     <dbl>   <dbl>            <dbl> <dbl> <lgl>   
#>  1 institution             0         0.500                1    16 FALSE   
#>  2 lab1_actb               0.102     0.267                1     1 TRUE    
#>  3 lab1_molecules_score    0.0547    0.369                1     3 FALSE   
#>  4 lab1_telomerase_score   0.0242    0.443                1     5 FALSE   
#>  5 lab2_fluorescence_score 0.0120    0.470                1     7 FALSE   
#>  6 lab3_muc3ac_score       0.00767   0.480                1     9 FALSE   
#>  7 lab3_muc5ac_score       0         0.500                1    16 FALSE   
#>  8 lab4_areg_score         0         0.500                1    16 FALSE   
#>  9 lab4_glucose_score      0         0.500                1    16 FALSE   
#> 10 lab5_mucinous_call      0         0.500                1    16 FALSE   
#> # ℹ 12 more rows
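The returned tibble can be queried with ordinary data-frame indexing. The sketch below uses a small data.frame standing in for intrinsic_set, built from the first four rows of the example output and assuming the column names shown there (feature, est, rank, selected). The same two indexing lines should also work on the tibble itself.

```r
# Stand-in for the tibble returned by intrinsic_selection(); values copied
# from the first four rows of the example output.
intrinsic_set <- data.frame(
  feature = c("institution", "lab1_actb", "lab1_molecules_score", "lab1_telomerase_score"),
  est = c(0, 0.102, 0.0547, 0.0242),
  rank = c(16, 1, 3, 5),
  selected = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# names of the selected features
selected_features <- intrinsic_set$feature[intrinsic_set$selected]

# all features, ordered from most to least important
by_importance <- intrinsic_set[order(intrinsic_set$rank), c("feature", "est", "rank")]
```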