Extract the extrinsic feature importance from each individual algorithm fitted within the Super Learner, then compute the weighted average of the resulting importance ranks, weighting each algorithm by its weight (coefficient) in the Super Learner.

extract_importance_SL(fit, feature_names, import_type = "all", ...)

Arguments

fit

the fitted Super Learner ensemble

feature_names

the names of the features

import_type

the level of granularity for importance: "all" gives the importance based on the weighted average of ranks across algorithms (the weights are the Super Learner coefficients); "best" gives the importance based on the algorithm with the highest weight. Defaults to "all".

...

other arguments to pass to individual-algorithm extractors.

Value

a tibble, with columns feature (the feature) and rank (the weighted feature importance rank, with 1 indicating the most important feature).
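The weighted-rank aggregation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the function weighted_rank_importance(), the learner names learner_A and learner_B, and the importance scores are all hypothetical. It assumes every learner supplies one numeric score per feature (larger meaning more important) and that the weights are the Super Learner coefficients. Ties receive the average rank, as with base R's rank(); with 22 fully tied features each would receive (1 + 22) / 2 = 11.5.

```r
# Hypothetical sketch of weighted-rank importance; not the package's code.
# Assumes each learner gives one numeric score per feature (larger = more important).
weighted_rank_importance <- function(importance_scores, sl_weights,
                                     import_type = c("all", "best")) {
  import_type <- match.arg(import_type)
  if (import_type == "best") {
    # keep only the learner with the largest Super Learner weight
    best <- names(sl_weights)[which.max(sl_weights)]
    importance_scores <- importance_scores[best]
    sl_weights <- sl_weights[best]
  }
  features <- names(importance_scores[[1]])
  # rank features within each learner (1 = most important); ties get the average rank
  ranks <- do.call(cbind, lapply(importance_scores, function(s) {
    rank(-s[features], ties.method = "average")
  }))
  # weighted average of the ranks across learners, normalised by the total weight
  w <- sl_weights[colnames(ranks)]
  avg_rank <- drop(ranks %*% w) / sum(w)
  out <- data.frame(feature = features, rank = unname(avg_rank),
                    stringsAsFactors = FALSE)
  out <- out[order(out$rank), ]
  rownames(out) <- NULL
  out
}

# Hypothetical toy inputs: two learners, three features
importance_scores <- list(
  learner_A = c(x1 = 2.0, x2 = 0.5, x3 = 1.0),
  learner_B = c(x1 = 0.1, x2 = 0.7, x3 = 0.2)
)
sl_weights <- c(learner_A = 0.75, learner_B = 0.25)

weighted_rank_importance(importance_scores, sl_weights, "all")   # x1 = 1.5, x3 = 2.0, x2 = 2.5
weighted_rank_importance(importance_scores, sl_weights, "best")  # x1 = 1, x3 = 2, x2 = 3
```

Here x1 is ranked first by learner_A and third by learner_B, so its weighted rank is 0.75 × 1 + 0.25 × 3 = 1.5; with import_type = "best" only learner_A's ranking is used.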

Examples

data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# get the fit (using a simple library and 2 folds for illustration only)
set.seed(20231129)
library("SuperLearner")
#> Loading required package: nnls
#> Loading required package: gam
#> Loading required package: splines
#> Loading required package: foreach
#> Loaded gam 1.22-2
#> Super Learner
#> Version: 2.0-28.1
#> Package created on 2021-05-04
fit <- SuperLearner::SuperLearner(Y = y, X = x, SL.library = c("SL.glm", "SL.mean"), 
                                  cvControl = list(V = 2))
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
# extract importance using all learners
importance <- extract_importance_SL(fit = fit, feature_names = feature_nms)
importance
#> # A tibble: 22 × 2
#>    feature                          rank
#>    <chr>                           <dbl>
#>  1 cea                              11.5
#>  2 cea_call                         11.5
#>  3 institution                      11.5
#>  4 lab1_actb                        11.5
#>  5 lab1_molecules_neoplasia_call    11.5
#>  6 lab1_molecules_score             11.5
#>  7 lab1_telomerase_neoplasia_call   11.5
#>  8 lab1_telomerase_score            11.5
#>  9 lab2_fluorescence_mucinous_call  11.5
#> 10 lab2_fluorescence_score          11.5
#> # ℹ 12 more rows
# extract importance of best learner
best_importance <- extract_importance_SL(fit = fit, feature_names = feature_nms, 
                                         import_type = "best")
best_importance
#> # A tibble: 22 × 2
#>    feature                  rank
#>    <chr>                   <dbl>
#>  1 institution              11.5
#>  2 lab1_actb                11.5
#>  3 lab1_molecules_score     11.5
#>  4 lab1_telomerase_score    11.5
#>  5 lab2_fluorescence_score  11.5
#>  6 lab3_muc3ac_score        11.5
#>  7 lab3_muc5ac_score        11.5
#>  8 lab4_areg_score          11.5
#>  9 lab4_glucose_score       11.5
#> 10 lab5_mucinous_call       11.5
#> # ℹ 12 more rows