R/extract_importance_SL.R
extract_importance_SL.Rd
Extract the individual-algorithm extrinsic importance from each fitted algorithm within the Super Learner; compute the average weighted rank of the importance scores, with weights specified by each algorithm's weight in the Super Learner.
extract_importance_SL(fit, feature_names, import_type = "all", ...)
fit: the fitted Super Learner ensemble
feature_names: the names of the features
import_type: the level of granularity for importance: "all" is the importance based on the weighted average of ranks across algorithms (weights are the SL coefficients); "best" is the importance based on the algorithm with the highest weight. Defaults to "all".
...: other arguments to pass to individual-algorithm extractors.
a tibble, with columns feature (the feature) and rank (the weighted feature importance rank, with 1 indicating the most important feature).
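The two settings of import_type can be illustrated with a minimal sketch. This is not the package's implementation: the rank matrix, the learner names, and the weights below are all hypothetical, chosen only to show how a weighted average of ranks differs from taking the ranks of the highest-weight learner.

```r
# Hypothetical per-learner importance ranks (1 = most important).
# These numbers are made up for illustration; they are not package output.
ranks <- cbind(
  learner_a = c(x1 = 1, x2 = 2, x3 = 3),
  learner_b = c(x1 = 3, x2 = 1, x3 = 2)
)

# Hypothetical ensemble weights, standing in for the Super Learner coefficients.
weights <- c(learner_a = 0.75, learner_b = 0.25)

# In the spirit of import_type = "all": weighted average of ranks across learners.
weighted_rank <- drop(ranks %*% weights) / sum(weights)

# In the spirit of import_type = "best": ranks from the highest-weight learner only.
best_rank <- ranks[, which.max(weights)]

# Collect the weighted ranks, most important feature first.
result <- data.frame(feature = rownames(ranks), rank = weighted_rank)
result[order(result$rank), ]
```

Here x1 gets a weighted rank of 0.75 * 1 + 0.25 * 3 = 1.5, so it stays the most important feature even though learner_b ranks it last, because learner_a carries most of the weight.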
data("biomarkers")
# subset to complete cases for illustration
cc <- complete.cases(biomarkers)
dat_cc <- biomarkers[cc, ]
# use only the mucinous outcome, not the high-malignancy outcome
y <- dat_cc$mucinous
x <- dat_cc[, !(names(dat_cc) %in% c("mucinous", "high_malignancy"))]
feature_nms <- names(x)
# get the fit (using a simple library and 2 folds for illustration only)
set.seed(20231129)
library("SuperLearner")
#> Loading required package: nnls
#> Loading required package: gam
#> Loading required package: splines
#> Loading required package: foreach
#> Loaded gam 1.22-2
#> Super Learner
#> Version: 2.0-28.1
#> Package created on 2021-05-04
fit <- SuperLearner::SuperLearner(Y = y, X = x, SL.library = c("SL.glm", "SL.mean"),
                                  cvControl = list(V = 2))
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
#> Warning: prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases
# extract importance using all learners
importance <- extract_importance_SL(fit = fit, feature_names = feature_nms)
importance
#> # A tibble: 22 × 2
#> feature rank
#> <chr> <dbl>
#> 1 cea 11.5
#> 2 cea_call 11.5
#> 3 institution 11.5
#> 4 lab1_actb 11.5
#> 5 lab1_molecules_neoplasia_call 11.5
#> 6 lab1_molecules_score 11.5
#> 7 lab1_telomerase_neoplasia_call 11.5
#> 8 lab1_telomerase_score 11.5
#> 9 lab2_fluorescence_mucinous_call 11.5
#> 10 lab2_fluorescence_score 11.5
#> # ℹ 12 more rows
# extract importance of best learner
best_importance <- extract_importance_SL(fit = fit, feature_names = feature_nms,
                                         import_type = "best")
best_importance
#> # A tibble: 22 × 2
#> feature rank
#> <chr> <dbl>
#> 1 institution 11.5
#> 2 lab1_actb 11.5
#> 3 lab1_molecules_score 11.5
#> 4 lab1_telomerase_score 11.5
#> 5 lab2_fluorescence_score 11.5
#> 6 lab3_muc3ac_score 11.5
#> 7 lab3_muc5ac_score 11.5
#> 8 lab4_areg_score 11.5
#> 9 lab4_glucose_score 11.5
#> 10 lab5_mucinous_call 11.5
#> # ℹ 12 more rows
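Every rank shown above equals 11.5, which is the value base R assigns when all 22 features are tied, since ties receive the average of the ranks they span. The sketch below reproduces only that arithmetic. It assumes fully tied importance scores, which is one possible reading of the output rather than something the example itself confirms.

```r
# Assumption: all 22 features have identical importance scores (fully tied).
n_features <- 22
tied_scores <- rep(0, n_features)

# rank() uses ties.method = "average" by default, so every tied feature
# receives the mean of ranks 1 through 22.
tied_ranks <- rank(tied_scores)
unique(tied_ranks)
#> [1] 11.5
```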