class: center, middle, title-slide

# Inference for model-agnostic variable importance
### Brian D. Williamson, PhD
Fred Hutchinson Cancer Research Center
### 4 May, 2021
https://bdwilliamson.github.io/#talks
---

<style type="text/css">
.remark-slide-content {
  font-size: 20px;
  header-h2-font-size: 1.75rem;
}
</style>

## Research team

This work was done in collaboration with:

<img src="img/people1.PNG" width="65%" style="display: block; margin: auto;" />

<img src="img/people2.PNG" width="55%" style="display: block; margin: auto;" />

---

## Motivation: HIV envelope and antibody targets

<img src="img/hiv_env.png" width="90%" style="display: block; margin: auto;" />

.small[Source: Koff and Berkley (2010)]

---

## Motivation: AMP

AMP overall objective: assess .blue2[VRC01] .blue1[prevention efficacy] (PE) against HIV-1

* ‍VRC01: broadly neutralizing antibody (bnAb) isolated from donor

--

Key secondary question: .green[Which genetic mutations] make HIV-1 .purple[susceptible] to neutralization?

--

‍Challenges:
* How should we measure susceptibility?

--

* How do we determine if a mutation has a real effect?

--

  at .red[many] positions?

--

* Can we use .mutedred[machine learning]?

---

## Variable importance: what and why

**What is variable importance?**

* .blue1[Quantification of "contributions" of a variable] (or a set of variables)

--

  Traditionally: contribution to .blue2[predictions]

--

* Useful to distinguish between contributions to predictions...

--

    * (.blue1[extrinsic importance]) ... .blue1[by a given (possibly black-box) algorithm] .small[ [e.g., Breiman (2001)] ]

--

    * (.blue1[intrinsic importance]) ... .blue1[by the best possible (i.e., oracle) algorithm] .small[ [e.g., van der Laan (2006)] ]

--

* Our work focuses on .blue1[interpretable, model-agnostic intrinsic importance]

--

Example uses of .blue2[intrinsic] variable importance:
* is it worth extracting text from notes in the EHR for the sake of predicting hospital readmission?

--

* is it worth collecting a given covariate for the sake of predicting neutralization sensitivity?
---

## Case study: ANOVA importance

Data unit `\((X, Y)\)` with:
* outcome `\(Y\)`
* covariate `\(X := (X_1, X_2, \ldots, X_p)\)`

--

**Goals:**
* .green[estimate]
* .blue1[and do inference on]

the importance of `\((X_j: j \in s)\)` in predicting `\(Y\)`

--

How do we typically do this in **linear regression**?

---

## Case study: ANOVA importance

How do we typically do this in **linear regression**?

* Fit a linear regression of `\(Y\)` on `\(X\)` `\(\rightarrow \color{magenta}{\hat{\mu}(X)}\)`

--

* Fit a linear regression of `\(Y\)` on `\(X_{-s}\)` `\(\rightarrow \color{magenta}{\hat{\mu}_s(X)}\)`

--

* .green[Compare the fitted values] `\([\hat{\mu}(X_i), \hat{\mu}_s(X_i)]\)`

--

Many ways to compare fitted values, including:
* ANOVA decomposition
* Difference in `\(R^2\)`

--

Can extend this to a .blue1[nonparametric] measure using .blue2[population quantities]

--

Can do .blue1[inference] on this parameter, even if we used .green[machine learning]

.small[[details in Williamson et al. (2020a)]]

---

## A unified framework

Our proposed general framework for inference on **model-agnostic** variable importance:

--

1. Choose a .blue2[scientifically relevant] measure of .blue1[predictiveness] `\(V\)`

--

    * Large `\(V\)` = good predictions
    * e.g., classification accuracy, AUC, `\(R^2\)`

--

2. Define variable importance as `\(V(\)` .blue1[best possible] prediction function using .green[all variables] `\() -\)` `\(V(\)` .blue1[best possible] prediction function .red[excluding the variables of interest] `\()\)`

--

.blue1[Inference] possible even with machine learning!

.small[ [Details in Williamson et al. (2020b)] ]

Implemented in `R` package `vimp` (available on `CRAN`!)
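
A minimal sketch of the "full fit minus reduced fit" idea, taking `\(V = R^2\)` and using ordinary linear regression as a stand-in for the best possible prediction function. This is plain Python on simulated data, not the `vimp` API: the helper names (`solve`, `ols_r2`, `r2_importance`) and the simulated data-generating model are assumptions made for illustration, and the result is an in-sample point estimate only (no cross-fitting, no confidence intervals).

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back-substitution
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols_r2(X, y):
    """In-sample R^2 of least squares (with intercept) of y on the columns of X."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    XtX = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    preds = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


def r2_importance(X, y, s):
    """R^2 using all features minus R^2 after dropping the feature indices in s."""
    drop = set(s)
    X_reduced = [[v for j, v in enumerate(x) if j not in drop] for x in X]
    return ols_r2(X, y) - ols_r2(X_reduced, y)


# Simulated data (an assumption): feature 0 matters a lot, feature 2 not at all.
random.seed(0)
n = 2000
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * x[0] + 0.5 * x[1] + random.gauss(0, 1) for x in X]

imp_0 = r2_importance(X, y, s=[0])
imp_2 = r2_importance(X, y, s=[2])
print(f"importance of feature 0: {imp_0:.3f}")
print(f"importance of feature 2: {imp_2:.3f}")
```

In this simulation the population difference in `\(R^2\)` for feature 0 is `\(4 / 5.25 \approx 0.76\)`, and for feature 2 it is zero, so the two estimates should land near those values. Swapping `ols_r2` for a flexible learner evaluated on held-out data would move the sketch closer to the nonparametric setting described above.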
---

## Preparing for AMP

<img src="img/vim_ic50.censored_pres_r2_acc_auc_conditional_simple.png" width="60%" style="display: block; margin: auto;" />

‍Implications:
* All sites in the VRC01 binding footprint and CD4 binding sites appear important
* Results may differ based on chosen measure

.small[ _Data sourced from the CATNAP database (Yoon et al., 2015)_ ]

---

## Extensions and future work

Handling potentially correlated features:

<img src="img/spvim.png" width="50%" style="display: block; margin: auto;" />

Assessing neutralization potential of combination bnAb regimens:

<img src="img/slapnap.png" width="50%" style="display: block; margin: auto;" />

Variable selection using population importance: my postdoctoral focus

---

## Working towards equity, diversity, and inclusion

My personal goal: work towards EDI in biomedical research and practice

--

**Educational outreach:** [Hutch United Outreach Committee](https://www.fredhutch.org/en/research/education-training/hutch-united.html)

**Mentoring:** [Cientifico Latino](https://www.cientificolatino.com/gsmi), [Pomona College alumni mentor](https://www.sagepost47.com/)

**Research:**
* working with historically underserved communities
* working to reduce health disparities
* increasing access to high-quality preventive care

---

## HU + SCC MESA Partnership

<img src="img/hu-mesa.png" width="100%" style="display: block; margin: auto;" />

.small[ _Images courtesy of Stephanie Shadbolt_ ]

---

## 2021 FH / SCC virtual internships

<img src="img/people3.png" width="85%" style="display: block; margin: auto;" />

--

<img src="img/internship_timeline1.PNG" width="70%" style="display: block; margin: auto;" />

---

## 2021 FH / SCC virtual internships

<img src="img/people3.png" width="85%" style="display: block; margin: auto;" />

<img src="img/internship_timeline2.PNG" width="70%" style="display: block; margin: auto;" />

---

## 2021 FH / SCC virtual internships

<img src="img/people3.png" width="85%" style="display: block; margin: auto;" />

<img src="img/internship_timeline3.PNG" width="70%" style="display: block; margin: auto;" />

---

## 2021 FH / SCC virtual internships

<img src="img/people3.png" width="85%" style="display: block; margin: auto;" />

<img src="img/internship_timeline4.PNG" width="70%" style="display: block; margin: auto;" />

---

## 2021 FH / SCC virtual internships

<img src="img/people3.png" width="85%" style="display: block; margin: auto;" />

<img src="img/internship_timeline5.PNG" width="70%" style="display: block; margin: auto;" />

---

## 2021 FH / SCC virtual internships

<img src="img/people3.png" width="85%" style="display: block; margin: auto;" />

<img src="img/internship_timeline6.PNG" width="70%" style="display: block; margin: auto;" />

---

## Internship projects at Fred Hutch

<img src="img/people4.PNG" width="55%" style="display: block; margin: auto;" />

‍**Lucy:** developing .blue1[clinical trial protocols] for National Cancer Institute-funded clinical trials in .green[rare neuroendocrine tumors].

‍**Drew:** developing .blue1[guidelines] for using .blue2[machine learning] to do .blue2[variable screening] in risk-prediction analyses (e.g., cancer risk, COVID-19 risk, HIV-1 risk).

---

## Closing thoughts

.blue1[Inference] on .blue2[model-agnostic] variable importance, using .green[machine learning]

.blue1[EDI work] can (and should) be .blue2[part of research]

A .green[strong community partner] (e.g., SCC) is crucial for this work

Many aspects needed for successful outreach:
* positions (requires scientific staff as sponsors)
* administrative support (many thanks to **Jill Anderson**, **Tess Hurley**, and **Scientific Computing**)
* long-term commitment (many thanks to **Stephanie Shadbolt** and **Marilyn Saavedra-Leyva**)

.center[
https://github.com/bdwilliamson |
https://bdwilliamson.github.io ]

---

## References

* .small[ Benkeser DC, Williamson BD, Magaret CA, Nizam S, and Gilbert PB. 2020. Super LeArner Prediction of NAb Panels (SLAPNAP): A Containerized Tool for Predicting Combination Monoclonal Broadly Neutralizing Antibody Sensitivity. _BioRxiv technical report_. ]

* .small[ Breiman L. 2001. Random forests. _Machine Learning_. ]

* .small[ Koff WC and Berkley SF. 2010. The Renaissance in HIV vaccine development -- Future Directions. _The New England Journal of Medicine_. ]

* .small[ van der Laan MJ. 2006. Statistical inference for variable importance. _The International Journal of Biostatistics_. ]

---

## References

* .small[ Williamson B, Gilbert P, Carone M, and Simon N. 2020a. Nonparametric variable importance assessment using machine learning techniques (+ rejoinder to discussion). _Biometrics_. ]

* .small[ Williamson B, Gilbert P, Simon N, and Carone M. 2020b. A unified approach for inference on algorithm-agnostic variable importance. _ArXiv technical report_. ]

* .small[ Williamson B and Feng J. 2020. Efficient nonparametric statistical inference on population feature importance using Shapley values. _ICML_. ]

* .small[ Yoon H, Macke J, West AP, et al. 2015. CATNAP: a tool to compile, analyze and tally neutralizing antibody panels. _Nucleic Acids Research_. ]