class: center, middle, title-slide

.title[
# Statistics in Infectious Disease Research
]
.author[
### Brian D. Williamson, PhD
Kaiser Permanente Washington Health Research Institute
]
.date[
###
https://bdwilliamson.github.io/#talks
]

---

<style type="text/css">
.remark-slide-content {
  font-size: 22px
}
</style>

## My journey to full-time statistics research

<img src="img/timeline_1.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_2.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_3.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_4.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_5.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_6.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_7.png" width="720px" style="display: block; margin: auto;" />

---

## My journey to full-time statistics research

<img src="img/timeline_8.png" width="720px" style="display: block; margin: auto;" />

---

## What do I do all day?

Biostatistics: turning data into knowledge (Patrick Heagerty, UW)

Most weeks for me involve:

--

* Coding or debugging in R and Python

--

* Reading scientific papers

--

* Doing math

--

* Meetings with collaborators:

--

  * clinicians, epidemiologists

--

  * programmers, natural language processing experts

--

  * other statisticians

--

* Mentoring students on research projects

---

## What do I do all day?

Key skills:

* Curiosity

--

* Communication

--

* Time management

--

* Mathematical reasoning/logic

---

## Example: preparing for AMP

.pull-left[
<img src="img/amp.png" width="270px" style="display: block; margin: auto;" />

NIH NIAID-sponsored HVTN + HPTN phase 2b HIV prevention efficacy trial; statistical design described in Gilbert et al.
(2017) _Stat Commun Infect Dis_

<img src="img/neutralized_hiv.png" width="270px" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/pred_prob_resistance_slapnap.png" width="270px" style="display: block; margin: auto;" />

Magaret, Benkeser, Williamson et al. (2019) _PLoS Comp Bio_
]

---

## Hypothesis testing

Primary goal of AMP: .blue1[test] whether or not the therapy is .blue2[effective].

--

Secondary goal of AMP: .green[which genetic mutations] make HIV-1 susceptible to neutralization?

--

<img src="img/hiv_sequence_movie_v2.png" width="720px" style="display: block; margin: auto;" />

--

Key challenges:

--

* What does "susceptible" mean? How to measure it?

--

* How do we determine if a mutation has a real effect?

--

* How do we do this at .red[each position] on the HIV-1 genome?

---

## Hypothesis testing

Example: testing whether a single coin is fair

--

Suppose you have a single coin; probability of heads is `\(p_0\)` (unknown)

--

Suppose you have results from 100 flips; proportion of heads `\(= 0.7\)`

--

Your friend says that if you correctly guess the next flip, you get $100; otherwise, your friend gets $200

--

You want to _test_ if the coin is fair, i.e., test:

`$$H_0: p_0 = 0.5 \text{ (coin is fair) vs } H_1: p_0 > 0.5 \text{ (coin is unfair)}$$`

---

## Hypothesis testing

Make a decision based on the data

--

Probability of a wrong decision?
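
One way to make that decision for the coin example (70 heads in 100 flips) is an exact one-sided binomial test. The sketch below is only an illustration: the function name `binomial_upper_tail` and the 0.05 decision threshold are assumptions chosen for this example, not part of the AMP analysis.

```python
from math import comb


def binomial_upper_tail(n_flips: int, n_heads: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= n_heads) when X ~ Binomial(n_flips, p_null)."""
    return sum(
        comb(n_flips, k) * p_null**k * (1 - p_null) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )


level = 0.05  # assumed maximum allowable type I error
p_value = binomial_upper_tail(n_flips=100, n_heads=70)
reject_null = p_value < level  # True means we decide the coin is unfair

print(f"one-sided p-value = {p_value:.2e}; reject H0: {reject_null}")
```

A small p-value means that 70 or more heads would be very unlikely if the coin were fair, so we would decide in favor of `\(H_1\)`.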
--

| | `\(H_0\)` true | `\(H_1\)` true |
| - | ---------- | ---------- |
| Decide `\(H_0\)` | 1 - `\(\alpha\)` | type II error |
| Decide `\(H_1\)` | .red[type I error] ( `\(\alpha\)` ) | .blue2[power] |

`\(\alpha = Pr(\text{reject } H_0 \mid H_0 \text{ true})\)`

--

High power = good: want to win money

--

Type I error = much worse: want to minimize chance of losing money

--

.blue2[**Even more important**] to control type I errors in biomedical research

--

Typically, we pre-specify a level `\(\alpha\)` of maximum allowable type I error

---

## Testing multiple hypotheses

**Goal:** test whether 10 different coins are fair

**Procedure:** perform level 0.05 test for each coin

--

What is my overall chance of making a type I error?

--

.red[Not] `\(\alpha\)`! Why?

---

## Testing multiple hypotheses

Intuition: chances of observing a rare event (type I error) increase with increased number of observations

--

‍Mathematically, if the 10 tests are independent:

`\(Pr(\text{reject } H_{0,1} \mid H_{0,1} \text{ true}) = \alpha\)`;

`\(Pr(\text{don't reject } H_{0,1} \mid H_{0,1} \text{ true}) = 1 - \alpha\)`;

--

`$$Pr(\text{don't reject } H_{0,2} \mid H_{0,2} \text{ true}) = 1 - \alpha;$$`

--

`$$\vdots$$`

--

`$$Pr(\text{don't reject any } \{H_{0,j}\}_{j = 1}^{10} \mid \{H_{0,j}\}_{j = 1}^{10} \text{ true}) = (1 - \alpha)^{10}$$`

--

`$$Pr(\text{reject one or more} \mid \{H_{0,j}\}_{j = 1}^{10} \text{ true}) = 1 -$$`
`$$Pr(\text{don't reject any } \{H_{0,j}\}_{j = 1}^{10} \mid \{H_{0,j}\}_{j = 1}^{10} \text{ true})$$`

--

`\(= 1 - (1 - \alpha)^{10} \approx 40\)`%!!

---

## Adjusting for multiple comparisons

One simple way to adjust: divide `\(\alpha\)` by the number of tests (the Bonferroni correction)

--

‍Example: genome-wide association study

Testing for effect at 1000 genes: `\(.05 / 1000 = .00005\)`

--

.red[This reduces power]: need **strong** effect!

--

Another option: pre-screen based on some criterion?

---

## Preparing for AMP

Recall the AMP problem: .green[which genetic mutations] make HIV-1 susceptible to neutralization?
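
To see why testing every mutation is costly, the multiple-testing arithmetic from the coin example can be sketched in a few lines. This is a minimal illustration that assumes the tests are independent. The helper names `familywise_error_rate` and `bonferroni_level` are made up for this sketch, and dividing `\(\alpha\)` by the number of tests is the adjustment often called the Bonferroni correction.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one type I error across independent level-alpha tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test level after dividing alpha by the number of tests."""
    return alpha / n_tests


alpha = 0.05
for n_tests in (1, 10, 1000):
    per_test = bonferroni_level(alpha, n_tests)
    unadjusted = familywise_error_rate(alpha, n_tests)
    adjusted = familywise_error_rate(per_test, n_tests)
    print(
        f"{n_tests:>5} tests: unadjusted overall error = {unadjusted:.3f}, "
        f"per-test level = {per_test:.5f}, adjusted overall error = {adjusted:.3f}"
    )
```

The adjusted overall error stays at or below 0.05, but the per-test level shrinks quickly as the number of tests grows, which is where the loss of power comes from.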
--

**Problem:** thousands of potential mutations across HIV-1 genome
* Testing for effect of all mutations: small individual `\(\alpha\)`
* Low power!

**Goal:** pre-screen to a more reasonable number of mutations

--

**Procedure:** combine
* .blue1[machine learning] (more to come)

--

* publicly available data

--

* .blue2[statistical inference] on variable importance (my PhD dissertation)

---

## Example: COVID-19 risk prediction

.pull-left[
<img src="img/coronavirus_seattletimes.png" width="480px" style="display: block; margin: auto;" />

Source: _Seattle Times_
]

.pull-right[
<img src="img/seattle_flu_study.png" width="360px" style="display: block; margin: auto;" />

<img src="img/scan.png" width="360px" style="display: block; margin: auto;" />
]

---

## Predicting clinical outcomes

**Goal:** predict death or intubation (being put on a ventilator) based on factors measured at time of admission to hospital

--

**Challenges:**
* some measurements collected multiple times
* want to allow flexible relationship between features and response

**Idea:** use machine learning!

---

## Machine learning? (aka "artificial intelligence")

Often, purely for .blue1[prediction]

--

‍Examples:
* Regression analyses (classical statistics)

--

* Decision trees: <a href="https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)">Deep Blue</a>, triage charts

--

* Neural networks: [AlphaGo](https://deepmind.com/research/case-studies/alphago-the-story-so-far), [AlphaFold](https://www.deepmind.com/blog/article/AlphaFold-Using-AI-for-scientific-discovery), [ChatGPT](https://openai.com/blog/chatgpt)

---

## Machine learning: bias vs variance

Any estimation technique has .red[bias] and .blue1[variance].

These correspond to .red[how far our estimates are from the truth on average] and .blue1[how variable our estimates are].

--

‍Variability: if I repeated an experiment many times and analyzed the data, I wouldn't expect the same answer each time. "Variance" quantifies this.
--

Typically, inducing bias in estimation will result in decreased variance, and vice versa.

---

## Machine learning: bias vs variance

‍Example: linear regression
* find line that best fits points
* "best-fitting":

<img src="img/lm_init.png" width="360px" style="display: block; margin: auto;" />

---

## Machine learning: bias vs variance

‍Example: linear regression
* find line that best fits points
* "best-fitting":

<img src="img/lm_plus_sse.png" width="360px" style="display: block; margin: auto;" />

---

## Machine learning: bias vs variance

‍Example: linear regression
* find line that best fits points
* "best-fitting":

<img src="img/lm_plus_line.png" width="360px" style="display: block; margin: auto;" />

---

## Bias-variance tradeoff in action

Consider two options for obtaining predictions: either fit each point exactly (_interpolating_) or fit linear regression.

Interpolating has .red[zero bias] on the training data; linear regression has .red[some bias].

<img src="img/bias_variance_tradeoff_1.png" width="360px" style="display: block; margin: auto;" />

---

## Bias-variance tradeoff in action

Moving one point from `\((0.5, -0.58)\)` to `\((0.5, -0.08)\)` changes the interpolating fit drastically, but barely changes the linear regression fit.

Interpolating has .blue1[large variance] on the training data; linear regression has .blue1[small variance].

<img src="img/bias_variance_tradeoff_2.png" width="360px" style="display: block; margin: auto;" />

---

## Machine learning: bias-variance tradeoff

Typically, machine learning methods allow .red[increased bias] on training data for .blue1[decreased variance], and possibly better predictions on test data.

This increased bias can be difficult to characterize, rendering **inference based on machine learning techniques difficult**.
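
The interpolation-versus-regression comparison can be checked with a small simulation. Everything specific in this sketch is an assumption made for illustration: the true curve `true_mean`, the noise level, the design points, and the use of piecewise-linear interpolation as the "fit each point exactly" method. It repeats the experiment many times and records each method's prediction at one new point.

```python
import math
import random
from statistics import mean, pvariance


def true_mean(x: float) -> float:
    """Assumed true relationship (hypothetical, chosen only for illustration)."""
    return math.sin(2 * math.pi * x)


def interpolate(x_new: float, xs: list, ys: list) -> float:
    """Piecewise-linear fit through every training point, evaluated at x_new."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x_new <= x1:
            weight = (x_new - x0) / (x1 - x0)
            return (1 - weight) * y0 + weight * y1
    raise ValueError("x_new lies outside the training range")


def linear_regression(x_new: float, xs: list, ys: list) -> float:
    """Least-squares line fit to the training points, evaluated at x_new."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar + slope * (x_new - x_bar)


random.seed(1)
xs = [i / 10 for i in range(11)]  # fixed design points on [0, 1]
x_new, noise_sd, n_repeats = 0.25, 0.3, 2000

interp_preds, linreg_preds = [], []
for _ in range(n_repeats):
    # A fresh noisy dataset each time: "repeating the experiment"
    ys = [true_mean(x) + random.gauss(0.0, noise_sd) for x in xs]
    interp_preds.append(interpolate(x_new, xs, ys))
    linreg_preds.append(linear_regression(x_new, xs, ys))

truth = true_mean(x_new)
bias_interp = mean(interp_preds) - truth
bias_linreg = mean(linreg_preds) - truth
var_interp = pvariance(interp_preds)
var_linreg = pvariance(linreg_preds)

print(f"interpolation:     bias = {bias_interp:+.3f}, variance = {var_interp:.4f}")
print(f"linear regression: bias = {bias_linreg:+.3f}, variance = {var_linreg:.4f}")
```

Under these assumptions, interpolation tracks the truth closely on average but varies more from dataset to dataset, while the straight line is more stable but systematically off.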
One way to mitigate bias is to .blue2[average] many "base learners" together:
* e.g., [guessing the weight of a cow](https://www.npr.org/sections/money/2015/08/07/429720443/17-205-people-guessed-the-weight-of-a-cow-heres-how-they-did)
* often called "ensembling" or "model stacking"

---

## Example: HIV-1 vaccine regimen down-selection

bnAb: broadly neutralizing antibody

--

AMP trials studied a single bnAb; future trials will study bnAb .blue1[combinations].

--

Key challenges for studying combinations:
* does a given combination neutralize HIV effectively?
* how should we prioritize the combinations to study in trials?

--

We've created a tool called SLAPNAP (**S**uper Le**A**rner **P**redictions using **NA**b **P**anels) to address these challenges.

---

## Key components of SLAPNAP: Containerization

<img src="img/docker.png" width="360px" style="display: block; margin: auto;" />

* Lightweight
* Fully reproducible

---

## Key components of SLAPNAP: data and predictions

When the container is built, it downloads a snapshot of a publicly available HIV-1 neutralization database.

--

‍Predictions: use .blue1[ensembling] to develop the best possible outcome predictor

--

Describe .blue2[variable importance] using methods from my PhD dissertation

---

## Current opportunities in statistics/comp. bio.

"Big Data": large number of observations
* can lead to computational challenges
* does more = better?
* Examples: Google data, electronic health records

"Big Data": large number of features
* difficult to assess relationship between features and response
* multiple testing
* number of observations smaller than number of features?
* Examples: genomic data

---

## Current opportunities in statistics/comp. bio.

Natural language processing: can we learn from text?
* what text is relevant for your problem?
* how to include in a statistical procedure?
* what quality of information do you have?
* Examples: electronic health records

Push for more agnostic modeling
* conventional practice: parametric models
* inferences can be wrong if the model is misspecified!
* how to use machine learning to do inference?