class: center, middle, title-slide .title[ # Nonparametric generalized raking for incomplete data ] .author[ ### Brian D. Williamson, PhD
Kaiser Permanente Washington Health Research Institute
] .date[ ### 17 December, 2024
https://bdwilliamson.github.io/#talks
]

---

<style type="text/css">
.remark-slide-content {
  font-size: 20px;
}
.remark-slide-content h2 {
  font-size: 1.75rem;
}
</style>

## Acknowledgments

This work was done in collaboration with:

<img src="img/people.png" width="100%" style="display: block; margin: auto;" />

---

## Motivation

We are often missing data on some people:
* traditional missing data
* two-phase studies (missing data by design)
* electronic health records

Traditional methods require a correct outcome model and either
* a correct missing-data model (inverse probability weighting), or
* a correct imputation model (multiple imputation)

Here, we consider estimators that are consistent under weaker conditions

---

## Data structure and notation

Outcome: `\(Y\)` (binary)

Exposure or treatment: `\(X\)`

--

Covariates measured on the full cohort: `\(Z\)`

Covariates subject to missingness: `\(W\)`

Indicator of complete data: `\(R\)`

--

Ideal data: `\((Y, X, Z, W) \sim \mathbb{P}_0\)`

Observed data: `\((Y, X, Z, R, RW) \sim P_0\)`

---

## Estimating a regression parameter with missing data

Suppose our interest is in estimating `\(\beta_1\)` from the model
`$$E(Y \mid X = x, Z = z, W = w) = \beta_0 + \beta_1 x + [z, w] \alpha$$`

--

Classical estimator: the inverse probability of coarsening-weighted (IPW) estimator

--

Probability of observing `\(W\)`: `\(\pi_0(y,x,z) = E_0(R \mid Y = y, X = x, Z = z)\)`

Suppose `\(U_i(\beta)\)` is the ideal-data estimating function for observation `\(i\)`

IPW estimator:
1. Obtain an estimator `\(\pi_{n,IPW}(y,x,z)\)` of `\(\pi_0\)` (using, e.g., logistic regression)
2. Solve the IPW estimating equation
`$$0 = \frac{1}{n}\sum_{i=1}^n \frac{R_i}{\pi_{n,IPW}(Y_i, X_i, Z_i)}U_i(\hat{\beta}_{IPW})$$`
(using, e.g., `glm` in `R`)

---

## Generalized raking

If `\(\pi_{n,IPW}\)` is consistent, then `\(\hat{\beta}_{1,IPW}\)` is consistent

What if `\(\pi_{n,IPW}\)` is not consistent?

--

Idea: adjust the weights to increase robustness

For a distance measure `\(d\)`,
`\begin{align*}
\pi_{n,GR,i} =& \ \pi_{n,IPW,i} / a_i, \text{ where} \\
a =& \ \arg\min_{b \in \mathbb{R}^n} \sum_{i=1}^n R_i \, d(b_i \pi_{n,IPW,i}^{-1}, \pi_{n,IPW,i}^{-1}) \\
& \ \text{subject to } \sum_{i = 1}^n h_i = \sum_{i = 1}^n R_i b_i h_i \pi_{n,IPW,i}^{-1},
\end{align*}`
where `\(h_i\)` is the vector of estimated efficient influence functions for the regression parameters, evaluated at observation `\(i\)`

Then, solve the estimating equation
`$$0 = \frac{1}{n}\sum_{i=1}^n \frac{R_i}{\pi_{n,GR,i}}U_i(\hat{\beta}_{GR})$$`

---

## Generalized raking

Shown to be asymptotically equivalent to the optimal augmented IPW (AIPW) estimator [Lumley et al. (2011), Robins et al. (1994)]

Implemented in software: `R` `survey` package [Lumley (2004)]

The generalized raking estimator is _asymptotically linear_: we can write
`\begin{align*}
\sqrt{n}(\hat{\beta}_{1,GR} - \beta_1) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \phi_\beta(Y_i, X_i, Z_i, W_i) + o_P(1)
\end{align*}`

Inference for `\(\beta_1\)` can be carried out using the influence function `\(\phi_\beta\)`
* for GLMs, `\(\phi_\beta = I(\beta)^{-1}\dot{\ell}_\beta(y, x, z, w)\)`
   * `\(I\)` is the Fisher information
   * `\(\dot{\ell}\)` is the score

---

## Marginal estimands

Now, interest is in a .blue1[marginal] estimand

Examples:
* Treatment-specific mean: `\(\mu_x = E\{E(Y \mid X = x, Z, W)\}\)`
* Average treatment effect (ATE): `\(ATE = \mu_1 - \mu_0\)`
* With binary outcomes:
   * risk difference (RD) = ATE
   * relative risk (RR): `\(RR = \frac{\mu_1}{\mu_0}\)`

Often of interest in causal inference and in safety and efficacy studies

---

## Using traditional generalized raking

How do we estimate the ATE using traditional raking? (An `R` sketch follows on the next slide)

1. Estimate `\(\pi_{n,IPW}\)` using, e.g., logistic regression
2. Calibrate the weights (using the estimated influence functions), obtain `\(\pi_{n,GR}\)`
3. Fit a weighted outcome regression model, obtain `\(Q_n\)` (using, e.g., linear regression)
4. Estimate the treatment-specific means
`$$\mu_{x,n} = \frac{1}{n}\sum_{i=1}^n \frac{R_i}{\pi_{n,GR,i}}Q_n(x, Z_i, W_i)$$`
5. Plug in the treatment-specific means to estimate the ATE: `\(ATE_n = \mu_{1,n} - \mu_{0,n}\)`
6. Inference can be carried out using the delta method
   * using influence functions for the regression parameters
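
---

## Using traditional generalized raking: `R` sketch

A minimal sketch of steps 1&ndash;5 using the `survey` package. For simplicity, the calibration below uses fully observed variables directly (rather than estimated influence functions), and the data frame `dat` and its column names (`y`, `x`, `z1`, `z2`, `w1`, `w2`, `r`) are hypothetical

```r
library(survey)

## 1. Estimate the probability of complete data with logistic regression
pi_fit <- glm(r ~ y + x + z1 + z2, family = binomial, data = dat)
dat$pi_ipw <- fitted(pi_fit)

## 2. Calibrate the IPW weights to full-cohort totals using the raking distance
cc <- subset(dat, r == 1)
cc$w_ipw <- 1 / cc$pi_ipw
des <- svydesign(ids = ~1, weights = ~w_ipw, data = cc)
totals <- c(`(Intercept)` = nrow(dat), y = sum(dat$y), x = sum(dat$x),
            z1 = sum(dat$z1), z2 = sum(dat$z2))
cal <- calibrate(des, formula = ~ y + x + z1 + z2, population = totals,
                 calfun = "raking")
cc$w_gr <- weights(cal)  # calibrated (raking) weights

## 3. Weighted outcome regression on the complete cases
q_fit <- lm(y ~ x + z1 + z2 + w1 + w2, data = cc, weights = w_gr)

## 4.-5. Treatment-specific means and the plug-in ATE
mu_1 <- weighted.mean(predict(q_fit, newdata = transform(cc, x = 1)), cc$w_gr)
mu_0 <- weighted.mean(predict(q_fit, newdata = transform(cc, x = 0)), cc$w_gr)
ate_n <- mu_1 - mu_0
```

Standard errors (step 6) would then follow from the delta method, using the influence functions of the regression parameters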

---

## Marginal generalized raking

Traditional raking is doubly robust: consistent if either
* the estimator `\(Q_n\)` of `\(Q(x,z,w) = E(Y \mid X = x, Z = z, W = w)\)` is consistent, or
* the estimator `\(\pi_n\)` of `\(\pi_0(y,x,z) = E_0(R \mid Y = y, X = x, Z = z)\)` is consistent

The theory for traditional raking relies on .blue1[parametric models]
* `\(Q_n\)` and `\(\pi_n\)` could be inconsistent in complex models

--

Idea 1: use machine learning to gain robustness

Idea 2: target the marginal estimand to gain efficiency over the delta method

---

## Using marginal generalized raking

How do we estimate the ATE using marginal raking? (An `R` sketch follows on the next slide)

1. Estimate `\(\pi_{n,IPW}\)` using, e.g., logistic regression or machine learning
2. Calibrate the weights (using the estimated influence functions), obtain `\(\pi_{n,GR}\)`
3. Fit a weighted outcome regression model, obtain `\(Q_n\)` (using, e.g., machine learning)
4. Estimate the treatment-specific means
`$$\mu_{x,n} = \frac{1}{n}\sum_{i=1}^n \frac{R_i}{\pi_{n,GR,i}}Q_n(x, Z_i, W_i)$$`
5. Plug in the treatment-specific means to estimate the ATE: `\(ATE_n = \mu_{1,n} - \mu_{0,n}\)`
6. Inference can be carried out using influence functions
   * using the influence function for the estimand of interest

Influence functions for common estimands (e.g., ATE, RR) are well studied
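
---

## Using marginal generalized raking: `R` sketch

A minimal, illustrative sketch, continuing from the objects `cc` (complete cases) and `w_gr` (calibrated weights) in the previous sketch; `ranger` (random forests) stands in for any machine learning method, the variable names are hypothetical, and the variance calculation treats the calibrated weights as fixed. For inference, the sketch uses the standard full-data efficient influence function for the ATE,
`$$\phi(y, x, z, w) = \frac{x}{g(z,w)}\{y - Q(1,z,w)\} - \frac{1-x}{1-g(z,w)}\{y - Q(0,z,w)\} + Q(1,z,w) - Q(0,z,w) - ATE,$$`
where `\(g(z,w) = P(X = 1 \mid Z = z, W = w)\)`

```r
library(ranger)

## 3. Flexible outcome regression Q_n(x, z, w), weighted by the raking weights
q_rf <- ranger(y ~ x + z1 + z2 + w1 + w2, data = cc, case.weights = cc$w_gr)
q1 <- predict(q_rf, data = transform(cc, x = 1))$predictions
q0 <- predict(q_rf, data = transform(cc, x = 0))$predictions

## Treatment propensity g_n(z, w) = P(X = 1 | Z, W), used in the influence
## function below (x assumed coded 0/1)
cc$xf <- factor(cc$x)
g_rf <- ranger(xf ~ z1 + z2 + w1 + w2, data = cc, probability = TRUE,
               case.weights = cc$w_gr)
g1 <- predict(g_rf, data = cc)$predictions[, "1"]

## 4.-5. Plug-in treatment-specific means and the ATE
mu_1 <- weighted.mean(q1, cc$w_gr)
mu_0 <- weighted.mean(q0, cc$w_gr)
ate_n <- mu_1 - mu_0

## 6. Simplified inference using the estimated influence function
##    (treats the calibrated weights as fixed)
phi <- cc$x / g1 * (cc$y - q1) - (1 - cc$x) / (1 - g1) * (cc$y - q0) +
  q1 - q0 - ate_n
se <- sqrt(sum((cc$w_gr / sum(cc$w_gr))^2 * phi^2))
c(est = ate_n, lower = ate_n - 1.96 * se, upper = ate_n + 1.96 * se)
```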

---

## Simulation

Estimators to compare:
1. traditional raking
2. marginal raking using the semiparametric influence function
3. marginal raking using the nonparametric influence function

Data:
* four continuous covariates: `\((Z_1, Z_2)\)` (fully observed) and `\((W_1, W_2)\)` (subject to missingness)
* missing-data probability of 40%

Scenarios:
1. continuous outcome
2. binary outcome with `\(P(Y = 1) = 0.25\)`

Simple outcome and missing-data models (an illustrative data-generating sketch appears in the appendix)

---

## Results - continuous outcome

<img src="img/sim_combined_performance_plot_s9.png" width="100%" style="display: block; margin: auto;" />

---

## Results - binary outcome

<img src="img/sim_combined_performance_plot_s1.png" width="100%" style="display: block; margin: auto;" />

---

## Closing thoughts

In simple scenarios, all estimators perform similarly

In more complex scenarios, nonparametric marginal raking is expected to be more robust than traditional raking

Preprint to come on arXiv

<svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"></path></svg> https://github.com/bdwilliamson | <svg viewBox="0 0 496 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M336.5 160C322 70.7 287.8 8 248 8s-74 62.7-88.5 152h177zM152 256c0 22.2 1.2 43.5 3.3 64h185.3c2.1-20.5 3.3-41.8 3.3-64s-1.2-43.5-3.3-64H155.3c-2.1 20.5-3.3 41.8-3.3 64zm324.7-96c-28.6-67.9-86.5-120.4-158-141.6 24.4 33.8 41.2 84.7 50 141.6h108zM177.2 18.4C105.8 39.6 47.8 92.1 19.3 160h108c8.7-56.9 25.5-107.8 49.9-141.6zM487.4 192H372.7c2.1 21 3.3 42.5 3.3 64s-1.2 43-3.3 64h114.6c5.5-20.5 8.6-41.8 8.6-64s-3.1-43.5-8.5-64zM120 256c0-21.5 1.2-43 3.3-64H8.6C3.2 212.5 0 233.8 0 256s3.2 43.5 8.6 64h114.6c-2-21-3.2-42.5-3.2-64zm39.5 96c14.5 89.3 48.7 152 88.5 152s74-62.7 88.5-152h-177zm159.3 141.6c71.4-21.2 129.4-73.7 158-141.6h-108c-8.8 56.9-25.6 107.8-50 141.6zM19.3 352c28.6 67.9 86.5 120.4 158 141.6-24.4-33.8-41.2-84.7-50-141.6h-108z"></path></svg> https://bdwilliamson.github.io

---

## References

* .small[Lumley T, Shaw PA, Dai JY (2011). Connections between survey calibration estimators and semiparametric models for incomplete data. _International Statistical Review_]
* .small[Lumley T (2004). Analysis of complex survey samples. _Journal of Statistical Software_]
* .small[Robins JM, Rotnitzky A, Zhao LP (1994). Estimation of regression coefficients when some regressors are not always observed. _Journal of the American Statistical Association_]
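
---

## Appendix: illustrative simulation data

A hypothetical data-generating sketch with the structure described on the simulation slide (four continuous covariates, roughly 40% missingness, continuous outcome); the model forms and coefficients below are illustrative assumptions only, not the settings used in the reported simulations

```r
set.seed(20241217)
n <- 1000
z1 <- rnorm(n); z2 <- rnorm(n)
w1 <- rnorm(n); w2 <- rnorm(n)
x  <- rbinom(n, 1, plogis(0.5 * z1 - 0.5 * z2))

## Continuous outcome; a binary outcome with P(Y = 1) = 0.25 could be drawn
## analogously with rbinom and a suitably chosen intercept
y <- 1 + 0.5 * x + 0.25 * (z1 + z2 + w1 + w2) + rnorm(n)

## Simple missing-data model depending only on fully observed variables,
## targeting roughly 40% missingness in (w1, w2)
pi_true <- plogis(0.1 + 0.25 * y - 0.25 * z1)
r <- rbinom(n, 1, pi_true)
w1[r == 0] <- NA
w2[r == 0] <- NA
dat <- data.frame(y, x, z1, z2, w1, w2, r)
mean(r)  # proportion with complete data, roughly 0.6
```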