Assessing Variable Importance Nonparametrically using Machine Learning Techniques
Published:
This talk (on a preliminary version of my R-squared variable importance paper published in Biometrics) was selected as the Most Outstanding Oral Paper.
Published:
This talk (on a preliminary version of my R-squared variable importance paper published in Biometrics) was selected as the Most Outstanding Oral Paper.
Published:
Contributed talk at the Thirty-fifth International Conference on Machine Learning.
Published:
This talk (on a preliminary version of my R-squared variable importance paper published in Biometrics) was selected for an ASA Biometrics Section Travel Award.
Published:
Invited talk on my dissertation research given at the UW Biostatistics Colloquium.
Published:
This talk (on a preliminary version of my general variable importance paper published in Journal of the American Statistical Association) was selected for an ASA Nonparametrics Section Travel Award.
Published:
Contributed talk at the 27th International Dynamics and Evolution of HIV and Other Human Viruses Meeting on SLAPNAP (see publications).
Published:
We discuss our paper to be published in the Proceedings of the Thirty-seventh International Conference on Machine Learning.
Published:
I discuss a framework for inference on general model-agnostic variable importance measures.
Published:
Guest lecture on some of the ways that statistics is used in infectious disease research in the Computational Biology course at Roanoke Valley Governor’s School.
Published:
I discuss three approaches towards a more principled use of machine learning: inference on the goodness of fit, inference on variable importance, and containerization.
Published:
I discuss a framework for inference on general model-agnostic variable importance measures.
Published:
Keynote talk at the 3rd annual Hutch United Symposium.
Published:
I gave a talk on work in progress, developing methods for variable selection in settings with missing data that do not rely on (generalized) linear models.
Published:
I discuss a framework for inference on general model-agnostic variable importance measures.
Published:
I discuss a framework for inference on general model-agnostic variable importance measures, and how this framework can be used to perform variable selection. I also briefly discuss several directions of current work, including longitudinal variable importance; a measure of how important variables are for tailoring treatment; and fairness-aware variable importance.
Published:
I discuss a method for model-agnostic variable selection that is robust to model misspecification and valid in settings with missing data.
Published:
I discuss a framework for inference on general model-agnostic variable importance measures and possible summary measures for longitudinal variable importance.
Published:
I discuss a framework for predicting safety outcomes of interest within the FDA’s Sentinel system using data from electronic health records.
Published:
I discuss a framework for inference on general model-agnostic variable importance measures and possible summary measures for longitudinal variable importance.
Published:
Lecture on some of the ways that statistics is used in infectious disease research to students in UW Biostatistics 111.
Published:
I discuss inference for summaries of longitudinal model-agnostic variable importance.
Published:
I discuss inference for summaries of longitudinal model-agnostic variable importance.
Published:
In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response – in other words, to gauge the variable importance of features. In this talk, I will discuss a model-agnostic notion of variable importance and general conditions under which valid inference on the true importance can be obtained, even when machine learning-based techniques are used as part of estimation. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. I provide several examples of predictiveness measures, including for right-censored outcomes, and illustrate the use of the proposed methods with data from a study of an antibody against HIV-1 infection.
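As a rough illustration of the contrast described above, the sketch below estimates an R-squared-based importance on simulated data: the held-out R-squared of a model fit with all features minus that of a model fit without the features of interest. It is a minimal sketch under stated assumptions, not the method from the talk: ordinary least squares stands in for a flexible machine learning estimator, a single train/test split replaces any more careful sample-splitting scheme, no confidence intervals are computed, and all function names (`fit_ols`, `r_squared`, `importance`) are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit with an intercept (a stand-in for any learner)."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, rows):
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def r_squared(y, preds):
    """Predictiveness measure: 1 - MSE / Var(y), evaluated on held-out data."""
    mean_y = sum(y) / len(y)
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)
    var = sum((yi - mean_y) ** 2 for yi in y) / len(y)
    return 1.0 - mse / var


def importance(x_train, y_train, x_test, y_test, drop):
    """Held-out R-squared with all features minus R-squared without `drop`."""
    keep = [j for j in range(len(x_train[0])) if j not in drop]

    def reduce(rows):
        return [[r[j] for j in keep] for r in rows]

    full = fit_ols(x_train, y_train)
    reduced = fit_ols(reduce(x_train), y_train)
    v_full = r_squared(y_test, predict(full, x_test))
    v_reduced = r_squared(y_test, predict(reduced, reduce(x_test)))
    return v_full - v_reduced


# Simulated data (an assumption for illustration): Y = 2 * X1 + noise,
# with X2 independent of Y. The population R-squared is 0.8 with both
# features and 0 without X1, so the true importance of X1 is 0.8 and
# the true importance of X2 is 0.
random.seed(0)
n = 2000
x = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
y = [2.0 * row[0] + random.gauss(0, 1) for row in x]
half = n // 2
x_train, y_train, x_test, y_test = x[:half], y[:half], x[half:], y[half:]

imp_x1 = importance(x_train, y_train, x_test, y_test, drop={0})
imp_x2 = importance(x_train, y_train, x_test, y_test, drop={1})
print(f"estimated importance of X1: {imp_x1:.3f}")
print(f"estimated importance of X2: {imp_x2:.3f}")
```

Evaluating both models on data not used for fitting keeps the predictiveness estimates from being overly optimistic, which matters more when the learner is flexible than it does for the linear model used here.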
Published:
Lecture on some of the ways that statistics is used in infectious disease research to students in UW Biostatistics 111.
Published:
In research assessing the effect of an intervention or exposure, a key secondary objective often involves assessing differential effects of this intervention or exposure in subgroups of interest; this is often referred to as assessing effect modification or heterogeneity of treatment effects (HTE). Observed HTE can have important implications for policy, including intervention strategies (e.g., will some patients benefit more from intervention than others?) and prioritizing resources (e.g., to reduce observed health disparities). Analysis of HTE is well understood in studies where the independent unit is an individual. In contrast, in studies where the independent unit is a cluster (e.g., a hospital or school) and a cluster-level outcome is used in the analysis, it is less well understood how to proceed if the HTE analysis of interest involves an individual-level characteristic (e.g., self-reported race) that must be aggregated at the cluster level. Through simulations, we show that only individual-level models have power to detect HTE by individual-level variables; if outcomes must be defined at the cluster level, then there is often low power to detect HTE by the corresponding aggregated variables. We illustrate the challenges inherent to this type of analysis in a study assessing the effect of an intervention on increasing COVID-19 booster vaccination rates at long-term care centers.
Published:
I discuss inference for summaries of longitudinal model-agnostic variable importance.
Published:
In research assessing the effect of an intervention or exposure, a key secondary objective often involves assessing differential effects of this intervention or exposure in subgroups of interest; this is often referred to as assessing effect modification or heterogeneity of treatment effects (HTE). Observed HTE can have important implications for policy, including intervention strategies (e.g., will some patients benefit more from intervention than others?) and prioritizing resources (e.g., to reduce observed health disparities). Analysis of HTE is well understood in studies where the independent unit is an individual. In contrast, in studies where the independent unit is a cluster (e.g., a hospital or school) and a cluster-level outcome is used in the analysis, it is less well understood how to proceed if the HTE analysis of interest involves an individual-level characteristic (e.g., self-reported race) that must be aggregated at the cluster level. Through simulations, we show that only individual-level models have power to detect HTE by individual-level variables; if outcomes must be defined at the cluster level, then there is often low power to detect HTE by the corresponding aggregated variables. We illustrate the challenges inherent to this type of analysis in a study assessing the effect of an intervention on increasing COVID-19 booster vaccination rates at long-term care centers.
I also gave an invited poster presentation based on this talk at the JSM opening mixer on August 4.
Published:
I discuss inference for summaries of longitudinal model-agnostic variable importance.
Published:
I discuss an extension of generalized raking to estimate marginal parameters (e.g., the average treatment effect) and obtain valid inference using machine learning.