Inference for Model-Agnostic Variable Importance


In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response, that is, to gauge the variable importance of those features. In this talk, I will discuss a model-agnostic notion of variable importance and general conditions under which valid inference on the true importance can be obtained, even when machine learning-based techniques are used as part of estimation. I define variable importance as a population-level contrast between the oracle predictiveness using all available features and the oracle predictiveness using all features except those under consideration. I provide several examples of predictiveness measures, including measures for right-censored outcomes, and illustrate the use of the proposed methods with data from a study of an antibody against HIV-1 infection.
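As a rough illustration of the contrast described above (not taken from the talk itself), the sketch below estimates a plug-in version of the importance of a feature subset as the difference in predictiveness, here R-squared, between a model fit on all features and a model fit without that subset, using sample splitting. The function names (`importance`, `fit_predict`) are hypothetical, and ordinary least squares stands in for an arbitrary machine learning regression procedure:

```python
import numpy as np

def r_squared(y, y_hat):
    """Predictiveness measure V: proportion of variance explained."""
    return 1.0 - np.mean((y - y_hat) ** 2) / np.var(y)

def fit_predict(X_train, y_train, X_test):
    """Least-squares fit with intercept (stand-in for any regression learner)."""
    Xt = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(Xt, y_train, rcond=None)
    Xe = np.column_stack([np.ones(len(X_test)), X_test])
    return Xe @ beta

def importance(X, y, drop_cols, seed=0):
    """Plug-in estimate of V(all features) - V(all features except drop_cols)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]
    keep = [j for j in range(X.shape[1]) if j not in drop_cols]
    v_full = r_squared(y[test], fit_predict(X[train], y[train], X[test]))
    v_reduced = r_squared(
        y[test], fit_predict(X[train][:, keep], y[train], X[test][:, keep])
    )
    return v_full - v_reduced

# Synthetic example: feature 0 drives the response, feature 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
y = 2.0 * X[:, 0] + rng.normal(size=2000)
print(importance(X, y, drop_cols=[0]))  # substantial: feature 0 matters
print(importance(X, y, drop_cols=[1]))  # near zero: feature 1 does not
```

Note that this naive plug-in estimator does not by itself yield valid confidence intervals; the point of the talk is the additional conditions and corrections under which inference on the true importance is possible.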