Efficient Nonparametric Statistical Inference on Population Feature Importance using Shapley Values


We discuss our paper to be published in the Proceedings of the Thirty-seventh International Conference on Machine Learning.


The true population-level importance of a variable in a prediction task provides useful knowledge about the underlying data-generating mechanism and can help in deciding which measurements to collect in subsequent experiments. Valid statistical inference on this importance is a key component in understanding the population of interest. We present a computationally efficient procedure for estimating the Shapley Population Variable Importance Measure (SPVIM) and for obtaining valid statistical inference on it.
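
To make the Shapley idea behind the measure concrete, the following is a minimal sketch of how Shapley values are computed for a small number of features. It is an illustration only, not the efficient procedure described in the paper: it enumerates every feature subset, which is exponential in the number of features. The function names `shapley_values` and `toy_value` are hypothetical, and the numbers returned by `toy_value` are made up to stand in for the predictiveness achievable with a given subset of features.

```python
from itertools import combinations
from math import comb


def shapley_values(value, p):
    """Exact Shapley values for p features.

    `value` maps a frozenset of feature indices to a number
    (here assumed to be the predictiveness of that subset).
    """
    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Standard Shapley weight for subsets of this size.
            weight = 1.0 / (p * comb(p - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal gain from adding feature j to subset s.
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def toy_value(s):
    """Hypothetical predictiveness: made-up numbers for illustration."""
    base = {0: 0.5, 1: 0.3, 2: 0.0}
    v = sum(base[k] for k in s)
    if {0, 1} <= s:
        v += 0.1  # assumed interaction between features 0 and 1
    return v


phi = shapley_values(toy_value, 3)
print(phi)
```

In this toy setting the interaction bonus is split evenly between features 0 and 1, feature 2 receives zero importance, and the values sum to the difference between the predictiveness of all features and that of none.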