Statistical Inference and Containerization in Computational Pipelines
Machine learning, while a valuable tool in many contexts, is often used without reliable checks to ensure that the resulting predictions are useful and reproducible. In this talk, I will discuss three approaches to a more principled use of machine learning: inference on the goodness of fit, inference on variable importance, and containerization. Inference on meaningful target parameters can serve as a check within the data analysis pipeline, providing formal evidence that the resulting predictions achieve an acceptable level of performance. Containers, in turn, allow explicit standardization of an entire computing environment, from the operating system down to specific software packages. As a case study, we will consider the task of using machine learning to select broadly neutralizing antibody (bnAb) regimens for future efficacy trials, along with a computational platform for doing so that incorporates the methods above. This is joint work with Craig Magaret, Sohail Nizam, Peter Gilbert, and David Benkeser.
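To make the first idea concrete, here is a minimal sketch of inference on the goodness of fit, assuming cross-validated AUC as the performance measure and a nonparametric bootstrap for the interval; the data, model, and 95% level are illustrative and not taken from the talk.

```python
# Minimal sketch: a confidence interval for cross-validated AUC.
# Data, model, and confidence level are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Out-of-fold predicted probabilities avoid optimistic in-sample estimates.
pred = cross_val_predict(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, cv=5, method="predict_proba",
)[:, 1]
auc = roc_auc_score(y, pred)

# Nonparametric bootstrap over observations for a simple 95% interval.
boot, n = [], len(y)
for _ in range(1000):
    idx = rng.integers(0, n, n)
    if len(np.unique(y[idx])) < 2:
        continue  # skip resamples containing only one class
    boot.append(roc_auc_score(y[idx], pred[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"CV-AUC = {auc:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Note that resampling only the held-out predictions treats the fitted folds as fixed; accounting for the variability of the fitting step itself requires more careful (for example, influence-function-based) inference.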
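For variable importance, one common nonparametric definition is the drop in predictiveness when a feature group is removed from the regression. The sketch below estimates that drop in held-out R-squared, again with an illustrative bootstrap interval; the feature group, data, and learner are hypothetical, not the talk's method.

```python
# Minimal sketch: importance of a feature group as the drop in held-out
# R^2 between a full and a reduced fit. All names here are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=600, n_features=8, noise=5.0, random_state=1)
group = [0, 1]  # hypothetical feature group whose importance we assess
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

full = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)
reduced = GradientBoostingRegressor(random_state=1).fit(
    np.delete(X_tr, group, axis=1), y_tr
)

p_full = full.predict(X_te)
p_red = reduced.predict(np.delete(X_te, group, axis=1))

# Importance estimate: difference in held-out R^2 between the two fits.
psi = r2_score(y_te, p_full) - r2_score(y_te, p_red)

# Bootstrap the held-out set (fits held fixed) for a rough 95% interval.
rng = np.random.default_rng(1)
boot, n = [], len(y_te)
for _ in range(1000):
    idx = rng.integers(0, n, n)
    boot.append(r2_score(y_te[idx], p_full[idx]) - r2_score(y_te[idx], p_red[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"importance = {psi:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

An interval whose lower bound exceeds zero gives evidence that the feature group contributes to predictive performance beyond the remaining covariates.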
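Finally, a hypothetical Dockerfile illustrates what containerization standardizes: the base image pins the operating system and language runtime, and a pinned requirements file fixes the package versions. The base image and file names below are assumptions for illustration, not the platform described in the talk.

```dockerfile
# Minimal, hypothetical containerization sketch: pin the full computing
# environment, from OS and runtime down to exact package versions.
FROM python:3.11-slim

# requirements.txt should list exact versions (e.g., scikit-learn==1.4.2)
# so every build installs the same software stack.
COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

# Copy the analysis code and declare a fixed entry point.
COPY analysis.py /app/analysis.py
WORKDIR /app
CMD ["python", "analysis.py"]
```

Building and running the image (for example, `docker build -t pipeline .` followed by `docker run pipeline`) then reproduces the same environment on any machine with Docker installed.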