Interpreting Black-Box Predictive Models through Causal Attribution

Researcher(s)

Sponsoring Agency
National Science Foundation

Summary

Our ability to acquire and annotate increasingly large amounts of data, together with rapid advances in machine learning, has made predictive models trained using machine learning ubiquitous in virtually all areas of human endeavor. In high-stakes applications such as healthcare, finance, criminal justice, scientific discovery, and education, the resulting predictive models are complex and, in many cases, black boxes. Consider, for example, a medical decision-making scenario where a predictive model, e.g., a deep neural network trained on a large database of labeled data, is to assist physicians in diagnosing patients. In this setting, it is important that the clinical decision support system be able to explain the output of the deep neural network to the physician, who may not have a deep understanding of machine learning. For example, the physician might want to understand the subset of patient characteristics that contribute to the diagnosis, or the reason why the diagnoses were different for two different patients. In high-stakes applications of machine learning, the ability to explain the machine-learned model is a prerequisite for establishing trust in the model's predictions. Satisfactory explanations have to provide answers to questions such as: "What features of the input are responsible for the predictions?" and "Why are the model's outputs different for two individuals?" (e.g., why did John's loan application get approved when Sarah's was not?). Hence, satisfactory explanations are fundamentally causal in nature. This project will develop a theoretically sound yet practical approach to causal attribution, that is, apportioning the responsibility for a black-box predictive model's outputs among the model's inputs.

The model interpretation question "Why did the predictive model generate the output Y for input X?" will be reduced to the following equivalent question: "How are the features of the model input X causally related to the model output Y?" In other words, the task of interpreting a black-box predictive model is reduced to the task of estimating, from observations of the inputs and the corresponding outputs of the model, the causal effect of each input variable, or feature, on the output variable. The planned methods do not require knowledge of the internal structure or parameters of the black-box model, or of the objective function or the algorithm used to train it. Hence, the resulting methods can be applied, in principle, to any black-box predictive model, so long as it is possible to probe the model and observe its response to any supplied input data sample. Advances in causal attribution methods will help broaden the application of machine-learned black-box predictive models in high-stakes applications across many areas of human endeavor. The project offers enhanced opportunities for research-based training of graduate and undergraduate students in Informatics, Data Sciences, and Artificial Intelligence. The investigator will develop a new course on Foundations and Applications of Causal Inference, as well as modules on Causal Attribution for possible inclusion in undergraduate and graduate courses in Machine Learning. The broad and free dissemination of an open-source library of causal attribution methods, course materials, data, and research results will ease their adoption and use by AI researchers, educators, and practitioners.
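To make the probing idea concrete, the following minimal Python sketch estimates the causal effect of a single input feature by intervening on it and comparing the model's average outputs under the two interventions. The function name average_causal_effect and the simple averaging scheme are illustrative assumptions introduced here; they are not the project's specific estimators. The sketch only assumes the black-box model is available as a callable that maps inputs to outputs.

    import numpy as np

    def average_causal_effect(predict, X, feature, treat_value, control_value):
        """Estimate the average causal effect of one input feature on a
        black-box model's output by intervening on that feature.

        predict       : callable mapping an (n, d) array of inputs to outputs
        X             : (n, d) array of observed input samples
        feature       : index of the feature to intervene on
        treat_value   : value assigned to the feature under "treatment"
        control_value : value assigned to the feature under "control"
        """
        X_treat = X.copy()
        X_control = X.copy()
        X_treat[:, feature] = treat_value      # do(feature = treat_value)
        X_control[:, feature] = control_value  # do(feature = control_value)
        # Difference in the model's average output under the two interventions
        return np.mean(predict(X_treat)) - np.mean(predict(X_control))

    # Example: probe an opaque model without access to its internals
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    black_box = lambda Z: 2.0 * Z[:, 0] + 0.5 * Z[:, 1] ** 2  # stand-in model
    ace = average_causal_effect(black_box, X, feature=0,
                                treat_value=1.0, control_value=0.0)
    print(f"Estimated causal effect of feature 0: {ace:.2f}")

Because the sketch treats the model purely as a callable, the same code applies unchanged to a deep neural network, a gradient-boosted ensemble, or any other predictive model that can be queried on supplied input samples.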

Term
 -