Towards Robust, Explainable and Fair Machine Learning Models with Causal Learning


Sponsoring Agency
Cisco Systems, Inc.


Machine learning models, especially those based on deep learning, are black-box models that lack interpretability, are vulnerable to adversarial attacks, and can easily inherit biases from the training data and make unfair predictions, which hinders their adoption in high-stakes scenarios. Hence, it is critical to ensure that machine learning models behave responsibly and are trustworthy. Despite various efforts, most existing works study robustness, explainability and fairness independently, so the resulting models still cannot be fully trusted. For example, recent works show that an adversary can easily fool a fairness-aware model into making unfair decisions and can fool an explainable model into giving wrong explanations. Therefore, it is of critical importance to develop a unified framework that is simultaneously accurate, robust, explainable and fair. However, this important direction is under-studied. We observe that a real-world data sample is usually composed of causal features that drive the label y and non-causal features that are merely correlated with the label. Such non-causal features can result in spurious explanations, are vulnerable to adversarial attacks, and might lead to unfair predictions. Therefore, we propose to learn a unified framework that is accurate, robust, explainable and fair, via causal learning that captures the causal features.
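The causal-versus-spurious-feature observation can be illustrated with a minimal toy sketch (a hypothetical example constructed for illustration, not the proposed framework): a classifier trained on both a causal feature and a strongly correlated non-causal feature leans on the non-causal one, and its accuracy collapses when that spurious correlation flips at test time, while a model restricted to the causal feature remains stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

def make_data(flip_spurious):
    # Binary label y drives the causal feature; the spurious feature
    # merely co-occurs with y in training and flips at test time.
    y = rng.integers(0, 2, n)
    x_causal = y + rng.normal(0, 1.0, n)       # causal: moderately noisy signal
    corr = (1 - y) if flip_spurious else y     # spurious correlation reverses
    x_spur = corr + rng.normal(0, 0.1, n)      # spurious: very clean in training
    return np.column_stack([x_causal, x_spur]), y

def fit_logreg(X, y, steps=3000, lr=0.1):
    # Plain gradient-descent logistic regression (no external ML library).
    Xb = np.column_stack([X, np.ones(len(X))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    Xb = np.column_stack([X, np.ones(len(X))])
    return ((Xb @ w > 0).astype(int) == y).mean()

X_tr, y_tr = make_data(flip_spurious=False)   # training distribution
X_te, y_te = make_data(flip_spurious=True)    # shifted test distribution

w_both = fit_logreg(X_tr, y_tr)               # trained on causal + spurious
w_causal = fit_logreg(X_tr[:, :1], y_tr)      # trained on causal feature only

acc_both = accuracy(w_both, X_te, y_te)
acc_causal = accuracy(w_causal, X_te[:, :1], y_te)
```

Under this setup the model using both features performs worse than chance once the spurious correlation flips, while the causal-only model keeps its (noise-limited) accuracy, mirroring the motivation for capturing causal features.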