CAREER: Securing Deep Reinforcement Learning


Sponsoring Agency
National Science Foundation


Deep reinforcement learning combines deep neural networks with a reinforcement learning architecture, enabling software-defined agents to learn the best possible actions in an environment to attain their goals. It has recently achieved super-human performance in various games, e.g., racking up big wins in Go and defeating professional e-sports players in Dota 2. Following these strides in games, recent research extends deep reinforcement learning (reinforcement learning for short) to applications in other domains, such as enabling autonomous driving, optimizing chemical reactions, and even solving acute traffic congestion problems. Like many other deep learning techniques, deep reinforcement learning is vulnerable to adversarial attacks. In reinforcement learning, an adversarial attack typically perturbs an agent's sensory observations in order to mislead it. Recent research has demonstrated a more practical form of attack. Instead of implicitly assuming that an attacker has full control over an agent's sensory system, the new type of attack introduces an adversarial agent that manipulates the target agent's environment and thus triggers it to react in an undesired fashion. Compared with attacks that alter the sensory observations, the new attack is more difficult to counteract. First, the methods (e.g., adversarial training) commonly used for robustifying other deep learning techniques are no longer suitable for deep reinforcement learning. Second, given a reinforcement learning agent, there are few technical approaches to scrutinizing the agent and unveiling its flaws.
This project intends to address these two significant problems by integrating and expanding upon a series of technical approaches used in explainable AI, adversarial training, and formal verification in conjunction with program synthesis. The basic idea is first to learn an adversarial agent informed by explainable AI. Using this learned agent, we then expose the weaknesses of target agents and adversarially train them accordingly. Through a robustness check, we evaluate the enhanced agents. If a strengthened agent fails the adversary-resistance check, we fall back on formal verification and program synthesis techniques. I envision that this unified solution could identify the policy flaws of reinforcement learning agents and effectively remediate their weaknesses.
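
As a rough illustration only, the loop described above (learn an adversary, adversarially train the target, run a robustness check, and flag the agent for a fallback if the check fails) can be sketched on a toy game. Everything in this sketch is an assumption made for illustration and not the project's actual method: rock-paper-scissors stands in for the environment, a best-response search stands in for the learned adversarial agent, and the names `PAYOFF`, `learn_adversary`, `adversarial_training_step`, and `harden` are hypothetical. The explainable AI, formal verification, and program synthesis components are not modeled.

```python
# Toy zero-sum game (rock-paper-scissors), used only as a stand-in environment.
# PAYOFF[a][v] is the adversary's reward when the adversary plays action a
# and the victim plays action v (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [
    [0, -1, 1],   # adversary plays rock
    [1, 0, -1],   # adversary plays paper
    [-1, 1, 0],   # adversary plays scissors
]


def learn_adversary(victim_policy):
    """Return the adversary action that best exploits the victim's mixed
    policy, together with the adversary's expected payoff (exploitability)."""
    values = [
        sum(p * PAYOFF[a][v] for v, p in enumerate(victim_policy))
        for a in range(3)
    ]
    best = max(range(3), key=values.__getitem__)
    return best, values[best]


def adversarial_training_step(victim_policy, adversary_action, step):
    """Move a fraction `step` of probability mass onto the victim action
    that performs best against the given adversary action."""
    reply = min(range(3), key=lambda v: PAYOFF[adversary_action][v])
    return [
        (1 - step) * p + (step if v == reply else 0.0)
        for v, p in enumerate(victim_policy)
    ]


def harden(victim_policy, tolerance=0.05, max_rounds=500):
    """Alternate adversary learning and adversarial training until the
    robustness check passes (exploitability <= tolerance) or rounds run out.

    Returns (policy, exploitability, passed). When `passed` is False, the
    abstract's fallback (formal verification and program synthesis) would
    apply; that fallback is NOT implemented in this sketch."""
    for round_index in range(1, max_rounds + 1):
        action, exploit = learn_adversary(victim_policy)
        if exploit <= tolerance:  # robustness check
            return victim_policy, exploit, True
        victim_policy = adversarial_training_step(
            victim_policy, action, 1.0 / (round_index + 1)
        )
    _, exploit = learn_adversary(victim_policy)
    return victim_policy, exploit, exploit <= tolerance


if __name__ == "__main__":
    # A victim that always plays rock is fully exploitable by paper.
    policy, exploit, passed = harden([1.0, 0.0, 0.0])
    print(policy, exploit, passed)
```

In this toy setting, a victim that always plays rock is driven toward the uniform policy, which no fixed adversary action can exploit. Real reinforcement learning agents act over long horizons in high-dimensional environments, so a simple best-response search like this would not carry over directly.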