Towards Facilitating Kernel Vulnerability Reproduction by Fusing Crowd and Machine Generated Data

Xinyu Xing

Sponsoring Agency
National Science Foundation


The kernel is the core piece of software in a computer's operating system. Due to the high complexity of kernel software, finding all vulnerabilities during the development phase is nearly impossible. In recent years, crowdsourcing efforts have shown great success in discovering kernel vulnerabilities, where security professionals, hackers, and users can all contribute by submitting kernel bug reports. However, research shows that many vulnerability reports, including those generated by automated tools (e.g., kernel fuzzers), are not easily reproducible. Non-reproducible reports can significantly delay the patching process or lead kernel vendors to misjudge the severity of a vulnerability. Preliminary research shows that vulnerability reports are not reproducible because of (1) missing information on the compilation configuration; (2) a lack of data to construct the contexts for triggering the bug; and (3) inaccurate or incomplete information about the vulnerable kernel versions. This project will develop new approaches that combine crowd-reported and machine-generated data with static-dynamic program analysis to automate the process of inferring, constructing, and validating the information needed for kernel-vulnerability reproduction.

This project will provide much-needed automation for reproducing kernel bugs and vulnerabilities. If successful, the project will significantly advance computer security (for kernel vulnerability analysis) and contribute to the field of software engineering (for bug diagnosis and assessment). By improving the reproduction rate of kernel bugs, this project will also help parallel efforts in vulnerability patching and remediation. The expected advancements are three-fold. (1) The team will develop novel methods to infer the kernel compilation configuration from memory snapshots and code segments in the bug reports, and will design new approaches to handle untrusted or corrupted memory dumps caused by the bugs. (2) Team members will develop new mechanisms to construct precise contexts that trigger the reported bugs (via kernel fault manipulation and injection). The context construction method will also pinpoint relevant faulty processes and handle kernel interrupts correctly. (3) New fuzzing tools will be designed to migrate input programs, enabling much broader bug testing across kernel versions, and new methods will be developed to quickly determine non-vulnerable versions.

Research Area
Artificial Intelligence and Big Data
Privacy and Security