A Community Research Infrastructure for Integrated AI-Enabled Malware and Network Data Analytics

Sponsoring Agency
National Science Foundation


The severity, frequency, scope, and sophistication of cybercrimes and cyberattacks have grown sharply in recent years, causing substantial financial damage to organizations and threatening the most basic foundations of our society. Integrating the analysis of known malware with real-world scanning network traffic (e.g., traffic observed in a network telescope) has been shown to reveal the rapidly evolving and increasingly sophisticated scanning behaviors of botnets. A host of open source technologies for dynamic malware simulation have been developed and used by researchers. However, research opportunities that integrate the results of such malware analysis with network flow data for AI-enabled pattern discovery and predictive modeling are not broadly accessible to the cybersecurity and AI communities. Two obstacles stand in the way: the difficulty of de-identifying and linking network data (which contain sensitive information) from multiple sources, and the challenge of integrating and processing voluminous, high-dimensional network data.

To address these critical needs of the communities, we propose to design, with broad community input, a community research infrastructure that provides (1) open source malware dynamic analysis tools and services, (2) linkable de-identified data from multiple sources (network telescope, honeypot, and IDS), and (3) Jupyter notebooks and AI/ML modules, as well as reproducible and customizable pipelines for all phases of scalable AI-enabled malware pattern discovery and predictive modeling in a high-performance computing environment. The target research community for this infrastructure will leverage the existing multi-university single sign-on infrastructure being developed by the Eastern Regional Network (ERN). Multiple workshops will be organized to engage the cybersecurity and artificial intelligence research communities, so that researchers using the infrastructure can share their user experiences and use cases with peers in their own community as well as in neighboring communities. The aim is to accelerate progress in this area through synergistic collaboration among researchers in malware analysis, network security, and artificial intelligence.

The proposed infrastructure will enable critically needed research on developing fine-grained fingerprints of malware so that malware activity in real networks can be detected more accurately and in near real time. Furthermore, the automated generation of these fine-grained fingerprints using AI/ML methods will significantly reduce the time from the first detection of malware to the deployment, through the cybersecurity industry, of effective intrusion prevention and detection capabilities in enterprise security operations. In addition, the infrastructure will enable the creation of a new type of research community that pursues foundations and methods for predictive solutions to real-world cybersecurity challenges using AI. It will also advance the foundations and methods of AI itself, because the landscape of cybersecurity challenges is uniquely complex and rapidly changing.

Broader Impacts: The infrastructure enables research opportunities that can have a direct impact on improving the safety and security of our society’s cyberspace. A broad research community will be engaged to identify, select, and use appropriate tools for network and malware data analytics, and to contribute new tools as needed. Self-contained hands-on tutorials and self-study materials, as well as workshops and hackathons, are aimed at audiences with diverse interests (AI/ML methods development, cybersecurity research, AI/ML-enabled network and malware analytics) and backgrounds (AI/ML, cybersecurity). Hackathons will engage a broad community of students in working on AI-enabled cybersecurity challenge problems using network and malware data sets. Engagement with diverse student populations will be further facilitated by two hackathons to be held at Lincoln University and one online hackathon for girls. This focus on attracting students and early-career scholars from groups under-represented in STEM through outreach activities (hackathons and workshops) is likely to further enhance the diversity of the next-generation cybersecurity and AI workforce.