Multi-Perspective Bayesian Learning for Automated Diagnosis of Advanced Malware

Team: Jian Zhang (LSU), Phil Porras (SRI International), Vinod Yegneswaran (SRI International)

The project investigates a new probabilistic methodology for diagnosing malware infections, inspired by the foundations of abductive disease-diagnosis algorithms. We propose to develop methods for automatically deriving probabilistic malware infection models that capture the forensic impacts that the latest spreading Internet malware leaves on a host. These models can be further extended into a probabilistic malware knowledge base that is flexible enough to identify different malware variants, including variants that have not been seen before. Like the analyses antivirus companies perform on contemporary malware, the knowledge base characterizes new threats, but it does so in a fully automated fashion so that large quantities of malware and their variants can be handled. We further propose the development of a host-based malware diagnosis system called Host-Rx, which employs probabilistic Bayesian inference to prioritize symptoms and to identify the most likely contagion among a suite of competing diagnosis models. If successful, this research will introduce a new, complementary strategy for diagnosing malware infections in ways that cannot be defeated by the countermeasures malware currently employs against antivirus tools. Moreover, it will demonstrate how probabilistic models can fully capture the complexity of malware forensic impacts, incorporating both independent and combined symptom probabilities.

This research seeks to introduce a fundamental shift in the use of malware honeynets: from passive analysis, measurement, or cluster-labeling systems to active forensic-signature generation systems. In the envisioned future, a network of Internet honeynet devices will construct and publish emerging probabilistic infection models, which are consumed by host agents that continually diagnose malware infections on their local machines. This project will show how malware infection diagnosis can be cast into the multiple-disease diagnosis paradigm, leveraging work on abductive Bayesian inference networks to represent, and later search for, complex symptom combinations among a large body of potential disease profiles. Without new research directions in areas such as probabilistic infection diagnosis, malware defense may continue to lag behind the lucrative advances being made in the malware development community.
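
To make the multiple-disease framing concrete, the sketch below treats infections as "diseases" and host observations as "symptoms" in a two-layer noisy-OR network, the model commonly used in medical diagnosis networks such as QMR-DT. Abductive inference is read here as finding the most probable explanation: the set of simultaneous infections that maximizes the joint probability of infections and observed symptoms. This is an assumed formulation, not necessarily the one used in this project; the infection names, link strengths, priors, and leak probability are all hypothetical, and the exhaustive search is exponential in the number of infections, so a realistic knowledge base would need an approximate search.

```python
import math
from itertools import combinations

# Hypothetical infection priors, assumed independent of one another.
PRIORS = {"worm_a": 0.02, "bot_b": 0.01, "rootkit_c": 0.005}
# Hypothetical noisy-OR link strengths: P(infection alone produces the symptom).
LINKS = {
    "new_autorun_key": {"worm_a": 0.9, "bot_b": 0.8},
    "dns_burst": {"worm_a": 0.7},
    "hidden_driver": {"rootkit_c": 0.95},
}
LEAK = 0.001  # Hypothetical P(symptom) when none of its causes is present.


def p_symptom(symptom, infections):
    """Noisy-OR: the symptom is absent only if the leak and every active cause all fail."""
    p_absent = 1.0 - LEAK
    for infection, strength in LINKS[symptom].items():
        if infection in infections:
            p_absent *= 1.0 - strength
    return 1.0 - p_absent


def log_joint(infections, observed):
    """log P(infection set, symptom pattern); unobserved symptoms count as absent."""
    score = 0.0
    for infection, prior in PRIORS.items():
        score += math.log(prior if infection in infections else 1.0 - prior)
    for symptom in LINKS:
        p = p_symptom(symptom, infections)
        score += math.log(p if symptom in observed else 1.0 - p)
    return score


def most_probable_explanation(observed):
    """Abduction by exhaustive search over every subset of possible infections."""
    names = list(PRIORS)
    candidates = (frozenset(c) for k in range(len(names) + 1) for c in combinations(names, k))
    return max(candidates, key=lambda s: log_joint(s, observed))
```

With all three symptoms observed, the best explanation is the combination `{"worm_a", "rootkit_c"}`, since no single infection accounts for both the autorun and DNS symptoms and the hidden driver. Dropping `hidden_driver` from the observations leaves `{"worm_a"}` alone as the best explanation.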

Relevant Publications

Jian Zhang, Phil Porras, Vinod Yegneswaran. Host-Rx: Automated Malware Diagnosis Based on Probabilistic Behavior Models. SRI Technical Report, 2010.

Lakshman Nataraj, Vinod Yegneswaran, Phil Porras, Jian Zhang. A Comparative Assessment of Malware Classification using Binary Texture Analysis and Dynamic Analysis. Proceedings of the ACM CCS Workshop on Artificial Intelligence and Security (AISec), October 2011.

Chao Yang, Vinod Yegneswaran, Phil Porras, Guofei Gu. Detecting Money-Stealing Apps in Alternative Android Markets (Poster). Proceedings of CCS 2012.

Chao Yang, Zhaoyan Xu, Guofei Gu, Vinod Yegneswaran, Phil Porras. DroidMiner: Automated Mining and Characterization of Fine-grained Malicious Behaviors in Android Applications. Proceedings of the 19th European Symposium on Research in Computer Security (ESORICS'14), September 2014.

Acknowledgments
This project is funded by a grant from the National Science Foundation under Award Number IIS-0905518. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF.