Algebra for distributed collaborative semi-supervised classification of cyber activities

2018 
In the last several years, the volume and diversity of cyber attacks on U.S. commercial and government networks have increased dramatically, including malware, web attacks (e.g., drive-by downloads), zero-day exploits, and man-in-the-middle attacks (e.g., session hijacking). While many tools are available to attackers, cyber criminals increasingly rely on straightforward intrusion approaches (e.g., spear-phishing), employ vast distributed resources (botnets), and hide attack vectors via stepping-stone attacks. Detecting such activities and infrastructure represents the most difficult challenge for cyber-security professionals, because these threats are often locally invisible within isolated subnetworks. Cyber threat detection tools employed in the field today fail to cope with the volume, speed, and diversity of cyber attacks. Intrusion Detection Systems (IDS) are ineffective against novel threats, while anomaly-based methods generate a large number of false alarms and are difficult to interpret. Supervised algorithms require curated labeled datasets to train their models, and such datasets do not exist for novel attacks. Yet the biggest challenge of these systems is the requirement that all of the data be available at a single global repository. The cost of maintaining a global repository and the associated computation infrastructure becomes unsustainable as the volume of collected cyber data increases. Because threat detection solutions are deployed predominantly to analyze local traffic collected within and on the border of a single organization, these tools are unable to detect attacks that are locally invisible, such as attacks that cut across organizational boundaries. In this paper, we describe a new computational framework which will enable distributed enterprises to (a) perform local inference computations; and (b) collaborate using global messages and hybrid strategies to detect a wide range of global threats that are not locally visible.
First, we present a matrix-based algebra that generalizes a wide range of machine learning algorithms to maximize the breadth of attack phenomena that can be detected. We then derive a semi-supervised attack detection model that uses hybrid collaboration, with adaptive local and global computations at distributed repositories, to detect global events when it is not possible to move all relevant data into a centralized location. Finally, we propose a feedback model to create an active human-in-the-loop system, which integrates cyber analysts into the malicious behavior detection and pattern learning process by generating requests for annotation and result examination using a small number of representative instances of anomaly and threat detection outcomes.
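The combination of local inference and global messages described above can be illustrated with a toy sketch. This is not the algebra or the detection model of the paper, whose details the abstract does not give. It swaps in plain label propagation on a small host graph as a stand-in semi-supervised method, and every name here (Site, propagate, run_distributed, the six-host graph and its two labels) is a hypothetical chosen for illustration. The assumption is that each site knows the edges of its own hosts, including edges that cross to another site, and that sites share only the scores of boundary hosts.

```python
# Hypothetical toy example: names, graph, and labels are illustrative,
# not taken from the paper. Label propagation is a stand-in technique.

def propagate(nodes, neighbors, labels, scores):
    """One synchronous label-propagation step for the given nodes.

    Labeled nodes stay clamped to their label; every other node takes
    the mean score of its neighbors.
    """
    updated = {}
    for n in nodes:
        if n in labels:
            updated[n] = labels[n]
        else:
            nbrs = neighbors[n]
            updated[n] = sum(scores[m] for m in nbrs) / len(nbrs)
    return updated


class Site:
    """One repository that holds only its own hosts and their edges."""

    def __init__(self, nodes, neighbors, labels):
        self.nodes = set(nodes)
        self.neighbors = {n: neighbors[n] for n in nodes}
        self.labels = {n: v for n, v in labels.items() if n in self.nodes}
        self.scores = {n: self.labels.get(n, 0.5) for n in nodes}
        # Remote hosts this site has edges to; their scores arrive as messages.
        self.remote = {m for n in nodes for m in neighbors[n]} - self.nodes
        self.inbox = {m: 0.5 for m in self.remote}

    def outgoing(self, other):
        """Global message: scores of the local hosts the other site needs."""
        return {n: self.scores[n] for n in other.remote & self.nodes}

    def step(self):
        """Local inference from local scores plus the last received messages."""
        view = {**self.scores, **self.inbox}
        self.scores = propagate(self.nodes, self.neighbors, self.labels, view)


def run_distributed(sites, rounds):
    """Alternate boundary-score exchange and local updates."""
    for _ in range(rounds):
        messages = [(dst, src.outgoing(dst))
                    for src in sites for dst in sites if src is not dst]
        for dst, msg in messages:
            dst.inbox.update(msg)
        for site in sites:
            site.step()
    merged = {}
    for site in sites:
        merged.update(site.scores)
    return merged


def run_centralized(neighbors, labels, rounds):
    """Reference run with all data in one place, for comparison only."""
    scores = {n: labels.get(n, 0.5) for n in neighbors}
    for _ in range(rounds):
        scores = propagate(list(neighbors), neighbors, labels, scores)
    return scores


# Path graph a1 - a2 - a3 - b1 - b2 - b3 split across two sites.
# Score 1.0 means known malicious, 0.0 means known benign.
neighbors = {
    "a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2", "b1"],
    "b1": ["a3", "b2"], "b2": ["b1", "b3"], "b3": ["b2"],
}
labels = {"a1": 1.0, "b3": 0.0}

sites = [Site(["a1", "a2", "a3"], neighbors, labels),
         Site(["b1", "b2", "b3"], neighbors, labels)]
distributed = run_distributed(sites, rounds=200)
centralized = run_centralized(neighbors, labels, rounds=200)
```

In this sketch the only label that marks malicious activity sits at site A, yet site B's unlabeled hosts still receive elevated scores because site A's boundary score reaches them through the exchanged messages. With messages exchanged every round, the distributed scores match the centralized run, although no site ever sees the other's full data. An adaptive scheme of the kind the abstract mentions would presumably exchange messages less often; that trade-off is not modeled here.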