BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection

Background Knowledge and Insights

  • A botnet is like an army
  • Bots act in a similar/correlated way
    • Non-human driven
    • Programmed to perform C&C logic/communications
      • Communication patterns do not change, even when the C&C server IPs or neighbor peers vary
    • Otherwise, the botnet degenerates into a group of unrelated/isolated infections
      • A botnet should be different from a set of isolated individual malware instances

Goals and Contributions

  • Detection is grounded in the definition and essential properties of botnets; no a priori knowledge is required
  • A new "aggregated communication flow" (C-flow)
  • BotMiner

BotMiner

BotMiner detection framework

  • Assumption
    • Bots within the same botnet will be characterized by similar malicious activities and similar C&C communication patterns
  • Detection method
    • Cluster similar communication traffic
      • Who is talking to whom
      • C-plane (C&C communication traffic)
    • Cluster similar malicious traffic
      • Who is doing what
      • A-plane (Activity traffic)
    • Perform cross-cluster correlation
      • Find a coordinated group pattern
  • A priori knowledge that is not required
    • Botnet's protocol
    • Captured bot binaries (botnet signatures)
    • C&C server names/addresses
    • Content of the C&C communication
  • Objectives
    • Independent of the protocol and structure
    • Resistant to changing location
    • Independent of the content of communication
    • Low false-positive rate
    • Low false-negative rate
    • Efficient

C-plane Monitor

  • Who is talking to whom
  • Limit to TCP and UDP
  • Each flow record contains
    • Time
    • Duration
    • Source IP
    • Source port
    • Destination IP
    • Destination port
    • # packets, both directions
    • Bytes, both directions
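
The flow record listed above can be pictured as a small data structure. The following is a minimal sketch, not the monitor's actual format: the class name `FlowRecord`, all field names, and the explicit `protocol` field (implied by the TCP/UDP restriction) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One C-plane flow record (field names are hypothetical)."""
    time: float        # start time, seconds since the Unix epoch
    duration: float    # seconds
    protocol: str      # "TCP" or "UDP" (assumed field)
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    packets_out: int   # packets sent by the source
    packets_in: int    # packets received by the source
    bytes_out: int     # bytes sent by the source
    bytes_in: int      # bytes received by the source

    @property
    def total_packets(self) -> int:
        """Packets in both directions."""
        return self.packets_out + self.packets_in

    @property
    def total_bytes(self) -> int:
        """Bytes in both directions."""
        return self.bytes_out + self.bytes_in


record = FlowRecord(
    time=0.0, duration=2.5, protocol="TCP",
    src_ip="10.0.0.5", src_port=49152,
    dst_ip="198.51.100.7", dst_port=6667,
    packets_out=6, packets_in=4, bytes_out=600, bytes_in=400,
)
```

Keeping the two directions separate preserves the "both directions" counts while still allowing per-flow totals to be derived.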

A-plane Monitor

  • Who is doing what
  • Analyses outbound traffic
  • Based on Snort, with some modifications
  • Detects several types of malicious activities
    • Scanning
      • SCADE (Statistical Scan Anomaly Detection Engine)
      • Two anomaly detection modules (OR)
        • Abnormally-high scan rate
        • Weighted failed connection rate
    • Spamming
      • Developed a new Snort plug-in
    • PE (Portable Executable) binary downloading
      • PEHunter
      • BotHunter
        • Egg download detection method
          • Egg: full malicious binary program
    • Exploit
    • Other activity types can easily be added
  • A-plane monitoring alone is not sufficient for botnet detection
    • A-plane activities are not exclusively used in botnets
    • A loose detector design will generate many false positives
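
The OR combination of the two scan anomaly modules can be illustrated as follows. This is a minimal sketch under stated assumptions, not SCADE itself: the function names, the per-port weighting scheme, and both thresholds are hypothetical and would have to be tuned for a real network.

```python
def scan_rate(num_targets, window_seconds):
    """Distinct targets contacted per second in the observation window."""
    return num_targets / window_seconds


def weighted_failed_rate(attempts, port_weights, default_weight=1.0):
    """Weighted share of failed connections.

    attempts: iterable of (dst_port, succeeded) pairs.
    port_weights: hypothetical per-port weights (assumption).
    """
    total = 0.0
    failed = 0.0
    for port, succeeded in attempts:
        weight = port_weights.get(port, default_weight)
        total += weight
        if not succeeded:
            failed += weight
    return failed / total if total else 0.0


def is_scanning(num_targets, window_seconds, attempts, port_weights,
                rate_threshold, failed_threshold):
    """Flag a host if EITHER module reports an anomaly (OR rule)."""
    high_rate = scan_rate(num_targets, window_seconds) > rate_threshold
    many_failures = (
        weighted_failed_rate(attempts, port_weights) > failed_threshold
    )
    return high_rate or many_failures


# A slow scanner: low rate, but most (weighted) connections fail.
attempts = [(445, False), (445, False), (80, True)]
weights = {445: 2.0}
flagged = is_scanning(3, 60.0, attempts, weights,
                      rate_threshold=1.0, failed_threshold=0.5)
```

With the OR rule, a host that scans slowly is still flagged when its failed connection rate is high, and vice versa.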

C-plane Clustering

  • Basic-filtering (F1, F2)
    • Irrelevant traffic flows (F1)
      • Between internal hosts
      • From external hosts to internal hosts
    • Flows that are not completely established (F2)
  • White-listing (F3)
    • Flows with well-known destinations
  • Given an epoch $E$, aggregate into communication flows (C-flows)
    • $c_i = \{f_j\}_{j=1..m}$, C-flow
      • Where $f_j$ have same protocol (TCP/UDP), source IP, destination IP & port
      • $m$, # TCP/UDP flows
  • Feature extraction
    • For each C-flow
      • Temporal
        • # flows per hour ($fph$)
        • # bytes per second ($bps$)
      • Spatial
        • # packets per flow ($ppf$)
        • # bytes per packet ($bpp$)
    • 13 bins per feature
    • Dimension of features, $d = 4 \times 13 = 52$
  • Two-step Clustering
    • Clustering C-flows is a challenging task
      • Large networks
      • Large feature space
      • Small percentage of bots
    • X-means, a variant of k-means
      • Does not require the user to choose the number of clusters
    • Feature reduction
      • $d = 52$ to $d' = 8$
      • $\{Avg, SD\} \times \{fph, ppf, bpp, bps\}$
    • First step: coarse-grained clustering on the entire dataset, $d' = 8$
    • Second step: fine-grained clustering on multiple smaller clusters using all features, $d = 52$
  • False positives (FP) and false negatives (FN) from C-plane clustering can be reduced by correlating with the A-plane
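
The aggregation and feature-extraction steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: flows are plain dictionaries with hypothetical keys (`proto`, `src_ip`, `dst_ip`, `dst_port`, `time`, `duration`, `packets`, `bytes`), the 12 bin edges per feature are supplied by the caller because these notes do not say how they are chosen, and the population standard deviation is used for SD. The clustering itself (X-means) is not shown.

```python
from bisect import bisect_left
from collections import defaultdict
from statistics import mean, pstdev

FEATURES = ("fph", "ppf", "bpp", "bps")


def aggregate_cflows(flows):
    """Group flows sharing protocol, source IP, destination IP and port."""
    cflows = defaultdict(list)
    for f in flows:
        key = (f["proto"], f["src_ip"], f["dst_ip"], f["dst_port"])
        cflows[key].append(f)
    return dict(cflows)


def cflow_samples(cflow, epoch_start, epoch_hours):
    """Collect the four sample sets (fph, ppf, bpp, bps) for one C-flow."""
    per_hour = [0] * epoch_hours
    for f in cflow:
        hour = int((f["time"] - epoch_start) // 3600)
        if 0 <= hour < epoch_hours:
            per_hour[hour] += 1
    return {
        "fph": per_hour,
        "ppf": [f["packets"] for f in cflow],
        "bpp": [f["bytes"] / f["packets"] for f in cflow if f["packets"] > 0],
        "bps": [f["bytes"] / f["duration"] for f in cflow if f["duration"] > 0],
    }


def histogram(values, edges):
    """Normalized histogram; 12 sorted edges give 13 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1  # bin i covers (edge[i-1], edge[i]]
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def full_vector(samples, edges_by_feature):
    """Concatenate the four histograms: d = 4 x 13 = 52."""
    vec = []
    for name in FEATURES:
        vec.extend(histogram(samples[name], edges_by_feature[name]))
    return vec


def reduced_vector(samples):
    """Average and SD of each feature: d' = 8."""
    vec = []
    for name in FEATURES:
        values = samples[name]
        vec.append(mean(values) if values else 0.0)
        vec.append(pstdev(values) if values else 0.0)
    return vec


flows = [
    {"proto": "TCP", "src_ip": "10.0.0.5", "dst_ip": "198.51.100.7",
     "dst_port": 6667, "time": 0.0, "duration": 2.0,
     "packets": 10, "bytes": 1000},
    {"proto": "TCP", "src_ip": "10.0.0.5", "dst_ip": "198.51.100.7",
     "dst_port": 6667, "time": 3700.0, "duration": 4.0,
     "packets": 20, "bytes": 1000},
    {"proto": "UDP", "src_ip": "10.0.0.6", "dst_ip": "198.51.100.9",
     "dst_port": 53, "time": 10.0, "duration": 1.0,
     "packets": 2, "bytes": 200},
]
cflows = aggregate_cflows(flows)
samples = cflow_samples(cflows[("TCP", "10.0.0.5", "198.51.100.7", 6667)],
                        epoch_start=0.0, epoch_hours=24)
edges = {name: [float(i) for i in range(1, 13)] for name in FEATURES}
full = full_vector(samples, edges)
reduced = reduced_vector(samples)
```

In this sketch the 8-dimensional vector would feed the coarse-grained first step, and the 52-dimensional vector would be used only inside each resulting cluster for the fine-grained second step.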

A-plane Clustering

  • Two-layer
    • Cluster according to the activity types
      • Scan activity features
        • Scanning ports
        • Target subnet
      • Spam activity features
        • Highly overlapped SMTP connection destinations
      • Binary download
        • Capture and compare the first portion (packet) of the binary
    • Cluster according to the activity features
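
The two-layer idea can be sketched as a nested grouping: first by activity type, then by activity features. This is a deliberately simplified illustration, not the method described above: it groups hosts whose feature keys match exactly, which stands in for a real similarity measure (for example, overlap of SMTP destinations), and the log format and function name are hypothetical.

```python
from collections import defaultdict


def cluster_a_plane(activity_logs):
    """Two-layer grouping of hosts.

    activity_logs: iterable of (host, activity_type, feature_key) tuples,
    where feature_key is any hashable summary of the activity
    (assumed format). Exact-match grouping is a simplification.
    """
    by_type = defaultdict(lambda: defaultdict(set))
    for host, activity_type, feature_key in activity_logs:
        # Layer 1: activity type; layer 2: activity features.
        by_type[activity_type][feature_key].add(host)
    return {t: dict(groups) for t, groups in by_type.items()}


logs = [
    ("10.0.0.5", "scan", (frozenset({445}), "192.0.2.0/24")),
    ("10.0.0.6", "scan", (frozenset({445}), "192.0.2.0/24")),
    ("10.0.0.7", "scan", (frozenset({22}), "203.0.113.0/24")),
    ("10.0.0.5", "spam", frozenset({"198.51.100.25"})),
]
clusters = cluster_a_plane(logs)
```

Here the two hosts scanning port 445 on the same subnet land in one cluster, while the host scanning port 22 forms its own.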

Cross-plane Correlation

  • Calculate a botnet score, $s(h)$, for every host $h$
  • Similarity score between hosts $h_i$ and $h_j$
    • Indicator function (boolean value)
  • Hierarchical clustering
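
A simplified view of the correlation step is sketched below. These notes do not give the exact definitions of $s(h)$ or the similarity score, so the formulas here are assumptions chosen for illustration: the score sums the Jaccard overlap between every A-plane cluster and every C-plane cluster that contain the host, and the similarity counts the clusters two hosts share (one indicator per cluster). The hierarchical clustering step is not shown.

```python
def jaccard(a, b):
    """Overlap of two host sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def botnet_score(host, a_clusters, c_clusters):
    """Illustrative score: a host scores higher when the A-plane and
    C-plane clusters it belongs to contain largely the same hosts."""
    score = 0.0
    for a in a_clusters:
        if host not in a:
            continue
        for c in c_clusters:
            if host in c:
                score += jaccard(a, c)
    return score


def similarity(h_i, h_j, clusters):
    """Number of clusters containing both hosts (sum of indicators)."""
    return sum(1 for c in clusters if h_i in c and h_j in c)


a_clusters = [{"h1", "h2"}]        # hosts with similar malicious activity
c_clusters = [{"h1", "h2", "h3"}]  # hosts with similar communication
s_h1 = botnet_score("h1", a_clusters, c_clusters)
s_h3 = botnet_score("h3", a_clusters, c_clusters)
```

In this example `h3` communicates like the others but shows no malicious activity, so it receives no score, which reflects the idea that neither plane alone is sufficient.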

Limitations and Potential Solutions

  • Evading C-plane monitoring and clustering
    • Utilize a legitimate website
      • If the website only points to a secondary URL for the actual commands, bots may not be able to hide that URL and the corresponding communications
    • Manipulate their communication patterns
      • Bots could still be clustered together, as with P2P botnets
    • Randomize individual communication patterns
      • Measure the distribution and entropy of communication features
      • Normal user communications may not have such randomized patterns
    • Mimic the communication patterns of normal hosts
    • Covert channels
    • Communication randomization, mimicry attacks, and covert channels are limitations of all traffic-based detection approaches
  • Evading A-plane monitoring/clustering
    • Stealthy malicious activities
      • Scan slowly
      • Spam slowly
    • Botmaster commands each bot randomly and individually to perform a different task
      • The bots then no longer behave like a botnet
    • Differentiate the bots and avoid commanding bots in the same monitored network the same way
      • Distributed monitors, larger monitored space
    • Use complementary systems like BotHunter
  • Evading cross-plane analysis
    • Extremely delayed tasks
      • Use multiple-day data and cross check back several days
      • Impedes attack efficiency
      • The bot may be offline or powered off
