BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection

Background Knowledge and Insights

  • Three core design goals
    • Able to detect individual bot infections
    • Only rely on high-level/network-flow information
      • Not inspect payloads
      • Resilient to encrypted bot communication
      • Easy to obtain
    • Work for stealthy bots
      • Steal sensitive data
      • Not rely on noisy activity
        • Spamming
        • DoS
  • Observation
    • C&C connections follow regular patterns
    • Bots send similar traffic to C&C
    • Upload information to C&C in a similar way
    • Timing patterns of communications with C&C
  • Run bot binaries in a controlled environment and learn their traffic patterns

Goals and Contributions

  • Observe that C&C traffic of different bot families exhibits regularities
  • A learning-based approach that automatically generates bot detection models
  • Prototype of BotFinder
    • A system that detects individual, bot-infected machines by monitoring their network traffic

Data

  • Pick malware samples that were active in Anubis within a 30-day window in June 2011
  • On average, 32 samples per bot family

BotFinder

General architecture of BotFinder

Notation

  • $\tau$, trace, a sequence of chronologically ordered flows between two network endpoints
  • $M$, model
  • $\tau_M$, score of trace $\tau$ on model $M$
  • $q_{cluster}$, quality rating of a cluster

Input Data Processing

  • All network traffic
  • Scanned by
    • VirusTotal
    • Anubis
  • Tolerate noise

Flow Reassembly

  • Aggregate data according to NetFlow

Trace Extraction

Traces with different statistical behaviors

  • $|\tau|_{min}$, minimal number of connections in a trace (empirical value 10 to 50)
  • Command-and-control traffic consists of multiple connections between the infected host and the C&C server
  • Two ways to filter the traffic and identify the relevant traffic traces
    • Whitelist
    • Third-party knowledge
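
The trace extraction step above can be sketched as follows. This is a minimal illustration, not BotFinder's implementation: the `Flow` record, the `extract_traces` helper, and the choice of (source, destination, destination port) as the endpoint key are assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """Hypothetical NetFlow-like record; field names are illustrative."""
    src: str
    dst: str
    dst_port: int
    start: float        # flow start time in seconds
    duration: float     # flow duration in seconds
    bytes_to_src: int
    bytes_to_dst: int


def extract_traces(flows, min_len=10, whitelist=frozenset()):
    """Group flows into traces and keep those with at least min_len flows.

    Flows to whitelisted destinations are dropped before grouping.
    """
    groups = defaultdict(list)
    for flow in flows:
        if flow.dst in whitelist:
            continue
        # Assumption: a trace is keyed by the two endpoints plus the port.
        groups[(flow.src, flow.dst, flow.dst_port)].append(flow)

    traces = {}
    for key, group in groups.items():
        if len(group) >= min_len:
            # A trace is chronologically ordered by flow start time.
            traces[key] = sorted(group, key=lambda f: f.start)
    return traces
```

Here `min_len` plays the role of $|\tau|_{min}$; short traces are discarded because too few connections give unreliable statistics.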

Statistical Features Analysis

  • Average time between the start times of two subsequent flows in the trace
    • The botmaster has to ensure that all bots under his control receive new commands and updates frequently
    • Communication from the C&C server to the bots via a push model is often impossible
      • Many infected hosts are behind NAT boxes
      • Or not yet registered with the C&C server
    • Bots therefore connect at a roughly constant time interval and exhibit loose periodicity
  • Average duration of a connection
    • Durations are expected to be similar
  • Average number of bytes transferred to the source
  • Average number of bytes transferred to the destination
  • Fast Fourier Transform (FFT) over the flow start times
    • Identify underlying frequencies of communication
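
A rough sketch of the five features listed above, using only the standard library. The function names, the 0/1 binning of start times, and the `bin_size` parameter are assumptions of this example; a naive discrete Fourier transform stands in for a real FFT routine to keep the block dependency-free.

```python
import cmath
from statistics import mean


def trace_features(starts, durations, bytes_to_src, bytes_to_dst):
    """Compute the four averaged features of one trace.

    starts must be sorted flow start times in seconds.
    """
    intervals = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "avg_interval": mean(intervals),
        "avg_duration": mean(durations),
        "avg_bytes_src": mean(bytes_to_src),
        "avg_bytes_dst": mean(bytes_to_dst),
    }


def dominant_period(starts, bin_size=1.0):
    """Estimate the strongest communication period in seconds.

    Start times are binned into a 0/1 signal, and the frequency bin with
    the largest DFT magnitude is converted back into a period.
    """
    t0 = starts[0]
    n = int((starts[-1] - t0) / bin_size) + 1
    signal = [0.0] * n
    for s in starts:
        signal[int((s - t0) / bin_size)] = 1.0

    best_k, best_mag = 0, 0.0
    # Skip k = 0 (the constant component) and scan up to the Nyquist bin.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n * bin_size / best_k if best_k else None
```

The naive transform costs O(n²) in the number of bins, so a library FFT would be the natural replacement for long traces. The recovered period is only as precise as the frequency resolution of the binned signal.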

Model Creation (Training)

  • Cluster each feature separately
    • Malware features uncorrelated
  • Drop small clusters with more diverse data (lower clustering quality)
  • CLUES algorithm
    • Allow non-parametric clustering
    • Better than k-means
  • Cluster quality
    • Large clusters with highly similar values are better
    • $q_{cluster} = \exp\left(-\beta \frac{SD}{Avg}\right)$
      • $\beta = 2.5$ (empirical value)
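
The quality formula above can be evaluated directly. A minimal sketch, assuming the population standard deviation is meant (the notes do not say which estimator is used); `cluster_quality` is a name chosen for this example.

```python
import math
from statistics import mean, pstdev


def cluster_quality(values, beta=2.5):
    """q_cluster = exp(-beta * SD / Avg) for the values in one cluster."""
    avg = mean(values)
    if avg == 0:
        # The ratio SD / Avg is undefined; treat the cluster as unusable.
        return 0.0
    # Assumption: population SD; the sample SD would also be defensible.
    return math.exp(-beta * pstdev(values) / avg)


# A tight cluster scores close to 1, a spread-out one close to 0.
tight = cluster_quality([60.0, 61.0, 59.0, 60.0])
loose = cluster_quality([10.0, 60.0, 200.0, 30.0])
```

Because the exponent uses the ratio SD/Avg, the score is scale-free: a cluster of intervals around 60 s and one around 600 s with the same relative spread get the same quality.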

Malware Detection

  • Match each feature of the trace against the clusters of the corresponding feature in each model
  • When $\tau$ hits a feature cluster of $M$, add $q_{cluster} \cdot \exp\left(-\beta \frac{SD_{trace}}{Avg_{trace}}\right)$ to $\tau_M$
    • $\beta = 2.5$ (empirical value)
  • Maintain a score $\tau_M$ for each model
  • Compare the highest $\tau_M$ with the acceptance threshold $a$
  • Optionally require a minimal number of feature hits, $h$
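
The scoring steps above can be sketched as below. Several details are assumptions of this example and not taken from the notes: a "hit" is defined as the trace average lying within `hit_width` standard deviations of a cluster center, the best-quality matching cluster is used when several match, and the data layout (dicts of samples and of `(center, sd, q_cluster)` tuples) and default thresholds are hypothetical.

```python
import math
from statistics import mean, pstdev

BETA = 2.5  # empirical value from the notes


def trace_quality(samples):
    """exp(-beta * SD_trace / Avg_trace) for one feature of a trace."""
    avg = mean(samples)
    return math.exp(-BETA * pstdev(samples) / avg) if avg > 0 else 0.0


def score_trace(trace, model, hit_width=2.0):
    """Return (score, hits) of a trace against one model.

    trace: feature name -> list of per-flow samples
    model: feature name -> list of (center, sd, q_cluster) clusters
    """
    score, hits = 0.0, 0
    for feature, samples in trace.items():
        avg = mean(samples)
        # Assumed hit rule: average within hit_width * sd of the center.
        matching = [
            q for center, sd, q in model.get(feature, [])
            if abs(avg - center) <= hit_width * sd
        ]
        if matching:
            hits += 1
            score += max(matching) * trace_quality(samples)
    return score, hits


def detect(trace, models, a=1.0, h=1):
    """Return the best-matching model name, or None if below threshold a."""
    best_name, best_score = None, 0.0
    for name, model in models.items():
        score, hits = score_trace(trace, model)
        if hits >= h and score > best_score:
            best_name, best_score = name, score
    return best_name if best_name is not None and best_score >= a else None
```

Raising `h` trades recall for precision: a trace that only matches one feature by chance is rejected even if that single hit has a high score.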

Bot Evolution

Botnet's strategy

  • Adding Randomness
    • BotFinder's detection rate remains at 60% even with 100% randomization
  • Introducing Larger Gaps
    • The Fast Fourier Transform (FFT) feature helps BotFinder keep detecting the underlying communication periodicity
  • High Fluctuation of C&C Servers
    • With very frequent C&C server changes, BotFinder cannot build traces of the minimal length $|\tau|_{min} = 50$
      • The authors did not observe such high C&C server fluctuation (IP flux)
    • Additional pre-processing step before Step 4
      • Merge two sub-traces $\tau_A$ and $\tau_B$ into $\tau_{AB}$
      • Two merge conditions
        • $SD$ of $\tau_{AB}$ is lower than the $SD$ of at least one of the individual traces
        • $q_{cluster}$ of $\tau_{AB}$ is higher than a threshold
  • P2P Bots
  • Bot-like Benign Traffic
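
The sub-trace merging check described under "High Fluctuation of C&C Servers" can be sketched for a single feature as follows. This is an illustration under assumptions: the check is applied to one list of per-flow samples per sub-trace, the quality of the merged trace reuses the $q_{cluster}$ formula with $\beta = 2.5$, and the default `q_threshold` is a made-up value.

```python
import math
from statistics import mean, pstdev


def quality(samples, beta=2.5):
    """exp(-beta * SD / Avg), the same form as q_cluster."""
    avg = mean(samples)
    return math.exp(-beta * pstdev(samples) / avg) if avg > 0 else 0.0


def should_merge(samples_a, samples_b, q_threshold=0.5):
    """Decide whether sub-traces A and B look like one C&C conversation.

    Condition 1: SD of the merged trace is lower than the SD of at least
                 one individual trace, i.e. lower than the larger SD.
    Condition 2: quality of the merged trace exceeds q_threshold.
    """
    merged = list(samples_a) + list(samples_b)
    sd_ok = pstdev(merged) < max(pstdev(samples_a), pstdev(samples_b))
    return sd_ok and quality(merged) > q_threshold
```

Two sub-traces with similar byte counts per flow pass both conditions, while sub-traces with very different behavior fail the first one because merging them inflates the standard deviation.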

Cost of botnet

  • C&C and bots need to update Botnet Communication Topologies
  • Increase the botnet operator's costs and reduce stability and performance of the malware network

Failure cases

  • Significantly randomize the bot's communication pattern
  • Drastically increase the communication intervals to force BotFinder to capture traces over longer periods of time
  • Introduce overhead traffic for source and destination byte variation
  • Change the C&C server extremely frequently
  • Use completely different traffic patterns after each C&C server change

References

  • BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
  • CS 259D Lecture 2
