BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
Background Knowledge and Insights
- Three core design goals
- Able to detect individual bot infections
- Only rely on high-level/network-flow information
- Not inspect payloads
- Resilient to encrypted bot communication
- Easy to obtain
- Work for stealthy bots
- Such bots steal sensitive data
- Detection must not rely on noisy activity such as spamming
- Observations
- C&C connections follow regular patterns
- Bots send similar traffic to C&C
- Upload information to C&C in a similar way
- Timing patterns of communications with C&C
- Run bot binaries in a controlled environment to learn these patterns
Goals and Contributions
- Observe that C&C traffic of different bot families exhibits regularities
- A learning-based approach that automatically generates bot detection models
- Prototype of BotFinder
- A system that detects individual, bot-infected machines by monitoring their network traffic
Data
- Picked active malware samples from Anubis within a 30-day window in June 2011
- On average, 32 samples per bot family
BotFinder
Notation
- $\tau$, trace: a sequence of chronologically ordered flows between two network endpoints
- $M$, model
- $\tau_M$, score of trace $\tau$ on model $M$
- $q_{cluster}$, cluster quality
- All network traffic
- Scanned by
- Tolerate noise
Flow Reassembly
- Aggregate traffic into NetFlow-style flow records
- $|\tau|_{min}$, the minimal number of connections in a trace (empirical values 10 to 50)
- Command-and-control traffic consists of multiple connections between the infected host and the C&C server
- Two ways to filter the traffic and identify the relevant traffic traces
- Whitelist
- Third-party knowledge
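A minimal sketch of flow reassembly, assuming NetFlow-like records with hypothetical field names (`Flow`, `bytes_src`, `bytes_dst`) and assuming that traces are keyed by source IP, destination IP, destination port, and protocol. The whitelist is modeled as a plain set of destination addresses; third-party knowledge is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Collection, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Flow:
    """Hypothetical NetFlow-like record; field names are illustrative."""
    src: str
    dst: str
    dst_port: int
    proto: str
    start: float     # flow start time (seconds)
    duration: float  # flow duration (seconds)
    bytes_src: int   # bytes transferred to the source
    bytes_dst: int   # bytes transferred to the destination


TraceKey = Tuple[str, str, int, str]  # (src, dst, dst_port, proto), assumed key


def build_traces(flows: Iterable[Flow], min_len: int = 10,
                 whitelist: Collection[str] = ()) -> Dict[TraceKey, List[Flow]]:
    """Group flows into chronologically ordered traces between two endpoints.

    Flows to whitelisted destinations are skipped, and traces with fewer
    than min_len flows (|tau|_min) are dropped.
    """
    groups: Dict[TraceKey, List[Flow]] = defaultdict(list)
    for f in flows:
        if f.dst in whitelist:
            continue  # known-benign destination
        groups[(f.src, f.dst, f.dst_port, f.proto)].append(f)
    return {key: sorted(fs, key=lambda f: f.start)
            for key, fs in groups.items() if len(fs) >= min_len}
```

Sorting happens after grouping, so input flows may arrive in any order.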
Statistical Feature Analysis
- Average time between the start times of two subsequent flows in the trace
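- The botmaster has to ensure that all bots under his control receive new commands and updates regularly
- A push model is often not feasible for C&C communication, because
- Hosts are behind NAT boxes
- Hosts are not registered with the C&C server
- Bots therefore connect to the C&C server at roughly constant intervals and exhibit loose periodicity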
- Average duration of a connection
- The durations are expected to be similar
- Number of bytes on average transferred to the source
- Number of bytes on average transferred to the destination
- Fast Fourier Transform (FFT) over the flow start times
- Identify underlying frequencies of communication
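The features above can be sketched as follows, assuming each trace is given as parallel lists of flow start times, durations, and byte counts (sorted by start time, with at least two flows). Each averaged feature is returned together with its standard deviation, which the scoring step uses. For the FFT feature, a naive O(n²) DFT over a 0/1 signal of binned start times stands in for a real FFT library call; `bin_size` is a hypothetical parameter.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple


def avg_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Return (average, population standard deviation) of one feature."""
    return mean(values), pstdev(values)


def dominant_period(starts: Sequence[float], bin_size: float = 1.0) -> float:
    """Estimate the communication period from flow start times.

    Start times are binned into a 0/1 signal; a naive DFT (illustrative
    stand-in for an FFT) picks the strongest non-zero frequency k, and the
    period is the signal length divided by k.
    """
    t0 = min(starts)
    n = int((max(starts) - t0) / bin_size) + 1
    signal = [0.0] * n
    for s in starts:
        signal[int((s - t0) / bin_size)] = 1.0
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):  # skip DC (k=0) and the mirrored half
        mag = abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                      for i, x in enumerate(signal)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return n * bin_size / best_k


def trace_features(starts: Sequence[float], durations: Sequence[float],
                   bytes_src: Sequence[float], bytes_dst: Sequence[float],
                   bin_size: float = 1.0) -> Dict[str, object]:
    """Compute the per-trace statistical features described above."""
    intervals = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "interval": avg_sd(intervals),    # time between subsequent flow starts
        "duration": avg_sd(durations),    # connection duration
        "bytes_src": avg_sd(bytes_src),   # bytes to the source
        "bytes_dst": avg_sd(bytes_dst),   # bytes to the destination
        "fft_period": dominant_period(starts, bin_size),
    }
```

For large traces, `numpy.fft.rfft` on the same binned signal would replace the quadratic loop.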
Model Creation (Training)
- Cluster each feature separately
- Features of a malware family may be uncorrelated
- Drop small clusters and clusters with more diverse data (lower cluster quality)
- CLUES algorithm
- Allows non-parametric clustering (the number of clusters does not have to be specified in advance)
- Better than k-means
- Cluster quality
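- Large clusters with highly similar values are better
- $q_{cluster} = \exp\left(-\beta \cdot \frac{\mathrm{SD}}{\mathrm{Avg}}\right)$, where Avg and SD are the average and standard deviation of the cluster's values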
- β=2.5 (empirical value)
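A sketch of per-feature model creation under stated assumptions: CLUES is not reimplemented here. A simple, hypothetical gap-based 1-D clustering (`gap_clusters`) stands in for it, and the thresholds `min_size` and `min_q` are illustrative. The input for one feature is the list of per-trace averages from a bot family's training traces; each surviving cluster is stored as (Avg, SD, q_cluster) with $q_{cluster} = \exp(-\beta \cdot \mathrm{SD}/\mathrm{Avg})$.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

BETA = 2.5  # empirical value from the notes
Cluster = Tuple[float, float, float]  # (Avg, SD, q_cluster)


def gap_clusters(values: Sequence[float], rel_gap: float = 0.2) -> List[List[float]]:
    """Hypothetical stand-in for CLUES: sort the 1-D values and start a new
    cluster wherever the jump to the next value exceeds rel_gap * current value."""
    vals = sorted(values)
    clusters = [[vals[0]]]
    for v in vals[1:]:
        if v - clusters[-1][-1] > rel_gap * abs(clusters[-1][-1]):
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def q_cluster(values: Sequence[float], beta: float = BETA) -> float:
    """Cluster quality exp(-beta * SD / Avg); tight clusters score close to 1."""
    avg = mean(values)
    return math.exp(-beta * pstdev(values) / avg) if avg else 0.0


def build_feature_model(values: Sequence[float], min_size: int = 3,
                        min_q: float = 0.0, beta: float = BETA) -> List[Cluster]:
    """Cluster one feature's per-trace averages; keep large, high-quality clusters."""
    model = []
    for c in gap_clusters(values):
        q = q_cluster(c, beta)
        if len(c) < min_size or q < min_q:
            continue  # drop small or diverse (low-quality) clusters
        model.append((mean(c), pstdev(c), q))
    return model


def build_model(per_feature: Dict[str, Sequence[float]], **kw) -> Dict[str, List[Cluster]]:
    """Build a model in which each feature is clustered separately."""
    return {feat: build_feature_model(vals, **kw) for feat, vals in per_feature.items()}
```

For example, interval averages [59, 60, 60, 61, 62] form one cluster with Avg = 60.4 and SD ≈ 1.02, giving q_cluster ≈ exp(−2.5 · 1.02 / 60.4) ≈ 0.96.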
Malware Detection
- Match each feature of the trace against the corresponding model's cluster
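- If a feature of $\tau$ hits (matches) a cluster of $M$
- Add $q_{cluster} \cdot \exp\left(-\beta \cdot \frac{\mathrm{SD}_{trace}}{\mathrm{Avg}_{trace}}\right)$ to $\tau_M$, where $\mathrm{Avg}_{trace}$ and $\mathrm{SD}_{trace}$ are the trace's average and standard deviation for that feature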
- β=2.5 (empirical value)
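- Maintain a score $\tau_M$ for each model
- Compare the highest $\tau_M$ with a threshold $a$; if it exceeds $a$, report $\tau$ as bot traffic of that model's family
- Optionally require a minimal number of feature hits, $h$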
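The scoring step can be sketched as below. The rule for a feature "hit" is not specified in these notes, so this sketch assumes a hit when the trace's average lies within `width` standard deviations of a cluster's average (hypothetical). `a` and `h` are the threshold and minimal hit count from the notes; models are assumed to be dictionaries of (Avg, SD, q_cluster) clusters per feature.

```python
import math
from typing import Dict, List, Optional, Tuple

BETA = 2.5
Cluster = Tuple[float, float, float]          # (Avg, SD, q_cluster)
Model = Dict[str, List[Cluster]]              # feature name -> clusters
TraceFeats = Dict[str, Tuple[float, float]]   # feature name -> (Avg_trace, SD_trace)


def score(trace: TraceFeats, model: Model, beta: float = BETA,
          width: float = 2.0) -> Tuple[float, int]:
    """Return (tau_M, number of feature hits) of a trace on one model.

    Each hit adds q_cluster * exp(-beta * SD_trace / Avg_trace); if a feature
    hits several clusters, only the best contribution is counted.
    """
    total, hits = 0.0, 0
    for feat, clusters in model.items():
        if feat not in trace:
            continue
        avg_t, sd_t = trace[feat]
        best = 0.0
        for c_avg, c_sd, q in clusters:
            if avg_t != 0 and abs(avg_t - c_avg) <= width * c_sd:  # assumed hit rule
                best = max(best, q * math.exp(-beta * sd_t / avg_t))
        if best > 0:
            total += best
            hits += 1
    return total, hits


def classify(trace: TraceFeats, models: Dict[str, Model], a: float,
             h: int = 1) -> Optional[str]:
    """Return the family with the highest tau_M if it reaches a and h hits."""
    best_name, best = None, (0.0, 0)
    for name, m in models.items():
        s = score(trace, m)
        if s[0] > best[0]:
            best_name, best = name, s
    if best_name is not None and best[0] >= a and best[1] >= h:
        return best_name
    return None
```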
Bot Evolution
Botnet strategies
- Adding Randomness
- BotFinder's detection rate remains 60% even with 100% randomization
- Introducing Larger Gaps
- Countered by the Fast Fourier Transform (FFT) feature, which can still reveal the underlying periodicity
- High Fluctuation of C&C Servers
- BotFinder cannot build traces of the minimal length $|\tau|_{min} = 50$
- The authors did not observe such high C&C server fluctuation (IP flux)
- Countermeasure: an additional pre-processing step before Step 4
- Merge two sub-traces $\tau_A$ and $\tau_B$ into $\tau_{AB}$
- Merge only if both conditions hold
- The SD of $\tau_{AB}$ is lower than the SD of at least one of the individual traces
- The $q_{cluster}$ of $\tau_{AB}$ is higher than a threshold
- P2P Bots
- Bot-like Benign Traffic
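The sub-trace merging check for high C&C server fluctuation can be sketched as follows. It assumes, hypothetically, that only the interval feature (time between subsequent flow starts) is used for both conditions and that `q_min` is an illustrative threshold; each sub-trace is given as a list of flow start times with at least two flows.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence

BETA = 2.5


def intervals(starts: Sequence[float]) -> List[float]:
    """Times between subsequent flow starts, after sorting chronologically."""
    s = sorted(starts)
    return [b - a for a, b in zip(s, s[1:])]


def quality(values: Sequence[float], beta: float = BETA) -> float:
    """exp(-beta * SD / Avg), the same form as q_cluster."""
    avg = mean(values)
    return math.exp(-beta * pstdev(values) / avg) if avg else 0.0


def try_merge(starts_a: Sequence[float], starts_b: Sequence[float],
              q_min: float = 0.8) -> Optional[List[float]]:
    """Merge sub-traces A and B into AB if (1) the interval SD of AB is lower
    than that of at least one sub-trace and (2) AB's quality exceeds q_min."""
    merged = sorted(list(starts_a) + list(starts_b))
    sd_ab = pstdev(intervals(merged))
    sd_a, sd_b = pstdev(intervals(starts_a)), pstdev(intervals(starts_b))
    if sd_ab < max(sd_a, sd_b) and quality(intervals(merged)) > q_min:
        return merged
    return None
```

For example, a bot that polls every 60 s while switching irregularly between two C&C IPs produces two irregular sub-traces whose merge is perfectly periodic, so the check accepts the merge.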
Cost to the botnet
- To evade BotFinder, C&C servers and bots need to change their communication topologies
- This increases the botnet operator's costs and reduces the stability and performance of the malware network
Failure cases (ways a bot could evade BotFinder)
- Significantly randomize the bot's communication pattern
- Drastically increase the communication intervals to force BotFinder to capture traces over longer periods of time
- Introduce overhead traffic for source and destination byte variation
- Change the C&C server extremely frequently
- Use completely different traffic patterns after each C&C server change
References
- BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection
- CS 259D Lecture 2