PAYL: Anomalous Payload-based Network Intrusion Detection
Background Knowledge and Goals
- Existing IDSs failed on
- Zero-day attacks
- Slow and stealthy worm propagation
- Detect first occurrences of zero-day worms or new malicious codes delivered via network
- Signatures not effective
- Slow/stealthy worm propagation can avoid bursts in network traffic flows or probes
- Requires payload based detection
Data
- 1999 DARPA IDS dataset
- CUCS dataset
- Data units
- Full packet
- First 100 bytes of packet
- Last 100 bytes of packet
- Full connection
- First 1000 bytes of connection
Feature
- Mean and variance of each byte value's relative frequency in the payload
PAYL
- Design criteria and operating objectives
- Hands-free
- Generality for any service or system
- Incremental update to accommodate changing or drifting environments
- Low FP
- High bandwidth environments, low latency, efficient real-time operation
Length Conditioned n-Gram Payload Model
- Cluster streams
- Port number
- Packet length range
- Proxy for type of payload
- Larger payloads contain media or binary data
- Direction of stream
- Measurement: n-gram frequencies
- Given packet length L, frequency = (number of occurrences) / (L − n + 1)
- Use n = 1: 256 possible byte values (0 to 255)
- i, observed length bin
- j, port number
- Mij, average byte frequency and the standard deviation of each byte's frequency
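The model above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the fixed-width binning in `length_bin` (and its width of 100) and the function names are assumptions made for the example, and direction of stream is left out for brevity.

```python
from collections import defaultdict
import math


def byte_frequencies(payload: bytes) -> list:
    """Relative frequency of each of the 256 byte values (n = 1)."""
    counts = [0] * 256
    for b in payload:
        counts[b] += 1
    total = len(payload)  # L - n + 1 with n = 1
    return [c / total for c in counts] if total else [0.0] * 256


def length_bin(length: int, width: int = 100) -> int:
    """Hypothetical fixed-width length binning (the width is an assumption)."""
    return length // width


def train(samples):
    """samples: iterable of (port, payload) pairs.

    Returns {(length_bin, port): (mean, std)}, where mean and std are
    256-element lists of per-byte frequency statistics.
    """
    grouped = defaultdict(list)
    for port, payload in samples:
        key = (length_bin(len(payload)), port)
        grouped[key].append(byte_frequencies(payload))

    models = {}
    for key, vectors in grouped.items():
        n = len(vectors)
        mean = [sum(v[b] for v in vectors) / n for b in range(256)]
        std = [
            math.sqrt(sum((v[b] - mean[b]) ** 2 for v in vectors) / n)
            for b in range(256)
        ]
        models[key] = (mean, std)
    return models
```

For instance, `train([(80, b"aaaa"), (80, b"bbbb")])` yields a single model under the key `(0, 80)`, since both payloads fall into the same length bin and port.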
Simplified Mahalanobis Distance
- Simplifications
- Naïve assumption: Byte frequencies independent
- Covariance matrix becomes diagonal
- Replace 2-norm with 1-norm
- Avoid time-consuming square and square-root computations
- Add smoothing factor = 0.001
- Avoid zero SD and infinite distance
- Covers bytes whose frequency never varies in the training data (zero SD)
- Reflect statistical confidence of sampled training data
- larger smoothing factor, less confidence
- x, feature vector of the new observation
- y, averaged feature vector computed from the training dataset
- σ, standard deviation
- $d(x, y) = \sum_{i=0}^{n-1} \frac{|x_i - y_i|}{\sigma_i + \alpha}$, where $\alpha$ is the smoothing factor
- Intuitive Explanation
- A kind of "normalization": each byte's deviation from the model is scaled by that byte's spread, and the results are summed
- For each dimension (byte value i, within the model for the observed length bin and port)
- Measure the absolute (Manhattan) distance between the sample and the center point (the mean)
- Normalize with smoothed SD
- Summation
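The distance described above fits in one function. The sketch below assumes the feature vector, mean, and standard deviation are equal-length sequences; the function name is hypothetical, and any alert threshold applied to the result would have to be chosen separately.

```python
def simplified_mahalanobis(x, mean, std, alpha=0.001):
    """Sum over all dimensions of |x_i - mean_i| / (std_i + alpha).

    alpha is the smoothing factor: it keeps the denominator non-zero when a
    byte's frequency never varied in training.
    """
    return sum(
        abs(xi - mi) / (si + alpha)
        for xi, mi, si in zip(x, mean, std)
    )


# Toy two-dimensional example (a real model has 256 dimensions).
# The first dimension matches the mean exactly and contributes 0;
# the second contributes 0.25 / (0.249 + 0.001), i.e. about 1.0.
d = simplified_mahalanobis([0.5, 0.5], [0.5, 0.25], [0.0, 0.249])
```

Because the distance is a sum of absolute values, no squares or square roots are needed at detection time, which is the point of the 1-norm simplification.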
Incremental Learning
- Adapt to concept drift
- Use streaming measurements for mean and standard deviation
- $\bar{x} \leftarrow \bar{x} + \frac{x_{N+1} - \bar{x}}{N+1}$
- Store the running average of $x_i^2$ in a second 256-element array, so the SD can be derived from $\overline{x_i^2} - \bar{x}_i^2$
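A minimal sketch of the streaming update, assuming one running mean and one running mean of squares per byte value. The class name `RunningByteModel` and its methods are illustrative only.

```python
import math


class RunningByteModel:
    """Streaming mean and standard deviation for each feature dimension."""

    def __init__(self, dims=256):
        self.n = 0                    # number of samples seen so far
        self.mean = [0.0] * dims      # running average of x_i
        self.mean_sq = [0.0] * dims   # running average of x_i squared

    def update(self, x):
        """Fold one new feature vector into the model without storing it."""
        self.n += 1  # self.n now plays the role of N + 1
        for i, xi in enumerate(x):
            self.mean[i] += (xi - self.mean[i]) / self.n
            self.mean_sq[i] += (xi * xi - self.mean_sq[i]) / self.n

    def std(self):
        """Per-dimension SD from Var = E[x^2] - (E[x])^2.

        Clamped at zero to absorb small negative values from rounding.
        """
        return [
            math.sqrt(max(ms - m * m, 0.0))
            for m, ms in zip(self.mean, self.mean_sq)
        ]
```

Only two arrays and a counter are kept per model, so memory use stays constant no matter how much traffic has been observed.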
Reduced Model Size by Clustering
- Fine-grained model problem
- Large total size of model
- Similar distributions for near lengths
- Insufficient training data for some lengths
- Solution
- Merge neighboring models
- Borrow data from neighboring bins
- For lengths not observed in training data
- Use closest length range
- Alert on unusual length
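One way the merge-and-lookup steps could look is sketched below. Several things here are assumptions: the merge criterion (Manhattan distance between mean vectors below a caller-chosen `threshold`), the greedy left-to-right pass, and the tuple layout `(lo, hi, count, mean)`. Standard deviations are left out to keep the example short.

```python
def manhattan(a, b):
    """1-norm distance between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def merge_neighbors(models, threshold):
    """Greedily merge adjacent length ranges whose mean vectors are close.

    models: list of (lo, hi, count, mean) tuples sorted by length range.
    Returns a new, usually shorter, list in the same format.
    """
    merged = []
    for lo, hi, count, mean in models:
        if merged and manhattan(merged[-1][3], mean) < threshold:
            prev_lo, _, prev_count, prev_mean = merged[-1]
            total = prev_count + count
            # Count-weighted average of the two mean vectors.
            new_mean = [
                (prev_count * p + count * q) / total
                for p, q in zip(prev_mean, mean)
            ]
            merged[-1] = (prev_lo, hi, total, new_mean)
        else:
            merged.append((lo, hi, count, mean))
    return merged


def lookup(merged, length):
    """Return (model, unusual_length).

    If no range covers the length, fall back to the closest range and
    flag the length as unusual so the caller can raise an alert.
    """
    for m in merged:
        if m[0] <= length <= m[1]:
            return m, False
    closest = min(
        merged,
        key=lambda m: min(abs(length - m[0]), abs(length - m[1])),
    )
    return closest, True
```

Merging pools the sample counts of neighboring bins, which is how a sparsely trained length range borrows data from its neighbors.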
Unsupervised learning
- Assumption: Attacks are rare and their payload distribution is substantially different from normal traffic
- Remove training data noise
- Apply the learned models to training data
- Remove anomalous training samples
- Update models
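The noise-removal loop can be illustrated as a single clean-and-refit pass. The sketch assumes training samples are already feature vectors, uses the simplified distance with a caller-chosen `threshold`, and keeps the original model if filtering would discard everything. All names are hypothetical.

```python
import math


def fit(vectors):
    """Per-dimension mean and standard deviation of the given vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dims)]
    std = [
        math.sqrt(sum((v[i] - mean[i]) ** 2 for v in vectors) / n)
        for i in range(dims)
    ]
    return mean, std


def distance(x, mean, std, alpha=0.001):
    """Simplified Mahalanobis distance with smoothing factor alpha."""
    return sum(abs(a - m) / (s + alpha) for a, m, s in zip(x, mean, std))


def clean_and_refit(vectors, threshold):
    """Score the training data with its own model, drop outliers, refit.

    Returns ((mean, std), kept_vectors).
    """
    mean, std = fit(vectors)
    kept = [v for v in vectors if distance(v, mean, std) <= threshold]
    if not kept:
        # Assumption: fall back to the unfiltered model rather than fail.
        return (mean, std), list(vectors)
    return fit(kept), kept
```

This only works under the stated assumption that attacks are rare: if anomalous samples dominated the training data, they would define the model instead of being filtered out of it.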
Z-string
- Byte values sorted by decreasing frequency (a Zipf-like rank ordering) form a string that can serve as a signature
- Z-strings of anomalous payloads from different sites match each other
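A minimal sketch of building such a rank-ordered string from one payload. Skipping bytes that never occur, breaking ties by byte value, and the optional `top_k` cutoff are assumptions of this example rather than details taken from the notes.

```python
from collections import Counter


def z_string(payload: bytes, top_k=None) -> bytes:
    """Distinct byte values of the payload, most frequent first.

    Bytes that do not occur are omitted; ties are broken by byte value
    so the result is deterministic.
    """
    counts = Counter(payload)
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    if top_k is not None:
        ranked = ranked[:top_k]
    return bytes(ranked)


# Two payloads with the same byte distribution but different byte order
# yield the same Z-string.
same = z_string(b"aaabbc") == z_string(b"cbbaaa")
```

Because the string depends only on the ranking of byte frequencies, two sites can compare Z-strings of suspicious payloads without exchanging the payloads themselves.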
Limitation
- Curse of dimensionality
- Spurious features
- Not robust against adversaries
- No focused scope
- Easily escaped by mimicry attacks (blending attack)
References
- Anomalous payload-based network intrusion detection, Wang-Stolfo 2004
- CS 259D Session 12