PAYL: Anomalous Payload-based Network Intrusion Detection

Background Knowledge and Goals

Existing IDSs fail on
- Zero-day (0-day) attacks
- Slow and stealthy worm propagation
Goal: detect the first occurrences of zero-day worms or new malicious code delivered via the network
- Signatures are not effective: no signature exists yet for new code
- Slow/stealthy worms avoid bursts in network traffic flows or probes, so volume-based detection misses them
- Therefore payload-based detection is required

Cluster streams (a separate model per group) by
- Port number
  - Proxy for application
    - e.g., 21 (FTP), 22 (SSH), 80 (HTTP)
- Packet length range
  - Proxy for type of payload
    - Larger payloads tend to contain media or binary data
- Direction of stream
  - Inbound
  - Outbound
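A minimal sketch of how a packet could be routed to the model it belongs to, using the three clustering criteria above. The fixed-width length binning and the names `ModelKey`, `model_key` and `LENGTH_BIN_WIDTH` are illustrative assumptions, not PAYL's actual code.

```python
from dataclasses import dataclass

# Hypothetical fixed bin width (bytes); the real binning scheme may differ.
LENGTH_BIN_WIDTH = 100


@dataclass(frozen=True)
class ModelKey:
    port: int        # proxy for the application, e.g. 80 for HTTP
    length_bin: int  # proxy for the type of payload
    inbound: bool    # direction of the stream


def model_key(port: int, payload: bytes, inbound: bool) -> ModelKey:
    """Return the key of the model this packet should be trained on or scored against."""
    return ModelKey(port, len(payload) // LENGTH_BIN_WIDTH, inbound)


# A 250-byte inbound HTTP payload falls into length bin 2 of the port-80 inbound model.
print(model_key(80, b"A" * 250, inbound=True))
```

Using a frozen dataclass makes the key hashable, so it can index a dictionary of models directly.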
Measurement: $n$-gram frequencies
- Given payload length $L$, frequency of an $n$-gram $=$ (# of occurrences) $/ (L-n+1)$
- Use $n = 1$: one feature per byte value, i.e. 256 features (0 to 255)
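The frequency definition above can be sketched as follows; the function names are hypothetical. For $n = 1$ the result is turned into a dense 256-dimensional vector indexed by byte value.

```python
from collections import Counter


def ngram_frequencies(payload: bytes, n: int = 1) -> dict:
    """Relative frequency of each n-gram: occurrences / (L - n + 1)."""
    windows = len(payload) - n + 1  # number of n-grams in a payload of length L
    if windows <= 0:
        return {}
    counts = Counter(payload[k:k + n] for k in range(windows))
    return {gram: c / windows for gram, c in counts.items()}


def byte_frequency_vector(payload: bytes) -> list:
    """The n = 1 case as a dense 256-dimensional vector indexed by byte value."""
    freqs = ngram_frequencies(payload, 1)
    return [freqs.get(bytes([b]), 0.0) for b in range(256)]


vec = byte_frequency_vector(b"GET / HTTP/1.1")
print(vec[ord("T")])  # 'T' occurs 3 times in 14 bytes
```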
$i$: observed length bin
$j$: port number
$M_{ij}$: model for bin $i$ and port $j$, storing the average frequency of each byte and the standard deviation of each byte's frequency
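A minimal sketch of building one model $M_{ij}$ from the byte-frequency vectors of its training payloads. It assumes the vectors were already grouped by length bin and port, computes everything in one batch for simplicity, and uses the population SD; all of these are assumptions rather than details from the notes.

```python
import statistics


def train_model(vectors: list) -> tuple:
    """Build one model M_ij from the byte-frequency vectors of its training payloads.

    All vectors are assumed to come from the same length bin i and port j.
    Returns (mean, sd): per-byte average frequency and standard deviation.
    """
    if not vectors:
        raise ValueError("a model needs at least one training vector")
    columns = list(zip(*vectors))  # one column per byte value
    mean = [statistics.fmean(col) for col in columns]
    sd = [statistics.pstdev(col) for col in columns]  # population SD (an assumption)
    return mean, sd
```

A byte that never appears in the training payloads gets mean 0 and SD 0, which is exactly the case the smoothing factor below has to handle.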

Simplifications (of the Mahalanobis distance)
- Naïve assumption: byte frequencies are independent
  - Covariance matrix becomes diagonal
- Replace 2-norm with 1-norm
  - Avoid time-consuming square and square-root computations
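- Add smoothing factor $\alpha = 0.001$ to the SD
  - Avoids division by a zero SD, which would give an infinite distance
  - The SD is zero when a byte has the same frequency in every training sample (e.g., a byte that never appears)
  - Reflects the statistical confidence of the sampled training data
    - Larger smoothing factor, less confidence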
$x$: feature vector of the new observation
$\overline{y}$: averaged feature vector computed from the training dataset
$\overline{\sigma_i}$: standard deviation of feature $i$ in the training dataset
$\alpha$: smoothing factor
$d(x, \overline{y}) = \sum_{i = 0}^{n - 1} \frac{|x_i - \overline{y_i}|}{\overline{\sigma_i} + \alpha}$, where $n$ is the number of features (256 for 1-grams)
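The distance formula translates almost directly into code. This is a sketch: `payl_distance` and `is_anomalous` are hypothetical names, and the alert threshold is an assumed tuning parameter not specified in these notes.

```python
def payl_distance(x, mean, sd, alpha=0.001):
    """Simplified Mahalanobis distance with a diagonal covariance and the 1-norm:

        d(x, y) = sum_i |x_i - y_i| / (sd_i + alpha)
    """
    if not (len(x) == len(mean) == len(sd)):
        raise ValueError("feature vectors must have the same dimension")
    return sum(abs(xi - yi) / (si + alpha) for xi, yi, si in zip(x, mean, sd))


def is_anomalous(x, mean, sd, threshold, alpha=0.001):
    """Flag the payload if its distance exceeds a hypothetical tuned threshold."""
    return payl_distance(x, mean, sd, alpha) > threshold


# Toy 3-feature model: the third byte never occurred in training (SD 0).
d = payl_distance([0.5, 0.4, 0.1], [0.5, 0.5, 0.0], [0.1, 0.1, 0.0])
print(d)  # about 101, dominated by the byte never seen in training
```

The example shows why $\alpha$ matters: without it the third term would divide by zero, and with it a previously unseen byte still produces a very large contribution.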
Intuitive Explanation
- A per-feature normalized distance, summed over all features
- For each dimension $i$ (a byte value, 0 to 255 for 1-grams)
  - Measure the absolute (Manhattan) difference between the new sample's frequency and the model's mean
  - Normalize by the smoothed SD, so bytes whose frequency varies widely in normal traffic contribute less
- Sum over all dimensions

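Fine-grained model problem (one model per observed length)
- Large total size of the models
  - Redundant: neighboring lengths have similar distributions
- Insufficient training data for some lengths
Solution
- Merge neighboring models whose Manhattan distance is below a threshold
- Borrow data from neighboring bins for lengths with too few training samples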
For lengths not observed in training data
- Use the model of the closest length range
- Raise an alert on the unusual length
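A minimal sketch of merging neighboring length models and falling back to the closest one for unseen lengths. The greedy left-to-right merge, the count-weighted averaging of means, and the names `merge_neighbors` and `closest_cluster` are assumptions for illustration; only the means are merged here, while a complete version would also pool the SDs.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def merge_neighbors(models, threshold):
    """Greedily merge adjacent per-length models whose means are within `threshold`.

    models: list of (length, mean_vector, sample_count), sorted by length.
    Returns clusters as (covered_lengths, mean_vector, sample_count).
    Only the means are merged here; a full version would also pool the SDs.
    """
    clusters = []
    for length, mean, count in models:
        if clusters and manhattan(clusters[-1][1], mean) < threshold:
            lengths, c_mean, c_count = clusters[-1]
            total = c_count + count
            # Sample-count-weighted average of the two mean vectors
            merged = [(cm * c_count + m * count) / total for cm, m in zip(c_mean, mean)]
            clusters[-1] = (lengths + [length], merged, total)
        else:
            clusters.append(([length], list(mean), count))
    return clusters


def closest_cluster(clusters, length):
    """Pick the cluster nearest to `length`; a nonzero gap means the length was never seen."""
    def gap(cluster):
        return min(abs(length - seen) for seen in cluster[0])
    best = min(clusters, key=gap)
    return best, gap(best)
```

A caller could score the payload against the returned cluster and, if the gap is nonzero, additionally raise the unusual-length alert.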

Assumption: Attacks are rare and their payload distribution is substantially different from normal traffic
Removing noise from the training data
- Apply the learned models to the training data
- Remove anomalous training samples (likely attacks)
- Retrain the models on the remaining data
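The three cleaning steps can be sketched as a single train, filter, retrain pass. The threshold value, the single retraining round, and the function names are assumptions; the helpers are redefined here so the block stands alone.

```python
import statistics


def fit(vectors):
    """Per-feature mean and population SD."""
    columns = list(zip(*vectors))
    return ([statistics.fmean(c) for c in columns],
            [statistics.pstdev(c) for c in columns])


def distance(x, mean, sd, alpha=0.001):
    return sum(abs(a - m) / (s + alpha) for a, m, s in zip(x, mean, sd))


def clean_and_retrain(vectors, threshold, alpha=0.001):
    """Train, drop training samples the model itself flags, then retrain once.

    Returns ((mean, sd), number_of_removed_samples).
    """
    mean, sd = fit(vectors)
    kept = [v for v in vectors if distance(v, mean, sd, alpha) <= threshold]
    if not kept:  # threshold too strict: keep the original model
        return (mean, sd), 0
    return fit(kept), len(vectors) - len(kept)


# Nine normal samples and one outlier (toy 2-byte "frequency" vectors).
training = [[0.5, 0.5]] * 9 + [[0.0, 1.0]]
(model_mean, model_sd), removed = clean_and_retrain(training, threshold=2.0)
print(removed, model_mean)  # the outlier is dropped and the mean returns to [0.5, 0.5]
```

This relies on the stated assumption: because attacks are rare, they shift the initial model only slightly, so they still stand out when the model is applied back to its own training data.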

Z-string: the byte values ordered by decreasing frequency (rank order, as in Zipf's law); it can serve as a signature of a payload
If the Z-strings of anomalous payloads seen at different sites match each other
- A worm has likely appeared
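A minimal sketch of computing a Z-string and comparing two of them. The tie-breaking by byte value and the matching rule (a `difflib.SequenceMatcher` similarity ratio with a 0.9 cutoff) are assumptions made for illustration; the notes only say the Z-strings "match".

```python
from difflib import SequenceMatcher


def z_string(payload: bytes) -> bytes:
    """Distinct byte values of the payload ranked by decreasing frequency.

    Ties are broken by byte value so the result is deterministic.
    """
    counts = [0] * 256
    for byte in payload:
        counts[byte] += 1
    present = [b for b in range(256) if counts[b] > 0]
    present.sort(key=lambda b: (-counts[b], b))
    return bytes(present)


def likely_same_worm(z_a: bytes, z_b: bytes, min_similarity: float = 0.9) -> bool:
    """Hypothetical matching rule: Z-strings are 'the same' if similar enough."""
    return SequenceMatcher(None, z_a, z_b).ratio() >= min_similarity


site_a = z_string(b"\x90\x90\x90\x90\xcc\xcc\x31")  # anomalous payload at site A
site_b = z_string(b"\x31\xcc\x90\x90\xcc\x90\x90")  # same bytes, different order, at site B
print(likely_same_worm(site_a, site_b))  # True
```

Because a Z-string depends only on byte frequencies, it is unchanged when the same worm payload arrives with its bytes in a different order, which is what makes it usable as a cross-site signature.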