PAYL: Anomalous Payload-based Network Intrusion Detection

Background Knowledge and Goals

  • Existing IDSs fail on
    • Zero-day attacks
    • Slow and stealthy worm propagation
  • Goal: detect the first occurrences of zero-day worms or other new malicious code delivered over the network
    • Signatures are not effective against previously unseen attacks
    • Slow/stealthy worm propagation can avoid bursts in network traffic flows or probes
    • Requires payload-based detection

Data

  • 1999 DARPA IDS dataset
  • CUCS dataset
  • Data units
    • Full packet
    • First 100 bytes of packet
    • Last 100 bytes of packet
    • Full connection
    • First 1000 bytes of connection

Feature

  • Mean and variance of each byte value's relative frequency

PAYL

  • Design criteria and operating objectives
    • Hands-free
    • Generality for any service or system
    • Incremental update to accommodate changing or drifting environments
    • Low false-positive (FP) rate
    • High bandwidth environments, low latency, efficient real-time operation

Length Conditioned n-Gram Payload Model

  • Cluster streams
    • Port number
      • Proxy for application
        • 21, 22, 80, etc.
    • Packet length range
      • Proxy for type of payload
        • Larger payloads contain media or binary data
    • Direction of stream
      • Inbound
      • Outbound
  • Measurement: $n$-gram frequencies
    • Given packet length $L$, frequency $= (\#\text{ of occurrences}) / (L - n + 1)$
    • Use $n = 1$: 256 possible byte values (one bin per value, 0 to 255)
  • $i$, observed length bin
  • $j$, port number
  • $M_{ij}$, model for length bin $i$ and port $j$: the average frequency of each byte and the standard deviation of each byte's frequency
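
The frequency measurement above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function names `ngram_frequencies` and `one_gram_vector` are hypothetical.

```python
from collections import Counter


def ngram_frequencies(payload: bytes, n: int = 1) -> dict:
    """Relative frequency of each n-gram in a payload of length L."""
    windows = len(payload) - n + 1  # number of sliding windows, L - n + 1
    if windows <= 0:
        return {}  # payload shorter than n: no n-grams to count
    counts = Counter(payload[k:k + n] for k in range(windows))
    return {gram: c / windows for gram, c in counts.items()}


def one_gram_vector(payload: bytes) -> list:
    """256-element feature vector: relative frequency of each byte value."""
    freqs = ngram_frequencies(payload, n=1)
    return [freqs.get(bytes([b]), 0.0) for b in range(256)]
```

For a non-empty payload the 256 entries sum to 1, so packets of different lengths within one length bin remain comparable.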

Simplified Mahalanobis Distance

  • Simplifications
    • Naïve assumption: Byte frequencies independent
      • Covariance matrix becomes diagonal
    • Replace 2-norm with 1-norm
      • Avoid time-consuming square and square-root computations
    • Add smoothing factor $\alpha = 0.001$
      • Avoid zero SD, and hence infinite distance, when a byte never appears in training or always appears with the same frequency
      • Reflect statistical confidence of sampled training data
        • larger smoothing factor, less confidence
  • $x$, feature vector of the new observation
  • $\overline{y}$, averaged feature vector computed from the training dataset
  • $\overline{\sigma}_i$, standard deviation of the $i$-th feature in the training dataset
  • $\alpha$, smoothing factor
  • $d(x, \overline{y}) = \sum_{i = 0}^{n - 1} |x_i - \overline{y}_i| / (\overline{\sigma}_i + \alpha)$, summing over the $n$ features ($n = 256$ byte values for 1-grams)
  • Intuitive explanation
    • Within the model selected by length bin, a kind of "normalization" of the observation, followed by summing the results
    • For each dimension (byte value $i$)
      • Measure the absolute difference between the sample's frequency and the mean frequency (the center point)
      • Normalize by the smoothed SD
    • Sum over all dimensions
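
A minimal sketch of the distance computation, assuming the feature vector, mean, and standard deviation are plain Python lists of equal length. The function name `simplified_mahalanobis` is hypothetical; the default `alpha` is the smoothing factor quoted in these notes.

```python
def simplified_mahalanobis(x, mean, std, alpha=0.001):
    """Sum of per-feature absolute deviations, each scaled by (std + alpha)."""
    if not (len(x) == len(mean) == len(std)):
        raise ValueError("x, mean and std must have the same length")
    # alpha keeps every term finite even when a feature's std is zero
    return sum(abs(xi - mi) / (si + alpha) for xi, mi, si in zip(x, mean, std))
```

A detector would typically compare the returned distance against a threshold and raise an alert when it is exceeded; the threshold value is a tuning choice not fixed by these notes.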

Incremental Learning

  • Adapt to concept drift
  • Use streaming measurements for mean and standard deviation
  • $\overline{x} = \overline{x} + (x_{N + 1} - \overline{x}) / (N + 1)$
  • Store the average of $x_i^2$ in a 256-element array, so that the standard deviation can be updated the same way via $\mathrm{Var}(X) = E[X^2] - (E[X])^2$
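
The streaming update can be sketched as follows. The class name `RunningByteModel` and its layout are assumptions made for illustration; it keeps a running mean of each feature and of its square, and derives the (population) standard deviation from the two.

```python
import math


class RunningByteModel:
    """Streaming mean and standard deviation for a fixed-size feature vector."""

    def __init__(self, size=256):
        self.n = 0                    # number of samples seen so far
        self.mean = [0.0] * size      # running average of x_i
        self.mean_sq = [0.0] * size   # running average of x_i ** 2

    def update(self, x):
        """Fold one new feature vector into the model."""
        self.n += 1  # self.n is now N + 1
        for i, xi in enumerate(x):
            self.mean[i] += (xi - self.mean[i]) / self.n
            self.mean_sq[i] += (xi * xi - self.mean_sq[i]) / self.n

    def std(self):
        """Standard deviation from Var(X) = E[X^2] - (E[X])^2."""
        # max(..., 0.0) guards against tiny negative values from rounding
        return [math.sqrt(max(ms - m * m, 0.0))
                for m, ms in zip(self.mean, self.mean_sq)]
```

Only two arrays and a counter are stored per model, so no past packets need to be kept to update it.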

Reduced Model Size by Clustering

  • Fine-grained model problem
    • Large total size of model
      • Similar distributions for near lengths
    • Insufficient training data for some lengths
  • Solution
    • Merge neighboring models
      • Merge when the Manhattan distance between two neighboring models is below a threshold
    • Borrow data from neighboring bins
  • For lengths not observed in training data
    • Use closest length range
    • Alert on unusual length
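
One way to picture the merging step is a single greedy pass over models sorted by length range. This sketch makes several assumptions of its own: each model is a dict with hypothetical keys `lengths`, `n`, and `mean`; merged means are weighted by sample count; standard deviations are omitted for brevity; and only one pass is made, whereas a fuller version might repeat until no neighbors can be merged.

```python
def manhattan(a, b):
    """1-norm distance between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def merge_two(m1, m2):
    """Combine two neighboring models, weighting means by sample count."""
    n = m1["n"] + m2["n"]
    mean = [(m1["n"] * a + m2["n"] * b) / n
            for a, b in zip(m1["mean"], m2["mean"])]
    return {"lengths": (m1["lengths"][0], m2["lengths"][1]), "n": n, "mean": mean}


def merge_neighbors(models, threshold):
    """Greedy left-to-right pass over models sorted by length range."""
    if not models:
        return []
    merged = [models[0]]
    for m in models[1:]:
        if manhattan(merged[-1]["mean"], m["mean"]) < threshold:
            merged[-1] = merge_two(merged[-1], m)  # similar enough: fold in
        else:
            merged.append(m)                       # too different: keep apart
    return merged
```

Merging also addresses sparse bins: a length with few training samples inherits statistics from a similar neighbor.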

Unsupervised learning

  • Assumption: Attacks are rare and their payload distribution is substantially different from normal traffic
  • Remove training data noise
    • Apply the learned models to training data
    • Remove anomalous training samples
    • Update models
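
The noise-removal loop above might look like the following single round. All names (`fit`, `distance`, `clean_and_refit`) and the threshold are hypothetical; the distance is the smoothed, per-feature normalized 1-norm described earlier in these notes.

```python
import math


def fit(samples):
    """Per-feature mean and (population) standard deviation."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dims)]
    std = [math.sqrt(sum((s[i] - mean[i]) ** 2 for s in samples) / n)
           for i in range(dims)]
    return mean, std


def distance(x, mean, std, alpha=0.001):
    """Smoothed, per-feature normalized 1-norm distance."""
    return sum(abs(a - m) / (s + alpha) for a, m, s in zip(x, mean, std))


def clean_and_refit(samples, threshold):
    """Score the training data with its own model, drop outliers, retrain."""
    mean, std = fit(samples)
    kept = [s for s in samples if distance(s, mean, std) <= threshold]
    if not kept:  # threshold too strict: keep the original model
        return (mean, std), samples
    return fit(kept), kept
```

This only works under the stated assumption that attacks are rare; if anomalous samples dominate the training set, they shape the model and are not removed.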

Z-string

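  • The byte values of a payload, ordered from most to least frequent (a Zipf-like rank ordering), form a string, the Z-string, that can serve as a signature
  • If Z-strings of anomalous payloads from different sites match each other
    • A worm has likely appeared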
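
A minimal sketch of building a Z-string, assuming it consists of the distinct byte values of a payload in decreasing order of frequency. The function name `z_string` and the tie-breaking rule (lower byte value first) are assumptions made here so that the result is deterministic.

```python
from collections import Counter


def z_string(payload: bytes) -> bytes:
    """Distinct byte values of the payload, most frequent first."""
    counts = Counter(payload)  # iterating bytes yields ints 0..255
    ranked = sorted(counts, key=lambda b: (-counts[b], b))
    return bytes(ranked)
```

Two sites could then compare the Z-strings of their anomalous payloads, for example by exact equality or by a string-similarity measure; which comparison to use is left open here.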

Limitation

  • Curse of dimensionality
  • Spurious features
  • Not robust against adversaries
  • No focused scope
  • Easily evaded by mimicry attacks (blending attacks)

References

  • Anomalous payload-based network intrusion detection, Wang-Stolfo 2004
  • CS 259D Session 12
