Adaptive Intrusion Detection System via Online Machine Learning

Background Knowledge and Insights

  • Diverse network environments
  • Dynamic attack types
  • Adversarial environment
  • IDS performance strongly depends on chosen classifier
    • Misuse IDS
      • Detect known attacks
      • SNORT: signature-based IDS
    • Anomaly IDS
      • Detect novel attacks
      • More FP
    • Perform differently in different environments
    • No Free Lunch Theorem
  • Combination IDSs
    • High cost
    • Majority voting
    • Incorrect
      • Majority can be wrong
  • Hedge/Boosting
    • Online learning framework
    • Chose best IDS in period T
    • Destroy the superiority of the best IDS by changing the attacks all the time
    • All IDSs perform badly
  • Adaptability of the IDS
    • Adapt to dynamic adversarial environments
    • How to choose the best

Goals

  • New efficient online learning framework
  • Adaptive Intrusion Detection System (A-IDS)

Data Source

  • ECML-PKDD HTTP 2007
    • 50,000 samples, 20% attacks
      • Attacks or Normal
    • Attacks
      • Cross-Site Scripting
      • SQL injection
      • LDAP injection
      • XPATH injection
      • Path traversal
      • Command Execution
      • SSI attacks
  • CSIC HTTP 2010
    • 61,000 samples, 41% attacks
      • Anomalous or Normal
    • Traffic data
      • Realistic
      • Developed for this purpose
      • E-commerce Web APP
      • Apache server
    • Attacks
      • SQL injection
      • Buffer overflow
      • Information gathering
      • CRLF injection
      • XSS
      • Server side include
      • Parameter tampering

Feature

  • 30 features
    • Length
    • Number
    • Max
    • Min
    • Type of header
    • Four types of characters
      • Letters
      • Digits
      • Non-alphanumeric characters
        • Special meanings in programming languages
        • "special" char
      • Others
    • Entropy
    • Programming languages keywords

Adaptive Intrusion Detection System (A-IDS)

  • Base IDS
    • Base classifiers
      • Naïve Bayes
      • Bayes Network
      • Decision Stump
      • RBF Network
    • No assumptions on selection of baseline classifiers
    • 10-folds cross-validation
  • Loss Update
    • Hedge/Boosting algorithm
  • Mixing Update
    • Bousquet and Warmuth's algorithm (2002)
    • Quick recovery property of the IDSs to deal with their favorite data
      • Remembering the past average weight vector
  • Supervised Framework
    • Combine results of base IDSs
    • Receive the true label of the current sample
    • Measure losses between IDS outputs and true label
    • Maintain weights for base IDSs

Mixing Algorithm

Notation

  • TT, #\# instances
  • tt, (t=1,...,T)(t=1,...,T), a certain time or trial
  • nn, #\# base IDSs
  • ii, i{1,2,...,n}i \in \{ 1, 2, ..., n \}, the index of base IDSs
  • xtx_t, a vector of nn IDS's output
    • A-IDS receives xtx_t
    • xt=(xt,1,...,xt,n)x_t = (x_{t,1}, ..., x_{t,n})
      • where xt,i{0(normal),1(attack)}x_{t, i} \in \{0\text{(normal)}, 1\text{(attack)}\}
  • pred(t)pred(t), the prediction of A-IDS
  • yty_t, yt{0,1}y_t \in \{0, 1\}, the true label of the t-tht\text{-th} instance at the time tt
  • LL, loss function
    • For the trial tt, and the i-thi\text{-th} IDS
    • Lt,i=(ytxt,i)2L_{t, i} = (y_t - x_{t, i})^2
    • Measure the discrepancy between the true label and the predictions of the base IDSs
  • vtv_t, the weight vector maintained by the A-IDS
    • vt=(vt,1,vt,2,...,vt,n)v_t = (v_{t,1}, v_{t,2}, ..., v_{t,n})
    • vt,i0v_{t,i} \geq 0
    • i=1nvt,i=1\sum_{i = 1}^n v_{t,i} = 1

Algorithm

  • Parameters
    • η>0\eta > 0, learning rate
    • 0α10 \leq \alpha \leq 1
    • nn
  • Initialization
    • v1=v0m=(1/n,...,1/n,...,1/n)v_1 = v_0^m = (1/n, ..., 1/n, ..., 1/n)
  • For t=1Tt=1 \to T
    • Prediction
      • y^t=vtxt\hat y_t = v_t \cdot x_t
      • pred(t)pred(t):
        • 00, 0y^t0.50 \leq \hat y_t \leq 0.5
        • 11, y^t0.5\hat y_t \geq 0.5
    • Loss Update
      • Find the best IDS from a pool of nn base ones
      • Lt,i=(ytxt,i)2L_{t, i} = (y_t - x_{t, i})^2
      • eηLt,ie^{-\eta L_{t,i}}, a factor
      • vt,im=vt,ieηLt,i/j=1nvt,jeηLt,jv_{t,i}^m = {v_{t,i} e^{-\eta L_{t,i}}} / {\sum_{j=1}^n v_{t,j} e^{-\eta L_{t,j}}}
    • Mixing Update
      • avt=1tq=0t1vqmav_t = \frac{1}{t} \sum_{q=0}^{t-1} v_q^m, average weight vector
      • vt+1=αavt+(1α)vtmv_{t+1} = \alpha av_t + (1-\alpha)v_t^m

Experimental

  • Expert setting (Not Mixing Update)
    • Single intrusion detection cannot detect all types
      • Good performance on special segments
      • Need combination
    • Loss Update
      • η=0.1\eta = 0.1
    • Not Mixing Update
      • vt1,i=vt,imv_{t_1,i} = v_{t,i}^m
      • Combination is not help
      • The performance is almost the same as the best IDS
  • Expert Combining (A-IDS)
    • Simulate adversarial environment
      • Random permutation
    • 10-folds cross-validation
    • Mixing Algorithm
      • η=0.1\eta = 0.1
      • α=0.001\alpha = 0.001
      • Uniform Past update
  • Expert Expanding (A-ExIDS)

Limitation

  • Lose Update is double-edge knife
    • The best vt,iv_{t,i}(Lt,i=0L_{t,i} = 0) controls vt,imv_{t,i}^m
    • Hard to recover if an IDS temporarily performs poorly and then performs well
    • Consequence
      • Slow adaptation about changes in IDSs' performances
      • Attackers can change attack patterns all the times

References

  • Adaptive Intrusion Detection System via Online Learning, 2012
  • CS 259D Session 10

results matching ""

    No results matching ""