Adaptive Intrusion Detection System via Online Machine Learning
Background Knowledge and Insights
- Diverse network environments
- Dynamic attack types
- Adversarial environment
- IDS performance strongly depends on chosen classifier
- Misuse IDS
- Detect known attacks
- SNORT: signature-based IDS
- Anomaly IDS
- Detect novel attacks
- More FP
- Perform differently in different environments
- No Free Lunch Theorem
- Combination IDSs
- High cost
- Majority voting
- Incorrect
- Hedge/Boosting
- Online learning framework
- Chose best IDS in period T
- Destroy the superiority of the best IDS by changing the attacks all the time
- All IDSs perform badly
- Adaptability of the IDS
- Adapt to dynamic adversarial environments
- How to choose the best
Goals
- New efficient online learning framework
- Adaptive Intrusion Detection System (A-IDS)
Data Source
- ECML-PKDD HTTP 2007
- 50,000 samples, 20% attacks
- Attacks
- Cross-Site Scripting
- SQL injection
- LDAP injection
- XPATH injection
- Path traversal
- Command Execution
- SSI attacks
- CSIC HTTP 2010
- 61,000 samples, 41% attacks
- Traffic data
- Realistic
- Developed for this purpose
- E-commerce Web APP
- Apache server
- Attacks
- SQL injection
- Buffer overflow
- Information gathering
- CRLF injection
- XSS
- Server side include
- Parameter tampering
Feature
- 30 features
- Length
- Number
- Max
- Min
- Type of header
- Four types of characters
- Letters
- Digits
- Non-alphanumeric characters
- Special meanings in programming languages
- "special" char
- Others
- Entropy
- Programming languages keywords
Adaptive Intrusion Detection System (A-IDS)
- Base IDS
- Base classifiers
- Naïve Bayes
- Bayes Network
- Decision Stump
- RBF Network
- No assumptions on selection of baseline classifiers
- 10-folds cross-validation
- Loss Update
- Mixing Update
- Bousquet and Warmuth's algorithm (2002)
- Quick recovery property of the IDSs to deal with their favorite data
- Remembering the past average weight vector
- Supervised Framework
- Combine results of base IDSs
- Receive the true label of the current sample
- Measure losses between IDS outputs and true label
- Maintain weights for base IDSs
Mixing Algorithm
Notation
- T, # instances
- t, (t=1,...,T), a certain time or trial
- n, # base IDSs
- i, i∈{1,2,...,n}, the index of base IDSs
- xt, a vector of n IDS's output
- A-IDS receives xt
- xt=(xt,1,...,xt,n)
- where xt,i∈{0(normal),1(attack)}
- pred(t), the prediction of A-IDS
- yt, yt∈{0,1}, the true label of the t-th instance at the time t
- L, loss function
- For the trial t, and the i-th IDS
- Lt,i=(yt−xt,i)2
- Measure the discrepancy between the true label and the predictions of the base IDSs
- vt, the weight vector maintained by the A-IDS
- vt=(vt,1,vt,2,...,vt,n)
- vt,i≥0
- ∑i=1nvt,i=1
Algorithm
- Parameters
- η>0, learning rate
- 0≤α≤1
- n
- Initialization
- v1=v0m=(1/n,...,1/n,...,1/n)
- For t=1→T
- Prediction
- y^t=vt⋅xt
- pred(t):
- 0, 0≤y^t≤0.5
- 1, y^t≥0.5
- Loss Update
- Find the best IDS from a pool of n base ones
- Lt,i=(yt−xt,i)2
- e−ηLt,i, a factor
- vt,im=vt,ie−ηLt,i/∑j=1nvt,je−ηLt,j
- Mixing Update
- avt=t1∑q=0t−1vqm, average weight vector
- vt+1=αavt+(1−α)vtm
Experimental
- Expert setting (Not Mixing Update)
- Single intrusion detection cannot detect all types
- Good performance on special segments
- Need combination
- Loss Update
- Not Mixing Update
- vt1,i=vt,im
- Combination is not help
- The performance is almost the same as the best IDS
- Expert Combining (A-IDS)
- Simulate adversarial environment
- 10-folds cross-validation
- Mixing Algorithm
- η=0.1
- α=0.001
- Uniform Past update
- Expert Expanding (A-ExIDS)
Limitation
- Lose Update is double-edge knife
- The best vt,i(Lt,i=0) controls vt,im
- Hard to recover if an IDS temporarily performs poorly and then performs well
- Consequence
- Slow adaptation about changes in IDSs' performances
- Attackers can change attack patterns all the times
References
- Adaptive Intrusion Detection System via Online Learning, 2012
- CS 259D Session 10