Polymorphic Blending Attacks

Goals

  • Study polymorphic blending attack
  • Why network anomaly IDS based on payload statistics not work
  • Case study: 1-gram and 2-gram PAYL
  • Polymorphic attacks
    • IP/TCP transformations
    • Mutation exploits (Vigna et al.)
    • Fragroute, Whisker, AGENT, Mistfall, tPE, EXPO, DINA, ADMutate, PHATBOT, JempiScodes
    • CLET
  • Defenses against polymorphism
    • Looking for executable code (Toth et al.)
    • Looking for similar structure in multiple code instances (Kruegel et al.)
    • Looking for common substrings present in multiple code instances (Polygraph)
      • Defeated by noise
    • Looking for any exploit of a known vulnerability (Shield)
    • Looking for instruction semantics, detect known code transformations (Cristodorescu et al.)
    • Detect sequence of anomalous system calls (Forest et al.)
      • Can be defeated through mimicry attacks.
      • New approaches use stack information but they can also be defeated
    • Payload-based anomaly detection
      • Use length, character distribution, probabilistic grammar and tokens to model HTTP traffic (Kruegel et al.)
      • Record byte frequency for each port's traffic (PAYL)

Blending Attacks

Scheme of a polymorphic blending attack

Scheme-of-a-Polymorphic-Blending-Attack-PBA

Polymorphism Attacks

Polymorphism

  • Disguise the packets as normal traffic
  • Change the contents of packets to make them look different from each other
  • Exploit code, not used in normal
  • Making attacks look different from each other rather than normal

Three Components of Polymorphic Attack

  • Attack Vector
    • Modified part, invariant part
    • If the attack invariant is very small and exists in the normal traffic
      • Could cause high FP of IDS
  • Attack Body (shellcode)
    • Register shuffling
    • Equivalent instruction substitution
    • Instruction reordering
    • Garbage insertions
    • Encryption
  • Polymorphic Decryptor
    • Apply to decrypt shellcode and transfers control to shellcode
    • Obfuscate decryptor code

Detection of Polymorphic Attacks

  • Exploit code and/or input data contain some characters that have very low probability of appearing in a normal packet
  • This deviation can be detected

Polymorphic Blending Attacks (PBA)

  • Making attacks looks different from each other
  • Adjust their byte frequency to match that of legitimate traffic (look like normal)

Attack Scenario of Polymorphic Blending Attack

  • Assumption of Realistic Attack Scenario
    • The adversary has already compromised a host X inside a network A which communicates with the target host Y inside network B
    • The adversary has knowledge of the IDSB that monitors the victim host network
    • IDS of Network B is a payload statistics based system (e.g., PAYL)
    • The IDSB has a threshold setting that can be adjusted to obtain a desired false positive rate
      • The adversary does not know the exact value of the threshold used by IDSB
      • The adversary has estimated the generally acceptable false positive and false negative rates
  • Adversary's actions
    • Control host X
    • Observe the normal traffic going from X to Y
    • Estimate a normal profile (artificial profile) for this traffic by the same modeling technique as IDSB using
    • Creative mutated instances to match the artificial profile
    • If IDSB fails on analysing the mutated packets, the adversary succeeds in attacking
  • Adversary's trade-off
    • Attack size
      • Monitor network flow size
    • Process speed
      • Attack should be economical in time and space
      • High system resource usage could cause local IDS (e.g., IDSA or host-based IDS) to initiate alerts

Steps of Polymorphic Blending Attacks

Step 1 Learning The IDS Normal Profile

  • Sniffing the network traffic
  • Generates artificial profile
  • More normal packets captured more closer to the normal profile
  • Profile
    • Maximum, average size of packets
    • Rate of packets
    • Byte frequency distribution
    • Range of tokens at different offsets

Step 2 Attack Body Encryption

  • To match the normal profile
    • Substituting every character
    • Padded with some garbage data
  • Reversible operation
  • Generated suitable substitution table

Step 3 Generate Polymorphic Decryptor

  • Remove padding
  • re-substitute characters
  • The decryptor routine is not encrypted but be mutated

Attack Packet Design

  • Find a normal profile that similar to the attack packet
    • Character frequency (substitution table)
      • If no significantly different between new and old substitution tables, replace old by new
    • Packet length
      • Longer than the attack packet
      • Plan B: divide attack body into multiple small packets

Evading PAYL

Evading 1-gram

  • Minimize the maximum frequency difference

Padding

  • ω^\hat{\omega}, substituted attack body before padding
  • ω´\acute{\omega}, substituted attack body after padding
  • ω\|\omega\|, length of ω\omega
  • ii, index of characters
  • xx, characters
    • xix_i, the ithi^{th} of the characters
  • λ\lambda, #\# occurrences
    • λi\lambda_i, #\# occurrences of xix_i
  • ω´=ω^+i=1nλi\|\acute{\omega}\| = \|\hat{\omega}\| + \sum_{i=1}^n{\lambda_i}
  • f(xi)f(x_i), relative frequency of character xix_i in normal traffic
  • f^(xi)\hat{f}(x_i), relative frequency of character xix_i in substituted attack traffic
  • λi=ω´f(xi)ω^f^(xi)\lambda_i = \|\acute{\omega}\|f(x_i) - \|\hat{\omega}\|\hat{f}(x_i)
  • For some characters, f(xi)<f^(xi)f(x_i) < \hat{f}(x_i)
    • The most frequent such character need not be padded
  • Let δ=max(f^(xi)/f(xi))\delta = \max{(\hat{f}(x_i) / f(x_i))} be the maximum overuse
    • λi=ω^(δf(xi)f^(xi))\lambda_i = \|\hat{\omega}\|(\delta f(x_i) - \hat{f}(x_i))

Substitution

  • To minimize padding we need to minimize δ\delta
  • Case 1: attack chars are less numerous than legitimate chars
    • A greedy algorithm that generates one-to-many mapping
    • Sort characters by frequency in attack and legitimate traffic
    • Match frequencies in decreasing order
    • Remaining legitimate characters are assigned to attack characters that have highest to bring it down
  • Case 2: attack chars are more numerous than legitimate chars
    • A greedy algorithm that generates many-to-one mapping
    • Construct a Huffman tree where leaves are characters in the attack traffic, and smallest two nodes are iteratively connected (thus most frequent characters have shortest length)
    • We must choose (not random) the labels for the edges so to preserve the original legitimate character frequency
      • Sort vertices in the tree by weight
      • Sort legitimate characters by their frequency
      • Choose the highest frequency character for the highest weight vertex
      • Remove the vertex from the list and remove the given portion of the character's frequency from further consideration
      • Resort the characters

Evading 2-gram

  • Must match all 2-byte pairs
  • Represent valid 2-grams as states in FSM
  • A simple approach will enumerate valid paths in FSM and map attack characters to paths randomly but this generates large code size
    • Better mapping can be obtained by using entropy information, i.e., mapping frequent characters to short paths
  • Another approach will attempt to find single byte mappings so that 2-grams are also matched
    • Greedy algorithm sorts 2-grams by frequencies in legitimate and attack traffic and matches them greedily taking care not to violate any existing mappings
  • Generate padding so to match the target distribution greedily

Experiment Setup

Attack Vector

  • Windows Media Services (MS03-022)
    • Exploits a vulnerability with logging of user requests
  • Attacker vector 99 bytes
    • Presented at the start of the HTTP request
  • For buffer overflow attack must send 10KB of data
  • Attack body opens a TCP connection and sends registry files
  • Size of attack body is 558B and contains 109 unique characters
  • Attack was divided into multiple packets
    • Divide decryptor into several packets
  • If final blending attack packet after padding not up to 10KB, send normal packets

Dataset

  • Captured 15 days of HTTP traffic
    • From one department
    • 14 days' traffic to train the IDS (4,356,565 packets, 1.9GB)
    • Last day's traffic is used by the attacker to learn character distributions
    • Only TCP data packets are used that do not contain known attacks
  • IDS builds profiles per packet length
  • Selected three frequent packet sizes for the attack

Evaluation

PAYL Training

  • PAYL training time increases with the size of the training data because new packets carry more unique n-grams

Traditional Polymorphic Attacks (CLET)

  • Tested CLET-generated polymorphic attacks against PAYL
    • CLET only adds padding to match byte frequency
    • Other polymorphic engines perform worse than CLET against PAYL
    • CLET attack sequence will avoid PAYL detection only if all packets have an anomaly score above the threshold
    • Both 1-gram and 2-gram PAYL detected all attacks with chosen threshold setting

Polymorphic Blending Attack

Artificial Profile

  • Training of the artificial profile is stopped when there is no significant improvement over existing profile (measured using Manhattan distance) within two packets

1-gram and 2-gram Attacks

  • For 1-gram attacks used one-to-one substitution cipher
  • For 2-gram attacks used single byte encoding scheme
    • Two types of transformations were tested
    • Global substitution
      • Substitution table is constructed for entire attack body
      • Single decoding table used to decode the whole attack flow
    • Local substitution
      • Substitution table is constructed for each packet separately
      • Each packet corresponds to a decoding table
      • Padding space reduced
    • If attack characters are more numerous than those in legitimate traffic, non-existing characters were used
  • 2-gram IDS had consistently higher anomaly scores for attacks but it also had higher thresholds to avoid false positives
    • Overall similar performance as 1-gram IDS
    • More costly for IDS
  • Local substitution always outperformed global substitution

Countermeasures of IDS

  • IDS Models
    • More complex models
    • Blend more models
  • Features
    • Syntactic and semantic information
  • High speed hardware
  • Measure randomness

Limitations and Improvements

  • PAYL is the only case study
  • Are the assumption of PBA are realistic
  • Explore techniques for continuous data streams

References

results matching ""

    No results matching ""