The Security of Machine Learning
Background Knowledge and Goal
- Evaluate the quality of a learning system
- Determine whether it satisfies requirements for secure learning
- Computer security evaluation
- Determining classes of attacks on the system
- Evaluating the resilience of the system against those attacks
- Strengthening the system against those classes of attacks
- Use the same model to evaluate secure learning
Framework
Security analysis
- Security goals
- Integrity goal
- Availability goal
- Threat model
- Attacker goal/incentives
- Cost function
- The defender's cost corresponds to the attacker's benefit
- Attacker capabilities
- Attackers have knowledge of machine learning
- Training algorithm
- Training dataset
- Attackers can generate arbitrary instances, or not
- Attackers can control training data, to what extent
- Attackers cannot control
- Labels, which are assigned by hand
- Arrival order of packets
Taxonomy
- Influence (capability: whether the attack influences the training data)
- Causative, attacks influence learning with control over training data
- Exploratory, attacks exploit misclassifications but do not affect training
- Security violation (type)
- Integrity, attacks compromise assets via false negatives
- Availability, attacks cause denial of service, usually via false positives
- Specificity (how focused the attack's intention is)
- Targeted, attacks focus on a particular instance
- Indiscriminate, attacks encompass a wide class of instances
- Influence determines structure of the game and move sequence
- Security violation and specificity determine cost function
Notation
- P_T, P_E: training and evaluation distributions
- H: the defender's procedure for selecting a hypothesis f
- A_T, A_E: the attacker's procedures for selecting P_T and P_E
- C: the cost function
Adversarial learning game
- Exploratory game (a toy sketch of both games follows this list)
- Defender: choose procedure H for selecting hypothesis f
- Attacker: choose procedure A_E for selecting evaluation distribution P_E
- Construct an unfavorable evaluation distribution concentrating probability mass on high-cost instances
- Causative game
- Defender: choose procedure H for selecting hypothesis f
- Attacker: choose procedures A_T and A_E for selecting training and evaluation distributions P_T and P_E
- Defender's trade-off
- Better performance on worst-case distributions vs. less effective on average
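A minimal numpy sketch of the two move orders with a toy 1-D threshold detector; the data, the midpoint learner, and names like `train_threshold` are illustrative assumptions, not the paper's formal game.

```python
# Toy sketch (not the paper's formalism) of the exploratory vs. causative move order,
# using a 1-D threshold "detector". All numbers and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def train_threshold(benign, malicious):
    """Defender's procedure H: place the threshold midway between the class means."""
    return (benign.mean() + malicious.mean()) / 2.0

def flagged(x, threshold):
    return x > threshold  # True = flagged as malicious

benign = rng.normal(0.0, 1.0, 200)      # benign traffic clusters near 0
malicious = rng.normal(10.0, 1.0, 200)  # malicious traffic clusters near 10

# Exploratory game: the attacker moves after the hypothesis is fixed.
f = train_threshold(benign, malicious)
evasive = rng.normal(f - 1.0, 0.3, 50)  # attacker concentrates mass just below f
print("exploratory: detection rate on evasive traffic =", flagged(evasive, f).mean())

# Causative game: the attacker also shapes the training data.
poison = rng.normal(40.0, 1.0, 200)     # extreme "malicious" points drag f upward
f_poisoned = train_threshold(benign, np.concatenate([malicious, poison]))
print("causative: detection rate on ordinary malicious traffic =",
      flagged(malicious, f_poisoned).mean())
```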
Learn, Attack and Defend
"Defender" is the "Learner"
Attack: Causative and Integrity
- Contamination in PAC learning (Kearns and Li 1993)
- PAC, probably approximately correct
- Learner
- Succeeds if, with probability at least 1 − δ, it finds a hypothesis with error at most ε
- Attacker
- Controls a fraction β of the training data
- Can prevent the learner from succeeding if β ≥ ε / (1 + ε) (worked example below)
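A quick worked instance of the Kearns and Li bound stated above, assuming their malicious-error model:

```latex
\beta \;\ge\; \frac{\epsilon}{1+\epsilon},
\qquad \text{e.g. } \epsilon = 0.1 \;\Rightarrow\; \beta \ge \frac{0.1}{1.1} \approx 0.09
```

That is, corrupting roughly 9% of the training data already suffices to prevent learning to within 10% error.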
- Spam foretold
- Attacker
- Send non-spam resembling the desired spam
- e.g., exploit polysemous words such as "watch"
- Learner
- Mis-trained filter misses the eventual spam
- Red herring (Newsome et al. 2006)
- Attacker
- Introduce spurious features into all malicious instances used by defender for training
- At attack time, malicious instances lack the spurious features and bypass the filter
- Learner
- Learns the spurious features as necessary elements of malicious behavior (see the sketch below)
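A toy sketch of the red herring idea against a conjunction-style signature learner; the learner and token names are illustrative, not Newsome et al.'s Polygraph system.

```python
# Red herring sketch: a toy signature learner treats tokens present in every
# malicious training sample as the signature; the attacker plants a decoy token.

def learn_signature(malicious_samples):
    """Signature = set of tokens shared by all malicious training samples."""
    sig = set(malicious_samples[0])
    for s in malicious_samples[1:]:
        sig &= set(s)
    return sig

def is_malicious(sample, signature):
    return signature.issubset(sample)

real_payload = {"overflow", "shellcode"}
spurious = {"red_herring_token"}          # attacker-controlled decoy feature

# Training: every malicious sample the defender collects carries the decoy token
training_malicious = [real_payload | spurious for _ in range(10)]
signature = learn_signature(training_malicious)
print("learned signature:", signature)    # includes the spurious token

# Attack time: the real exploit omits the decoy token and slips past the filter
attack_sample = real_payload
print("attack detected?", is_malicious(attack_sample, signature))  # False
```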
Attack: Causative and Availability
- Rogue filter (Nelson et al. 2008)
- Attacker
- Send spam resembling benign messages
- Include both spam words and benign words
- Learner
- Associates benign words with spam
- Correlated outlier (Newsome et al. 2006)
- Attacker
- Add spurious features to malicious instances
- Learner
- Filter blocks benign traffic with those features
- Allergy attack (Chung and Mok 2006, 2007)
- Against Autograph, a worm signature generation system
- Learner
- Phase I, Identify infected nodes, based on behavioral patterns
- Phase II, Learn new blocking rules by observing traffic from infected nodes
- Attacker
- Phase I, Convince Autograph that an attack node is infected by scanning
- Phase II, Send crafted packets from the attack node, causing Autograph to learn rules that block benign traffic (DoS)
Attack: Exploratory and Integrity
- Shifty spammer, good word attacks (Lowd and Meek 2005b; Wittel and Wu 2004)
- Attacker
- Craft spam so as to evade classifier without direct influence over the classifier itself
- Exchange common spam words with less common synonyms
- Add benign words to make the spam appear legitimate (see the sketch below)
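A toy sketch of a good word attack against a naive-Bayes-style log-odds score; the token probabilities are made up for illustration, and real filters combine evidence differently.

```python
# Good word attack sketch: appending benign words drives a spam log-odds score down.
import math

# Illustrative P(token | spam) and P(token | ham) values
p_spam = {"viagra": 0.9, "winner": 0.8, "meeting": 0.1, "thanks": 0.1}
p_ham  = {"viagra": 0.01, "winner": 0.05, "meeting": 0.6, "thanks": 0.7}

def spam_log_odds(tokens):
    """Sum of per-token log-likelihood ratios; positive = more spam-like."""
    return sum(math.log(p_spam[t] / p_ham[t]) for t in tokens if t in p_spam)

original = ["viagra", "winner"]
padded = original + ["meeting", "thanks"] * 2   # attacker appends benign words

print("original spam log-odds:", round(spam_log_odds(original), 2))  # clearly positive
print("padded   spam log-odds:", round(spam_log_odds(padded), 2))    # pushed below 0
```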
- Polymorphic blending (Fogla and Lee 2006)
- Attacker
- Encrypt attack traffic so it appears statistically identical to normal traffic
- Mimicry attack (Tan et al. 2002)
- Example: attacking sequence-based IDS
- Stretch the attack so that the shortest malicious subsequence is longer than the IDS window size
- Feature drop (FDROP; Globerson and Roweis 2006): adversary deletes informative features at test time that were present during training
- Reverse engineering (Lowd and Meek 2005a)
- Attacker
- Seeks the classifier's worst case, which is the attacker's best (lowest-cost) evading instance
Attack: Exploratory and Availability
- Mistaken identity (Moore et al. 2006)
- Attacker
- Interfere with legitimate operation without influence over training
- Create a spam campaign with the target's email address as the From: address
- Flood of message bounces, vacation replies, angry responses, etc. fill target's inbox
- Spoofing
- Learner
- IPS trained on intrusion traffic blocks hosts that originate intrusions
- Attacker
- Attack node spoofs legitimate host's IP address
- Learner
- Blocks the legitimate host whose address was spoofed (DoS)
- Algorithmic complexity (Dredze et al. 2007; Wang et al. 2007)
- Attacker
- Send spam embedded in images, which is costly for the filter to process
Defend: Against Exploratory Attacks
Defenses against attacks without probing
- Training data
- Defender
- Limit information accessible to attacker
- Feature selection
- Defender
- Example: use inexact string matching in feature selection to defeat obfuscation of words in spam (see the sketch below)
- Avoid spurious features
- Regularization: smooth weights to defend against feature deletion
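A small sketch of inexact matching as a feature-normalization step; the digit-to-letter table, the 0.8 similarity threshold, and the word list are illustrative assumptions.

```python
# Inexact matching sketch: map obfuscated spellings ("v1agra", "fr33") onto the same
# feature as the plain word, so obfuscation does not create unseen features.
from difflib import SequenceMatcher

CANONICAL = ["viagra", "free", "winner"]
LEET = str.maketrans("013457", "oleast")   # crude digit-to-letter normalization

def canonical_feature(token):
    t = token.lower().translate(LEET)
    for word in CANONICAL:
        if SequenceMatcher(None, t, word).ratio() >= 0.8:
            return word                     # map the variant onto the canonical feature
    return t

print(canonical_feature("V1agra"))   # -> viagra
print(canonical_feature("fr33"))     # -> free
print(canonical_feature("meeting"))  # -> meeting (unchanged)
```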
- Hypothesis space/learning procedures
- Defender
- A more complex hypothesis space is harder for the attacker to reverse engineer, but also harder to learn
- Regularization: balance complexity against over-fitting
Defenses against probing attacks
- Analysis of reverse engineering
- Attacker
- Do not need to model the classifier explicitly
- Find lowest-attacker-cost instance
- Adversarial classifier reverse engineering (ACRE)
- ACRE learning: use polynomially many membership queries to find a (near-)lowest-attacker-cost instance that the classifier accepts (see the sketch below)
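A toy sketch of the flavor of query-based reverse engineering: binary search between a known-allowed instance and the desired (blocked) instance to find an accepted point near the boundary. This illustrates membership-query probing against an assumed linear classifier; it is not Lowd and Meek's ACRE algorithm.

```python
# Query-based evasion sketch against a black-box binary classifier.
import numpy as np

w, b = np.array([1.0, 2.0]), -3.0            # secret linear classifier (unknown to attacker)
def query(x):                                 # membership oracle: True = blocked as malicious
    return float(np.dot(w, x) + b) > 0

x_attack = np.array([4.0, 2.0])               # what the attacker wants to send (blocked)
x_benign = np.array([0.0, 0.0])               # a known allowed instance

lo, hi = 0.0, 1.0                             # fraction of the way from benign to attack
for _ in range(30):                           # ~30 queries give very fine precision
    mid = (lo + hi) / 2
    x = x_benign + mid * (x_attack - x_benign)
    if query(x):
        hi = mid                              # still blocked: move toward benign
    else:
        lo = mid                              # allowed: move toward the attack instance

x_evading = x_benign + lo * (x_attack - x_benign)
print("accepted instance near the boundary:", x_evading, "blocked?", query(x_evading))
```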
- Randomization
- Defender
- Make randomized decisions near the decision boundary instead of a deterministic binary decision (see the sketch below)
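A tiny sketch of a randomized decision rule; the logistic smoothing and temperature value are illustrative choices.

```python
# Randomized decision sketch: flip a biased coin near the boundary so probing
# reveals less about the exact decision surface.
import math
import random

def randomized_decision(score, threshold=0.0, temperature=0.5):
    p_block = 1.0 / (1.0 + math.exp(-(score - threshold) / temperature))
    return random.random() < p_block      # True = block

print([randomized_decision(0.1) for _ in range(10)])  # mixed answers near the boundary
print([randomized_decision(5.0) for _ in range(10)])  # far from boundary: almost always block
```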
- Limiting/misleading feedback
- Defender
- Eliminating bounce emails
- Sending fraudulent feedback
Defend: Against Causative Attacks
- Data sanitization
- Reject On Negative Impact (RONI)
- Learner
- Train two classifiers, one with and one without the candidate instance
- Measure the accuracy of each on a held-out calibration set
- If including the instance significantly degrades accuracy, deem it malicious
- Reject the instance as detrimental in its effect (see the sketch below)
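A sketch of the RONI idea: keep a training instance only if removing it would not noticeably improve accuracy on a trusted calibration set. The nearest-centroid learner, the leave-one-out comparison, and the 1% tolerance are illustrative simplifications; the original proposal compares a clean base set with and without each candidate instance.

```python
# RONI-style filtering sketch with a toy nearest-centroid learner.
import numpy as np

def train_centroids(X, y):
    """Toy learner: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def accuracy(model, X, y):
    preds = [min(model, key=lambda c: np.linalg.norm(x - model[c])) for x in X]
    return float(np.mean(np.array(preds) == y))

def roni_filter(X_train, y_train, X_cal, y_cal, tol=0.01):
    acc_all = accuracy(train_centroids(X_train, y_train), X_cal, y_cal)
    keep = []
    for i in range(len(X_train)):
        mask = np.arange(len(X_train)) != i
        acc_without_i = accuracy(train_centroids(X_train[mask], y_train[mask]), X_cal, y_cal)
        # Reject instance i if dropping it improves calibration accuracy by more than tol
        if acc_all >= acc_without_i - tol:
            keep.append(i)
    return np.array(keep)
```

In practice X_cal, y_cal would be a trusted held-out sample and the candidates would be newly arriving training data.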
- Robustness
- Robust statistics (also a defense against gradual "boiling frog" poisoning)
- Replace non-robust estimators such as the mean, mean squared error (MSE), and standard deviation
- With robust ones such as the median and median absolute deviation (MAD); see the numerical sketch after this list
- Proper regularization
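A quick numerical illustration (made-up data) of why robust statistics resist poisoning: a handful of injected outliers drags the mean and standard deviation far more than the median and MAD.

```python
# Robust vs. non-robust statistics under a small amount of poisoning.
import numpy as np

clean = np.random.default_rng(1).normal(loc=10.0, scale=1.0, size=100)
poisoned = np.concatenate([clean, np.full(5, 1000.0)])   # 5 poisoned points out of 105

def mad(x):
    return np.median(np.abs(x - np.median(x)))

for name, data in [("clean", clean), ("poisoned", poisoned)]:
    print(f"{name:9s} mean={data.mean():8.2f} std={data.std():8.2f} "
          f"median={np.median(data):6.2f} MAD={mad(data):5.2f}")
```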
- Online prediction with experts
- Multi-classifier systems
Case Study: SpamBayes
| (Targeted or Indiscriminate) | Integrity | Availability |
| --- | --- | --- |
| Causative | Spam foretold: mis-train the filter to miss a particular spam or any future spam | Rogue filter: mis-train the filter to block a certain or arbitrary normal email |
| Exploratory | Shifty spammer: obfuscate a chosen spam or any spam to evade the filter | Unwanted reply: flood a particular target's inbox or any of several targets' inboxes |
- Method
- Send attack emails that contain many legitimate words and get labeled as spam
- Those legitimate words then receive higher spam scores
- Future legitimate emails are more likely to be filtered (see the sketch below)
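A toy sketch of the dictionary attack against a SpamBayes-like per-token score; the smoothed frequency ratio below is a simplification of SpamBayes' actual token scoring and message-level combining, and all counts and words are made up.

```python
# Dictionary attack sketch: poisoning training with spam that contains legitimate
# words raises those words' spam scores.
from collections import Counter

spam_counts, ham_counts = Counter(), Counter()
n_spam = n_ham = 0

def train(tokens, is_spam):
    global n_spam, n_ham
    if is_spam:
        spam_counts.update(set(tokens))
        n_spam += 1
    else:
        ham_counts.update(set(tokens))
        n_ham += 1

def token_spam_score(tok):
    p_s = (spam_counts[tok] + 1) / (n_spam + 2)   # smoothed P(token present | spam)
    p_h = (ham_counts[tok] + 1) / (n_ham + 2)     # smoothed P(token present | ham)
    return p_s / (p_s + p_h)

# Clean training data
for _ in range(50):
    train(["viagra", "winner"], is_spam=True)
    train(["meeting", "thanks", "report"], is_spam=False)

print("before attack: score('meeting') =", round(token_spam_score("meeting"), 2))

# Dictionary attack: spam stuffed with legitimate words, labeled spam during training
for _ in range(200):
    train(["viagra", "meeting", "thanks", "report"], is_spam=True)

print("after attack:  score('meeting') =", round(token_spam_score("meeting"), 2))
# The score of a common legitimate word jumps (about 0.02 -> 0.45 here), pushing
# future legitimate email toward the spam side.
```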
- Types
- Indiscriminate: Dictionary attack
- Targeted: Focused attack
- Goals
- Get target to disable spam filter
- DoS against a bidding competitor
References
- Barreno, M., Nelson, B., Joseph, A. D., Tygar, J. D. "The Security of Machine Learning." Machine Learning 81(2), 2010.
- CS 259D Session 10