Example: Data Mining for the NBA

Digital Forensics
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Application Forensics
November 5, 2008

Outline
- Email Forensics
- UTD work on email worm detection (revisited)
- Mobile System Forensics
- Note: other application/system-related forensics include database forensics and network forensics (already discussed)
- Papers to discuss on November 10, 2008 and November 17, 2008
- Reference: Chapters 12 and 13 of the textbook
- Optional paper to read: http://www.mindswap.org/papers/Trust.pdf

Email Forensics
- Email investigations
- Client/server roles
- Email crimes and violations
- Email servers
- Email forensics tools

Email Investigations
- Types of email investigations
  - Emails that carry worms and viruses (suspicious emails)
  - Checking emails in a crime (e.g., homicide)

- Types of suspicious emails
  - Phishing emails: they are in HTML format and redirect to suspicious web sites
  - Nigerian scam
  - Spoofing emails

Client/Server Roles
- Client-server architecture
- The email server runs the email server program, for example Microsoft Exchange Server
- The email client runs the client program, for example Outlook
- Identification/authentication is used for the client to access the server
- Intranet/Internet email servers
  - Intranet: local environment
  - Internet: public, for example Yahoo, Hotmail, etc.

Email Crimes and Violations
- Goal is to determine who is behind the crime, such as who sent the email
- Steps to email forensics
  - Examine the email message
  - Copy the email message (also forward the email)
  - View and examine the email header: tools are available for Outlook and other email clients
  - Examine additional files such as address books
  - Trace the message using various Internet tools
  - Examine network logs (netflow analysis)
- Note: UTD netflow tools (SCRUB) are on SourceForge
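The "view and examine the email header" step can be sketched with Python's standard `email` package. This is a minimal illustration, not a forensic tool: the raw message, host names, and addresses below are hypothetical, and `summarize_headers` is a name introduced only for this example.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical raw message used only for illustration.
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10])
    by mx.example.com with ESMTP id abc123; Wed, 5 Nov 2008 10:02:11 -0600
Received: from sender-pc (unknown [198.51.100.7])
    by mail.example.org with SMTP id xyz789; Wed, 5 Nov 2008 10:02:05 -0600
From: "Alice" <alice@example.org>
To: bob@example.com
Subject: Quarterly report
Message-ID: <1234@example.org>

Please see the attached report.
"""

def summarize_headers(raw_text):
    """Return the header fields an examiner usually looks at first."""
    msg = message_from_string(raw_text)
    # Each relay prepends its own Received header, so the list reads
    # newest-first; reversing it gives the path from origin to destination.
    hops = [" ".join(h.split()) for h in msg.get_all("Received", [])]
    hops.reverse()
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "to": parseaddr(msg.get("To", ""))[1],
        "subject": msg.get("Subject", ""),
        "message_id": msg.get("Message-ID", ""),
        "hops": hops,
    }

summary = summarize_headers(RAW_MESSAGE)
for number, hop in enumerate(summary["hops"], start=1):
    print(number, hop)
```

Header fields other than those added by servers the examiner trusts can be forged, so the hop list is a starting point for tracing and should be checked against server and network logs.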

Email Servers
- Need to work with the network administrator on how to retrieve messages from the server
- Understand how the server records and handles the messages
- How are the email logs created and stored?
- How are deleted email messages handled by the server? Are copies of the messages still kept?
- Chapter 12 discusses email servers from UNIX, Microsoft, and Novell

Email Forensics Tools
- Several tools for Outlook Express, Eudora, Exchange, Lotus Notes
- Tools for log analysis and recovering deleted emails
- Examples:
  - AccessData FTK
  - FINALeMAIL
  - EDBXtract
  - MailRecovery

Worm Detection: Introduction

- What are worms?
  - Self-replicating program; exploits a software vulnerability on a victim; remotely infects other victims
- Evil worms
  - Severe effect; the Code Red epidemic cost $2.6 billion
- Goals of worm detection
  - Real-time detection
- Issues
  - Substantial volume of identical traffic, random probing
- Methods for worm detection
  - Count number of sources/destinations; count number of failed connection attempts
- Worm types
  - Email worms, instant messaging worms, Internet worms, IRC worms, file-sharing network worms
- Automatic signature generation possible
  - EarlyBird System (S. Singh, UCSD); Autograph (H.-A. Kim, CMU)

Email Worm Detection using Data Mining
- Task: given some training instances of both normal and viral emails, induce a hypothesis to detect viral emails
- We used:
  - Naïve Bayes
  - SVM
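To make "induce a hypothesis from training instances" concrete, here is a minimal Naïve Bayes sketch in plain Python. It assumes numeric features with Gaussian likelihoods and uses a small variance floor as the smoothing step; the class name, the floor value, and the toy feature vectors are all illustrative assumptions, not the settings used in the UTD work.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes for numeric feature vectors."""

    def __init__(self, variance_floor=1e-3):
        # variance_floor is an assumed smoothing constant; it keeps a
        # zero-variance feature from causing a division by zero.
        self.variance_floor = variance_floor
        self.stats = {}   # label -> list of (mean, variance) per feature
        self.priors = {}  # label -> log prior probability

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        for label, members in grouped.items():
            self.priors[label] = math.log(len(members) / len(rows))
            self.stats[label] = []
            for column in zip(*members):
                mean = sum(column) / len(column)
                variance = sum((x - mean) ** 2 for x in column) / len(column)
                self.stats[label].append((mean, variance + self.variance_floor))
        return self

    def log_posterior(self, row, label):
        # Log prior plus the sum of per-feature Gaussian log likelihoods.
        total = self.priors[label]
        for value, (mean, variance) in zip(row, self.stats[label]):
            total -= 0.5 * math.log(2 * math.pi * variance)
            total -= (value - mean) ** 2 / (2 * variance)
        return total

    def predict(self, row):
        return max(self.stats, key=lambda label: self.log_posterior(row, label))

# Hypothetical feature vectors: (number of attachments, words in body).
TRAIN_ROWS = [(0, 120), (1, 90), (0, 150), (1, 8), (1, 5), (2, 10)]
TRAIN_LABELS = ["clean", "clean", "clean", "infected", "infected", "infected"]

model = GaussianNaiveBayes().fit(TRAIN_ROWS, TRAIN_LABELS)
print(model.predict((1, 7)))
print(model.predict((0, 130)))
```

Working in log space avoids numeric underflow when many features are multiplied together.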

The Model
[Figure labels: Outgoing emails, Feature extraction, Training data, Test data, Machine learning, Classifier, Clean or infected?]

Assumptions
- Features are based on outgoing emails.
- Different users have different normal behaviour, so analysis should be on a per-user basis.
- Two groups of features
  - Per email (number of attachments, HTML in body, text/binary attachments)
  - Per window (mean words in body, variance of words in subject)
- Total of 24 features identified
- Goal: identify normal and viral emails based on these features
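The two feature groups can be sketched with the standard library. This is an illustration covering only a handful of the 24 features; the function names, the `build` helper, and the sample messages are hypothetical, and a "window" is assumed here to be a list of consecutive outgoing emails from one user.

```python
from email.message import EmailMessage
from statistics import mean, pvariance

def per_email_features(msg):
    """Extract a few per-email features from one outgoing message."""
    attachments = 0
    has_html = False
    body_words = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.get_filename():
            attachments += 1
        elif part.get_content_type() == "text/html":
            has_html = True
        elif part.get_content_type() == "text/plain":
            body_words += len(part.get_content().split())
    return {
        "attachments": attachments,
        "has_html": has_html,
        "subject_words": len(msg.get("Subject", "").split()),
        "body_words": body_words,
    }

def per_window_features(window):
    """Aggregate per-email features over one window of emails."""
    return {
        "emails_sent": len(window),
        "mean_body_words": mean([f["body_words"] for f in window]),
        "var_subject_words": pvariance([f["subject_words"] for f in window]),
        "attachment_ratio": sum(f["attachments"] > 0 for f in window) / len(window),
    }

def build(subject, body, attachment=None):
    """Hypothetical helper that creates a sample outgoing message."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body)
    if attachment:
        msg.add_attachment(b"\x00\x01", maintype="application",
                           subtype="octet-stream", filename=attachment)
    return msg

emails = [
    build("Lunch on Friday", "Are you free for lunch on Friday"),
    build("Re: report", "Thanks, looks good to me"),
    build("Hi", "open this", attachment="notes.bin"),
]
window = [per_email_features(m) for m in emails]
print(window[2])
print(per_window_features(window))
```

Because normal behaviour differs between users, the windows would be built per sender before any classifier is trained.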

Feature Sets
- Per-email features
  - Binary-valued features
    - Presence of HTML; script tags/attributes; embedded images; hyperlinks
    - Presence of binary, text attachments; MIME types of file attachments
  - Continuous-valued features
    - Number of attachments; number of words/characters in the subject and body
- Per-window features
  - Number of emails sent; number of unique email recipients; number of unique sender addresses
  - Average number of words/characters per subject, body; average word length
  - Variance in number of words/characters per subject, body; variance in word length
  - Ratio of emails with attachments

Data Mining Approach
[Figure labels: Test instance, Classifier, SVM, Naïve Bayes, Infected?, Clean?, Clean/Infected]

Data Set
- Collected from UC Berkeley
- Contains instances for both normal and viral emails
- Six worm types: bagle.f, bubbleboy, mydoom.m, mydoom.u, netsky.d, sobig.f
- Originally six sets of data:
  - Training instances: normal (400) + five worms (5 x 200)
  - Testing instances: normal (1200) + the sixth worm (200)
- Problem: not balanced, no cross-validation reported
- Solution: re-arrange the data and apply cross-validation
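One way to "re-arrange the data and apply cross-validation" is to pool all instances and split them into stratified folds. The slides do not say how the re-arrangement was done, so the pooling, the fold count `k=5`, and the seed below are assumptions made for this sketch.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin so every fold gets
        # an equal share of each class.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), one pair per held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for n, fold in enumerate(folds) if n != held_out for i in fold]
        yield train, test

# Assumed pooling of the counts listed above: 400 + 1200 normal
# instances and 6 worms x 200 viral instances.
LABELS = ["normal"] * 1600 + ["viral"] * 1200

for train, test in cross_validation_splits(LABELS, k=5):
    viral_in_test = sum(LABELS[i] == "viral" for i in test)
    print(len(train), len(test), viral_in_test)
```

Every instance is tested exactly once, and each fold keeps the same normal-to-viral ratio as the pooled data.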

Our Implementation and Analysis
- Implementation
  - Naïve Bayes: assume normal distribution of numeric and real data; smoothing applied
  - SVM: one-class SVM with the radial basis function, using gamma = 0.015 and nu = 0.1
- Analysis
  - NB alone performs better than other techniques
  - SVM alone also performs better if parameters are set correctly
  - The mydoom.m and VBS.Bubbleboy data sets are not sufficient (very low detection accuracy in all classifiers)
  - The feature-based approach seems to be useful only when we have
    - identified the relevant features
    - gathered enough training data
    - implemented classifiers with the best parameter settings

Mobile Device/System Forensics
- Mobile device forensics overview
- Acquisition procedures
- Summary

Mobile Device Forensics Overview
- What is stored in cell phones
  - Incoming/outgoing/missed calls
  - Text messages
  - Short messages
  - Instant messaging logs
  - Web pages
  - Pictures
  - Calendars
  - Address books
  - Music files
  - Voice records
- Mobile phones
  - Multiple generations: analog, digital personal communications, third generation (increased bandwidth and other features)
  - Digital networks: CDMA, GSM, TDMA
  - Proprietary OSs
- SIM cards (Subscriber Identity Module)
  - Identifies the subscriber to the network
  - Stores personal information, address books, etc.
- PDAs (personal digital assistants)
  - Combine mobile phone and laptop technologies

Acquisition Procedures
- Mobile devices have volatile memory, so need to retrieve RAM before losing power
- Isolate the device from incoming signals
- Store the device in a special bag
- Need to carry out forensics in a special lab (e.g., SAIAL)
- Examine the following:
  - Internal memory, SIM card, other external memory cards, system server
  - May also need information from the service provider to determine the location of the person who made the call

Mobile Forensics Tools
- Reads SIM card files
- Analyzes file content (text messages, etc.)
- Recovers deleted messages
- Manages PIN codes
- Generates reports
- Archives files with MD5, SHA-1 hash values
- Exports data to files
- Supports international character sets

Papers to discuss: November 10, 2008
- FORZA: Digital forensics investigation framework that incorporate legal issues
  http://dfrws.org/2006/proceedings/4-Ieong.pdf
- A cyber forensics ontology: Creating a new approach to studying cyber forensics
  http://dfrws.org/2006/proceedings/5-Brinson.pdf
- Arriving at an anti-forensics consensus: Examining how to define and control the anti-forensics problem
  http://dfrws.org/2006/proceedings/6-Harris.pdf

Papers to discuss: November 17, 2008
- Forensic feature extraction and cross-drive analysis
  http://dfrws.org/2006/proceedings/10-Garfinkel.pdf
- md5bloom: Forensic file system hashing revisited (OPTIONAL)
  http://dfrws.org/2006/proceedings/11-Roussev.pdf
- Identifying almost identical files using context triggered piecewise hashing (OPTIONAL)
  http://dfrws.org/2006/proceedings/12-Kornblum.pdf
- A correlation method for establishing provenance of timestamps in digital evidence
  http://dfrws.org/2006/proceedings/13-%20Schatz.pdf
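The "archives files with MD5, SHA-1 hash values" capability listed under Mobile Forensics Tools, which the hashing papers above build on, can be sketched with Python's `hashlib`. The function name and chunk size are assumptions for this example, and the temporary file stands in for an acquired evidence file.

```python
import hashlib
import os
import tempfile

def file_digests(path, chunk_size=65536):
    """Return the MD5 and SHA-1 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large images.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}

# Stand-in for an acquired file: a temporary file holding b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
digests = file_digests(tmp.name)
os.remove(tmp.name)
print(digests)
```

Recording the digests at acquisition time and recomputing them later lets an examiner show that the archived file has not changed.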
