Measuring Serendipity: Connecting People, Locations and ...
Mosaic: Quantifying Privacy Leakage in Mobile Networks

Aleksandar Kuzmanovic
EECS Department, Northwestern University
http://networks.cs.northwestern.edu

Buy 1 Get 1 Free
- Mosaic: joint work with Ning Xia (Northwestern University), Han Hee Song (Narus Inc.), Yong Liao (Narus Inc.), Marios Iliofotou (Narus Inc.), Antonio Nucci (Narus Inc.), and Zhi-Li Zhang (University of Minnesota)
- Synthoid: joint work with Marcel Flores (Northwestern University)

Scenario
- Different footprints
- Different services
- Different devices
- Dynamic IP, CSP/ISP
[Figure: users reaching ISP A and CSP B through addresses IP1 to IPK]

Problem
How much private information can be obtained and expanded about end users by monitoring network traffic?
[Figure: input packet traces from ISP A and CSP B (IP1 to IPK) feed Tessellation and then Mosaic; labels "Other research work" and "We are here!"]

Motivation
[Figure: agencies and bad guys observing ISP A and CSP B (IP1 to IPK): "I will know everything about everyone!"]
Mobile traffic:
- Relevant: more personal information
- Challenging: frequent IP changes

Challenges
How to track users when they hop over different IPs?
- Sessions: flows (5-tuple) are grouped into sessions
- Traffic markers: identifiers in the traffic that can be used to differentiate users
[Figure: sessions on IP1, IP2, and IP3 along time axes]
With traffic markers, it is possible to connect users' true identities to their sessions.

Datasets
Dataset               Source  Description
3h-Dataset            CSP-A   Complete payload
9h-Dataset            CSP-A   Only HTTP headers
Ground Truth Dataset  CSP-B   Payload and RADIUS info
- 3h-Dataset: main dataset for most experiments
- 9h-Dataset: for quantifying privacy leakage
- Ground Truth Dataset: for evaluation of session attribution
- RADIUS: provides session owners

Methodology Overview
[Figure: traffic from ISP A and CSP B (IP1 to IPK) feeding Tessellation]
Traffic attribution: mapping from sessions to users
- Via traffic markers
- Via activity fingerprinting
Mosaic construction
- Network data analysis
- Public web crawling
- Combine information from both network data and OSN profiles to infer the user mosaic

Traffic Attribution via Traffic Markers
Traffic markers: identifiers in the traffic that differentiate users
- Key/value pairs from HTTP headers
- User IDs, device IDs, or session IDs

Domain       Keywords         Category     Source
osn1.com     c_user=          OSN user ID  Cookies
osn2.com     oauth_token=-##  OSN user ID  HTTP header
admob.com    X-Admob-ISU      Advertising  HTTP header
pandora.com  user_id          User ID      Cookies
google.com   sid              Session ID   Cookies

How can we select and evaluate traffic markers from network data?

OSN IDs as Anchors
- The most popular user identifiers among all services
- Linked to users' public profiles

OSN      Source                Session coverage
OSN1 ID  HTTP URL and cookies  1.3%
OSN2 ID  HTTP header           1.0%

- Top 2 OSN providers from North America
- Only 2.3% of sessions contain OSN IDs
- OSN IDs can be used as anchors, but their session coverage is too small

Block Generation: Group Sessions into Blocks
[Figure: sessions on IP1 along a time axis; a session carrying an OSN ID and the other sessions ("Other sessions?") are grouped into a block]
- Session interval: depends on the CSP; 60 seconds in our study
- Block: a group of sessions on the same IP from the same user
- Traffic markers are shared by the sessions in the same block
- 99K session blocks generated from the 12M sessions

Culling the Traffic Markers: OSN IDs Are Not Enough
- Uniqueness: can the traffic marker differentiate between users?
- Persistency: how long does a traffic marker remain the same?
- Uniqueness = 1: no two users share the same google.com#sid value
- Persistency ~= 1: the value of google.com#sid stays the same for a given user for nearly the entire observation period
- We pick 625 traffic markers with uniqueness = 1 and persistency > 0.9
[Figure labels (traffic markers): craigslist.org#cl_b, google.com#sid, mydas.mobi#mac-id, mobclix.com#u, pandora.com#user_id, mobclix.com#uid, OSN1 ID]

Traffic Attribution: Connecting the Dots
[Figure: Tessellation user Ti; blocks on IP 1, IP 2, and IP 3 linked by the same OSN ID and the same traffic marker]
Traffic markers are the key to attributing sessions to the same user across different IP addresses.

Traffic Attribution via Activity Fingerprinting
What if a session block has no traffic markers?
Assumption (activity fingerprinting): users can be identified from the DNS names of their favorite services.
DNS names:
- Service classes
- Service providers
- Extracted 54,000 distinct DNS names
- Classified into 21 classes

Class       Example DNS names
Search      bing, google, yahoo
Chat        skype, mtalk.google.com
Dating      plentyoffish, date
E-commerce  amazon, ebay
Email       google, hotmail, yahoo
News        msnbc, ew, cnn
Picture     Flickr, picasa

Activity fingerprinting: a user's favorite (top-k) DNS names serve as the user's fingerprint Fi, whose uniqueness is written Y(Fi).
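The top-k fingerprint idea above can be illustrated with a minimal sketch. The slides do not say how the fingerprint is stored or how uniqueness is computed, so two choices here are assumptions: the fingerprint is an unordered set of the k most frequent names, and `distinct_fraction` (a hypothetical helper, not the talk's Y(Fi)) simply counts users whose fingerprint nobody else shares. The request logs are made up for illustration.

```python
from collections import Counter


def fingerprint(dns_names, k=5):
    """Return the k most frequently requested DNS names as a frozenset."""
    counts = Counter(dns_names)
    # Sort by descending count, then by name, so ties break deterministically.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    return frozenset(ranked[:k])


def distinct_fraction(fingerprints):
    """Fraction of users whose fingerprint is shared with no other user.

    Assumed stand-in for a uniqueness measure; not the definition from the talk.
    """
    tally = Counter(fingerprints.values())
    unique = sum(1 for fp in fingerprints.values() if tally[fp] == 1)
    return unique / len(fingerprints)


# Hypothetical per-user DNS request logs (not taken from the talk's datasets).
logs = {
    "u1": ["google", "google", "cnn", "ebay", "cnn", "google"],
    "u2": ["google", "flickr", "flickr", "skype"],
    "u3": ["cnn", "google", "google", "cnn", "ebay"],
}
fps = {user: fingerprint(names, k=2) for user, names in logs.items()}
print(fps)
print(distinct_fraction(fps))
```

With k = 2, users u1 and u3 collapse to the same fingerprint, which is the kind of collision a larger k is meant to reduce.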
- Fi: the top-k DNS names of a user, used as the activity fingerprint
- Y(Fi): uniqueness of the fingerprint
[Figure: fingerprint uniqueness Y(Fi), from 0.9 to 1, against the normalized x-axis, from 0 to 1, with one curve each for k = 4, 5, 6, 7, 8]
- Y-axis: the closer to 1, the more distinct the fingerprint
- X-axis: normalized by the total number of DNS names
Mobile users can be identified by the DNS names of their preferred services.

Traffic Attribution Evaluation
[Figure labels: Session; Correct (not complete)]
[Figure: sessions of RADIUS users Ri and Rj (ground truth) compared with Tessellation users Ti and Tj; an attribution is marked "Correct?" or "Not correct"]

\[ \text{Coverage} = \frac{\text{identified sessions/users}}{\text{total sessions/users}} \]

\[ \text{Accuracy on covered set} = \frac{\text{correctly identified sessions/users}}{\text{total identified sessions/users}} \]

Evaluation Results
[Figure: results for OSN ID extraction, via traffic markers, and via activity fingerprinting. Values shown: 15.70%, 2.40%, 43.20%, 69.00%, 49.80%, 78.60%; 100.00%, 99.30%, 96.40%, 100.00%, 94.50%, 92.50%]

Construction of User Mosaic
Mosaic of real-world user Alice
[Figure: mosaic tiles shaded from least gain to most gain; sub-classes include residence, coordinates, city, state, etc.]
Example mosaic with 12 information classes (tesserae):
- Information (education, affiliation, etc.) from OSN profiles
- Information (locations, devices, etc.) from user sessions

Quantifying Privacy Leakage
Leakage from OSN profiles vs. from network data
- OSN profiles provide static user information (education, interests)
- Analysis of network data provides real-time activities and locations
- Information from OSN profiles and network data complement and corroborate each other

Preventing User Privacy Leakage
- Protect traffic markers: traffic markers (OSN IDs, etc.) should be limited and encrypted
- Restrict third parties: third-party applications and developers should be strongly regulated

Beyond Traffic Encryption: Trackers and Information Aggregators

Current Approaches (in the Advertising Domain)
- Block or disrupt ad interaction: may disrupt regular site operation
- Privacy-preserving infrastructures: require participation of ad networks
- Do Not Track and opt-out mechanisms: require trust in ad networks

Synthoid: Endpoint User Profile Control
- The user explicitly defines who he wants to be online, i.e., his online profile
- Synthoid imprints this profile into all possible trackers and information aggregators
[Figure: Synthoid and an ad network]

Synthoid Performance
- Volume of Synthoid traffic varied from 1% to 100%
- It is possible to completely alter the user profile with a small amount of artificial traffic

Conclusions
- The prevalence of OSN use leaves users' true identities available in the network
- Significant portions of flows can be attributed to users, even without any direct identity leaks
- Ubiquitous encryption is not likely to take place, and it does not solve the problem
- Our endpoint user profile control lets users explicitly define their online profiles
  - Works for all possible trackers at once
  - Requires no changes or consent from trackers
- Currently developing Synthoid for search, mobile, information aggregators, price discriminators, etc.

Questions? Thanks!
http://networks.cs.northwestern.edu
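As a closing illustration of the traffic-marker attribution described in the "Block Generation" and "Connecting the Dots" slides, here is a minimal sketch under stated assumptions. The record layout `(ip, start_time, markers)`, the `domain#key=value` marker strings, and both function names are hypothetical. The session interval is treated as the gap between consecutive session start times on one IP, which the slides do not spell out, and the union-find linking is one plausible way to connect blocks, not necessarily the authors' method. Only markers that already passed culling (uniqueness = 1) should be fed in.

```python
SESSION_INTERVAL = 60  # seconds, the value quoted in the talk


def build_blocks(sessions, interval=SESSION_INTERVAL):
    """Group sessions on the same IP whose start gap is at most `interval`.

    `sessions` is a list of (ip, start_time, markers) tuples (assumed layout).
    """
    blocks = []
    last = {}  # ip -> (index of the open block, start time of its last session)
    for ip, start, markers in sorted(sessions, key=lambda s: (s[0], s[1])):
        if ip in last and start - last[ip][1] <= interval:
            idx = last[ip][0]
            blocks[idx]["markers"] |= markers
            blocks[idx]["sessions"] += 1
        else:
            idx = len(blocks)
            blocks.append({"ip": ip, "markers": set(markers), "sessions": 1})
        last[ip] = (idx, start)
    return blocks


def link_blocks(blocks):
    """Union blocks that share any marker value; return one label per block."""
    parent = list(range(len(blocks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # marker value -> first block index that carried it
    for i, block in enumerate(blocks):
        for marker in block["markers"]:
            if marker in first_seen:
                parent[find(i)] = find(first_seen[marker])
            else:
                first_seen[marker] = i
    return [find(i) for i in range(len(blocks))]


# Made-up sessions: the same google.com#sid value appears on two IPs.
sessions = [
    ("10.0.0.1", 0, {"osn1.com#c_user=alice"}),
    ("10.0.0.1", 30, {"google.com#sid=abc"}),
    ("10.0.0.1", 500, set()),
    ("10.0.0.2", 1000, {"google.com#sid=abc"}),
    ("10.0.0.3", 2000, {"pandora.com#user_id=77"}),
]
blocks = build_blocks(sessions)
labels = link_blocks(blocks)
print(labels)
```

The block on 10.0.0.2 ends up with the same label as the first block on 10.0.0.1 because they share a marker value, while the marker-free block stays unattributed, which is the case the activity-fingerprinting step is meant to handle.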