Investigating the Mutual Impact of the P2P Overlay

Investigating the Mutual Impact of the P2P Overlay

Investigating the Mutual Impact of the P2P Overlay and the AS-level Underlay Committee: Amir Hassan Rasti Prof. Reza Rejaie (Chair) Prof. Virginia Lo Prof. Arthur Farley Prof. David Levin Oral Defense 02/13/20 Amir H Rasti 1 Introduction Internet growth and evolution Size Reach Functionality Bandwidth New application model: P2P P2P Application model Popular among Users

Free Content Sharing Popular for Content distributors ? Traditionally : NO Recently : YES Low cost content distribution Popular among researchers 02/13/20 Scalable, distributed, flexible, resilient Popular for ISPs ? Amir H Rasti 2 ISPs and P2P Growing interest in P2P/overlay-based applications Higher bandwidth links provided to users make P2P more viable

P2P traffic High Volume, Steady Pattern, Upload Demand Tension between application goals (e.g. random connection) and ISP goals (minimizing traffic, and cost) ISP cut or limit the rate of certain traffic P2P apps try to hide Some efforts (such as P4P) provide an interface between ISPs and applications Most prior solutions have a localized view of the problem (e.g. each ISP is trying to resolve its own problem without any notion of global impact) No previous work has looked at the global impact (imposed traffic) of P2P on the network 02/13/20 Challenge: Size and level of distribution in overlay and Underlay Amir H Rasti 3 This dissertation: P2P Overlay and AS-level Underlay

The P2P impact on the network depends on: P2P overlay: connectivity, traffic generation, routing AS-level Underlay: connectivity, hierarchy, routing Key question: What is the global impact of a P2P application on the underlying network? This dissertation answers this question in three parts: Measurement and Characterization of the P2P Applications Understanding the AS-level Underlay Characterizing the impact of P2P traffic on AS-level underlay 02/13/20 Amir H Rasti

4 Talk Outline Introduction Characterizing P2P Overlays Measurement Study on Gnutella Overlay (Chapter III) Sampling Large-scale Overlays (Chapter IV) P2P Performance Evaluation: BitTorrent Case (Chapter V) Characterizing AS-level Underlay AS-level Underlay: Geographical Mapping (Chapter VI) Impact of P2P Overlay on AS-level Underlay (Chapter VII) Conclusion & Future Work 02/13/20 Amir H Rasti 5 Gnutella Background One of the first P2P file sharing applications Participating peers form an overlay

They search for the requested files through the overlay Data transfer is performed out of band Top-level overlay Ultrapeer Leaf 02/13/20 Amir H Rasti 6 Measurement Study on Gnutella Measurement Study on Gnutella Overlay Monitor the popular Gnutella application over 15 months (10/04 to 1/06) during which the user population quadrupled (800k to 3.2M). How Gnutella accommodated this quick population increase keeping performance acceptable? How did the peer connectivity evolve? A. H. Rasti, D. Stutzbach, and R. Rejaie. On the Long-term Evolution of the Two-Tier Gnutella Overlay. In Proc. Global Internet Symposium, Barcelona, Spain, April 2006. 02/13/20

Amir H Rasti 7 Measurement Study on Gnutella Summary We captured 20k snapshots of the top-level overlay and studied: Client Properties Graph Properties Geographical Properties Top-tier Deg. Dist. Population Growth 02/13/20 Amir H Rasti 8 Measurement Study on Gnutella Main Findings

Quick increase in the population began to imbalance the overlay when quick software upgrades fixed the problem. Population has grown while main connectivity features are unchanged Despite the general randomness of the overlay, peers tend to connect to other peers in the same continent. Taking full snapshots of very large overlays is not the best practice. This motivates sampling peers. 02/13/20 Amir H Rasti 9 Talk Outline Introduction Characterizing P2P Overlays Measurement Study on Gnutella Overlay (Chapter III) Sampling in Large-scale Overlays (Chapter IV) P2P Performance Evaluation: BitTorrent Case (Chapter V) Characterizing AS-level Underlay

AS-level Underlay: Geographical Mapping (Chapter VI) Impact of P2P Overlay on AS-level Underlay (Chapter VII) Conclusion & Future Work 02/13/20 Amir H Rasti 10 Sampling Large-scale Overlays Motivation P2P systems are very popular in practice. Millions of simultaneous users. Responsible for up to 70% of Internet traffic Measurement studies aid understanding existing systems and user behavior. Capturing an accurate global snapshot is often infeasible. P2P systems are distributed, large, and rapidly changing. P2P crawlers are likely to capture incomplete or distorted snapshots Sampling is a natural approach for understand peer properties

How can we collect representative samples? 02/13/20 Amir H Rasti 11 Sampling Large-scale Overlays The Graph Sampling Problem We focus on sampling peer properties, such as number of neighbors (degree), access link bandwidth, session time, # files Sampling peer properties has two steps: Discovering and selecting peers (or samples) Measuring the desired properties of selected peers Selecting peers uniformly at random is hard there are two possible sources of bias in a random walk [Stutzbach:IMC06] Topological: high-degree peers are more likely to be selected Temporal: short-lived peers are more likely to be selected Random walks are a promising approach to sampling

The resulting bias is precisely known. 02/13/20 Amir H Rasti 12 Sampling Large-scale Overlays Background: Sampling Using Random Walk Random walks can be described with a transition matrix P(x,y) P(x,y) : probability of moving from x to y 1 P ( x, y ) deg( x) 0 y is a neighbor of x otherwise Pr(x,y) : probability of moving from x to y after r moves Random walks converge to a stationary distribution deg( x) ( x) lim(vP )( x) r 2E r

Problem: we need a uniform distribution 02/13/20 Amir H Rasti 13 Sampling Large-scale Overlays Background: Metropolized Random Walk (MRW) The Metropolis-Hastings method modifies the transition matrix to yield the desired uniform distribution [Stutzbach:IMC06] MRW method: Select a neighbor y of x uniformly at random Transition to y with probability min( deg(x)/deg(y) , 1) Otherwise, self-loop to x. Results in uniform stationary dist. (x)= 1/|V|V|V| deg( x) P( x, y ) min( deg( y ) ,1) if x y Q( x, y ) 1 Q ( x, y )

if x y x y x y MRW compensates for bias as samples are collected 02/13/20 Amir H Rasti 14 Chapter IV Sampling Peer Properties In Large-scale Overlays We present a new graph sampling technique, Respondent-Driven Sampling (RDS) Compare the performance of RDS and MRW sampling techniques using simulations & experiments A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. Respondent-driven Sampling for Characterizing Unstructured Overlays. In Proc. IEEE INFOCOM Mini-conference, 2009. 02/13/20 Amir H Rasti

15 Sampling Large-scale Overlays Respondent-Driven Sampling (RDS) A development of Snowball Sampling [Salganik04] Commonly used in social sciences to sample hidden populations, e.g. drug addicts Social relationships (references) are used by sampler to diffuse into hidden populations Each person introduces n other persons in questionnaires Similar to random walk (n = 1) We adopt the RDS technique from social sciences for sampling P2P networks In RDS we perform normal random-walk and compensate for the bias in post-processing. We hope to achieve better performance over MRW at least in certain graph types. 02/13/20

Amir H Rasti 16 Sampling Large-scale Overlays RDS Formulation Goal: Estimate the distribution of node property X Perform regular random walk, collect values of property X and node degree (deg(v)) at each visited node Deal with the bias during the post-processing as follows: Divide possible values for X into several ranges: {R1, . . . ,Rm} Partition nodes with the X value within the same range: {V1, . . . ,Vm} Using Hansen-Hurwitz estimator to compensate for the bias, the fraction of node population lying in group i is estimated as follows: p i 02/13/20 vTi vT

1 1 deg(v) Ti: visited samples in group i T: all visited samples deg(v) Amir H Rasti 17 Sampling Large-scale Overlays Evaluation Overview We evaluate RDS and MRW in comparison in a variety of static/dynamic graphs. Performance metric: Consider only peer properties that may interact with the walk: 1) Peer Degree, 2) Peer Uptime, 3) Peer RTT Compare the dist. of the these peer properties from samples and ground truth using Kolmogorov-Smirnov (KS) statistics: Maximum diff between two CDFs

Evaluation Methodology Evaluation over static graphs Effect of graph structure Evaluation over dynamic graphs (session level simulation) Benefits of parallel Sampling (see the paper) Effect of overlay construction parameters: 1) churn, 2) peer discovery, 3) target peer degree 02/13/20 Experiment over real Gnutella network Amir H Rasti 18 Sampling Large-scale Overlays Evaluation: Static Graphs Using graph models with different degree distribution & clustering characteristics:

Random graphs (ER): Erdos-Renyi Small-world graphs (SW): Watts and Strogatz Scale-free graphs (BA): Barabasi and Albert Hierarchical Scale-Free graphs (HSF): Barabasi 02 Power-law degree distribution Node clustering is inversely proportional to node degree Similarities with many natural and social networks Gnutella graphs (GA): Snapshots of Gnutella Ultrapeer overlay 02/13/20 Amir H Rasti 19 Sampling Large-scale Overlays Hierarchical Scale-Free (HSF) 02/13/20 Amir H Rasti

20 Sampling Large-scale Overlays Static Graphs Accuracy of both techniques is improved with the number of samples in most cases The rate of improvement in accuracy is much lower over HSF especially for MRW Walkers are likely to get trapped within clusters in HSF graphs Leaving a cluster requires visiting high degree nodes but MRW is less likely to visit these nodes Rewiring a small fraction of randomly selected edges in HSF significantly improves accuracy for both techniques RDS is less sensitive to graph clustering than MRW 02/13/20 Amir H Rasti 21

Sampling Large-scale Overlays Dynamic Simulation Setting Session-time distributions: Weibull, Exponential, Pareto Poisson arrival process Peer discovery: Oracle, FIFO, HeartBeat, History Target population: 100000 Min. Degree: 3-30 Sampling Parameters: 02/13/20 Node degree (DEG) Node query latency (RTT) Session length/uptime (UT) Amir H Rasti 22 Sampling Large-scale Overlays RDS over Dynamic Graphs Churn is a primary limiting factor for accuracy

Session len.> 5 min Very good sampling accuracy Churn model has little effect Similar impact on other peer properties (see the paper) Sampling error is small once nodes have sufficient connectivity (deg>5) Lower accuracy for smaller degree is due to graph partitioning Partitioned nodes in History mech. reduce the accuracy of sampling 02/13/20 Amir H Rasti 23 Sampling Large-scale Overlays RDS and MRW over Dynamic Graphs: Effect of Parallelism Too much parallelism does not improve

performance Too long random walks have negative effect Sweet spot exists 02/13/20 Depends on other parameters such as degree and churn Amir H Rasti 24 Sampling Large-scale Overlays Experiment: RDS and MRW over Gnutella Run crawler, 1000 RDS & 1000 MRW walkers in parallel 500 steps per walker Use captured snapshots by crawler as a rough reference Show min, max, avg KS over 6 experiments Focus only on degree dist The degree dist from samples & crawls are very similar

(KS~0.03) The accuracy is an order of magnitude lower than dynamic sim due to inaccurate reference. Both sampling techniques achieve similar accuracy 02/13/20 Amir H Rasti 25 Sampling Large-scale Overlays Summary RDS always performs as good or better than MRW High levels of graph clustering can significantly degrade the accuracy of both RDS and MRW RDS is less sensitive than MRW to graph clustering Total sample size can be controlled by two parameters: Walk length and Number of parallel samplers There is sweet spot for the number of parallel samplers vs. walk length. Too long random walks take too long to complete Too many parallel samplers keep re-sampling the same nodes Poor connectivity (partitioning) & high dynamics (churn) adversely affect the accuracy of both techniques. 02/13/20

Amir H Rasti 26 Talk Outline Introduction Characterizing P2P Overlays Measurement Study on Gnutella Overlay (Chapter III) Sampling Large-scale Overlays (Chapter IV) P2P Performance Evaluation: BitTorrent Case (Chapter V) Characterizing AS-level Underlay AS-level Underlay: Geographical Mapping (Chapter VI) Impact of P2P Overlay on AS-level Underlay (Chapter VII) Conclusion & Future Work 02/13/20 Amir H Rasti 27 Talk Outline

Introduction Characterizing P2P Overlays Measurement Study on Gnutella Overlay (Chapter III) Sampling Large-scale Overlays (Chapter IV) P2P Performance Evaluation: BitTorrent Case (Chapter V) Characterizing AS-level Underlay AS-level Underlay: Geographical Mapping (Chapter VI) Impact of P2P Overlay on AS-level Underlay (Chapter VII) Conclusion & Future Work 02/13/20 Amir H Rasti 28 Geographical Mapping of ASes: Motivation Autonomous Systems: Building blocks of Internet Wide variety of types/sizes/geographical scopes AS1273 (CW), tier-1, has presence in 4 different continents AS3582 (UONet), tier-3, has presence in one city AS characterization/modeling has been the topic of numerous research

works Connectivity [Gao00] Business model [Chang03] Hierarchy [Ge01] Geographical aspects are key to understanding other characteristics Connectivity: Two ASes are likely to connect if: Their coverage areas overlap, i.e. common PoP locations, or Both are present in a number of IXPs Business Two ISPs may compete if they provide services in the same market Traffic demand between two ASes is related to their geo. distance Previous works have not considered geographical properties 02/13/20 Amir H Rasti 29 Geographical Mapping of Eyeball ASes This Chapter Geographical Mapping of Eyeball ASes

Capturing the service area of each eyeball (edge) AS Estimating user density function Determining likely PoP locations Correlation with connectivity A. H. Rasti, N. Magharei, R. Rejaie, and W. Willinger. Eyeball ASes: From Geography to Connectivity. In Proc. ACM Internet Measurement Conference, Melbourne, Australia, 2010. 02/13/20 Amir H Rasti 30 Geographical Mapping of Eyeball ASes Method Overview Sampling end-users: Collect a large number of IP addresses associated with endusers by capturing snapshots of popular P2P applications Mapping end-users to location Using GeoIP database Grouping end-users by AS Estimating geo-footprint of ASes Using KDE tool to estimate user density from captured samples Using peaks of the user density to find likely PoP locations

Why eyeball ASes? How end-users are connected to the Internet 02/13/20 Missing in BGP/traceroute based methods Geo mapping is available with much higher accuracy Amir H Rasti 31 Geographical Mapping of Eyeball ASes Mapping End-users to Location and AS applications: Gnutella, Kad and BitTorrent ~89 million IP addresses captured during 1/2009 to 7/2009 Used two IP geo-location databases: GeoIP City from MaxMind IP2Location DB-15 from Hexasoft Discarding IPs that are not reliably located

If geo data is not available or not consistent Roughly 60% of all IPs pass the filter Map each IP to AS using relevant BGP tables from RouteViews archive 02/13/20 Amir H Rasti Number of Peers (Million) Capture snapshots of 3 popular P2P Continents 32 Geographical Mapping of Eyeball ASes High-level Geo. Categorization of ASes according to their geographical span to city-, state-, country-, continent-level, and global: for l in {city, state, country, continent}

02/13/20 If 95% of IPs in an AS are in the same l call it l-level Number of ASes We categorize ASes Otherwise call it Global Continents Amir H Rasti 33 Geographical Mapping of Eyeball ASes Target Dataset For acceptable sampling, we only include ASes with at least 1000 IPs that pass the geo-location reliability filter That reduces the total number of peers to ~48 M and total number of ASes to 1229

#Peers by source (k) #ASes by level Region Kad Gnu BT City State Country NA 1218 8984 1761 36 162 129 EU

18004 2519 2529 60 76 292 AS 17865 1606 1016 117 35 134 02/13/20 Amir H Rasti 34

Geographical Mapping of Eyeball ASes AS Geo-footprint The previous basic categorization does not provide information about user density and PoP location. We are interested in deriving the geographical user density function for each AS and thus their PoP locations IP geo-location databases have limited accuracy Location data is noisy (noise = mapping error) Users are sampled We dont have all the users; Only those in 3 P2P applications. Problem: Rebuild the signal from noisy samples Kernel Density Estimation (KDE) 02/13/20 Amir H Rasti 35 Geographical Mapping of Eyeball ASes KDE Estimation of a probability density function using a number of

samples. n x xi 1 f ( x) K( ) h nh i 1 h Gaussian Kernel: K ( x) x xi 1 K( ) e h 2 ( x xi ) 2 2h2 h=0.05, 0.3, 2 Source: wikipedia h: kernel bandwidth 02/13/20 Amir H Rasti

36 Geographical Mapping of Eyeball ASes Estimate User Density via KDE Estimate user density over an ASs geographical domain: 02/13/20 Take each IP location as a sample Apply KDE with certain bandwidth over samples from each AS One may try to set the kernel bandwidth in order to cover geo-location error (e.g. estimated median geolocation error) The aggregation and smoothing functionality of KDE is also important and useful. With different settings of KDE bandwidth we will have aggregation and smoothing in different scales Amir H Rasti 37

Geographical Mapping of Eyeball ASes User Density: Multi-scale View AS3269 : Telecom-Italia with 2.2 M samples BW = 60km BW = 40km BW = 20km Kernel bandwidth determines the radius of aggregation. 02/13/20 Amir H Rasti 38 Geographical Mapping of Eyeball ASes Finding PoPs Point of Presence (PoP): Location where an ISP has network equipments, from where they can provide services. Ranges from a couple of routers in a rental rack space, to multiple buildings.

Is important because they are often inter-ISP connection points, also show where they can provide services. To find PoPs from user density function Basic approach: By simply mapping each density peak to the closest city, we often end up with towns and villages. (why?) We pick the city with the highest population that is at most 40km (=bw) far from the peak point. 02/13/20 Requires careful treatment of border cities Amir H Rasti 39 Geographical Mapping of Eyeball ASes PoP location example Country-level AS12322 (PROXAD) in France

~1M samples Pop-list: Paris: 2110694 0.4454 Marseille: 792823 0.0450 Lyon: 463700 0.0362 Toulouse: 411145 0.0295 Nice: 341621 0.0239 Montpellier: 238887 0.0183 Bordeaux: 219311 0.0177 Nantes: 284677 0.0151 Strasbourg: 273506 0.0149 Lille: 189746 0.0148 Nancy: 106857 0.0099 Dijon: 153692 0.0087 Caen: 115358 0.0044 Perpignan: 104783 0.0038 Brest: 150554 0.0031 Poitiers: 86639 0.0026 Limoges: 134549 0.0015 Irn: 57205 0.0005 02/13/20 Amir H Rasti 40 Geographical Mapping of Eyeball ASes Comparison with Ground-Truth

Perc. of reference PoPs matched bw=40km, for the bottom 60% of ASes, < 20% of PoPs matched. Decreasing bw increases matched PoPs Perc. of KDE PoPs matched bw=80km, perfect catch for 60% of ASes, < 20% of PoPs matched. Decreasing bw reduces matched PoPs Using larger kernel bw leads to a smaller but more reliable set of PoPs for most ASes 02/13/20 Amir H Rasti 41 Geographical Mapping of Eyeball ASes Summary We demonstrated possibility of capturing geographic

footprints of eyeball ASes, using measurements at the edge. Geo. footprints may be used to infer infrastructureand/or business-related properties e.g. pop-location, connectivity, market, competence It is complementary to the previous BGP/traceroute based methods in painting the full picture of AS-level Internet. 02/13/20 Amir H Rasti 42 Talk Outline Introduction Characterizing P2P Overlays Measurement Study on Gnutella Overlay (Chapter III) Sampling Large-scale Overlays (Chapter IV) P2P Performance Evaluation: BitTorrent Case (Chapter V) Characterizing AS-level Underlay AS-level Underlay: Geographical Mapping (Chapter VI) Impact of P2P Overlay on AS-level Underlay

(Chapter VII) Conclusion & Future Work 02/13/20 Amir H Rasti 43 Impact of Overlay on Underlay Motivation Growing interest in P2P/overlay-based applications Higher bandwidth links provided to users make P2P more viable Tension between application goals (e.g. random connection) and ISP goals (minimizing traffic, and cost) Important questions: 02/13/20 What is the global impact of a P2P overlay on the underlying network? What are the main factors and how do they shape the above impact? e.g., overlay shape, overlay load distribution, underlay topology, underlying routing policies

Amir H Rasti 44 Impact of Overlay on Underlay Problem & Methodology Problem What is the imposed load by P2P overlay on each link and node of the AS-level topology? Methodology summary Capturing the topology of a P2P overlay Estimating the load on individual connections in the overlay Inferring the AS-paths associated with individual overlay connections Determining the aggregate load on each AS and between connected ASes 02/13/20 Amir H Rasti 45 Impact of Overlay on Underlay Methodology: Overlay Load

Overlay load parameters Number of sources, the rate and pattern of traffic generation by these peers, and their relative location in the overlay Overlay topology Relaying (routing) strategy Overlay traffic model 02/13/20 Any viable traffic model can be used for different P2P applications In flooding overlays edge-betweenness canbe used We used a simple model: All p2p connections carry the same load, therefore the traffic between any pair of edge ASes is simply number of P2P connections Amir H Rasti 46 Impact of Overlay on Underlay Methodology: Inferring AS-level topology Mapping peers to ASes Using RouteViews snapshots to translate the IP address of each peer to its corresponding AS number. Form an inter-AS overlay traffic matrix in which we have the

number of P2P connections between each pair of ASes. Capturing Inter-AS relationships (AS-level topology) CAIDA provides annotated AS-level topology of the Internet including the inferred relationships between each pair of connected ASes. The relationships are inferred based on the valley-free routing principle and have 3 possible categories: 02/13/20 Customer-Provider Peer-Peer Sibling-Sibling Amir H Rasti 47 Impact of Overlay on Underlay Methodology: Inferring AS-paths Inferring AS-Paths by simulating BGP using C-BGP tool

Represent each AS by a single router Map AS relationships into BGP routing policies Shortcomings: Unable to find multiple paths between same pair of ASes. No standard protocol for implementing c-p, p-p, or s-s relationships. Variant from one place to another. Classifying ASes into tiers We use TierClassify [Ge01] tool to classify ASes into tiers. The top ASes with no providers are labeled as tier-1 and their customers become tier-2, and so on. Calculate the aggregate load imposed on each transit AS by summing up the traffic carried by all crossing AS-paths 02/13/20 Amir H Rasti 48 Impact of Overlay on Underlay

Data profile Snapshot Gnutella Snapshots BGP Snapshots AS-Paths #Peers #Conn. #Prefixes #ASes #Unique %Important G-04 177k 1.5M 165k 19k 192k

2.0 G-05 681k 5.8M 185k 21k 384k 2.9 G-06 1.0M 8.6M 210k 23k 605k 2.8 G-07 1.2M

9.8M 229k 25k 684k 2.7 Overlay growth : More than 600% Underlay has grown Path diversity has increased Only 2%-3% of paths carry +100 overlay connections 02/13/20 Amir H Rasti 49 Impact of Overlay on Underlay Results: Load Distribution (1) Number of overlay connections per AS-Path (CCDF)

Number of overlay connections passing through each transit AS 02/13/20 Skewed for all snapshots Small number of AS-Paths carry a large fraction of load E.g. about 10% of AS-Paths each carry 10 connections or more while 1% of AS-Paths carry 200 connections or more Amir H Rasti Similarly skewed for all snapshots For G-07, each of the top-10 ASes carry more than 1M connections while each of the top-100 ASes carry more than 10k connections Similar distributions despite changes in overlay and underlay 50 Impact of Overlay on Underlay

Results: Load Distribution (2) Number of AS-Paths passing through each transit AS Similarly skewed with previous graphs Suggests that the load on transit ASes mostly depends on the number of crossing AS-Paths Scatterplot for number of overlay connections crossing each AS-Path 02/13/20 Amir H Rasti Relates two previous distribution Confirms that the load on each transit AS mostly depends on where it is located in the AS topology 51

Impact of Overlay on Underlay Results: Top transit ASes; Identity & Evolution Top-10 transit ASes carrying the highest number of overlay connections during 2004-2007 4 ASes stay in top-10 all the time Other ASes change location as a result of changes in Peer location AS-level topology change Routing policy change Top-10 ASes load is close to each other 02/13/20 Amir H Rasti 52 Impact of Overlay on Underlay Results: AS-Path length Dist. of AS-path length between

all pairs of edge ASes No significant change 40% of paths are 3 hops long 80% of paths are 4 hops long or shorter Dist. of AS-path length across all overlay connections 02/13/20 Amir H Rasti Weighted version of the top graph with number of overlay connections on each path Similar pattern with shorter average path length For G-07 mean unweighted path length is 3.7 while mean weighted path length is 3.2

A higher fraction of overlay connections are connected via short paths. 53 Impact of Overlay on Underlay Results: Load Propagation over AS hierarchy Snap. Tier-1 Tier-2 Tier-3 Path Conn Path Conn Path Conn G-04 51 84

46 16 2.4 0.0 G-05 59 73 38 27 3.0 0.0 G-06 52 64 38 36

10 0.0 G-07 55 63 41 37 3.6 0.1 Percentage of AS-Paths/overlay connections reaching (peaking at) tier-N. About half of the AS-Paths reach tier-1 and ~40% reach tier-2. Compared to paths, a larger fraction (84%-63%) of overlay connections reach tier-1 A smaller fraction (16% - 37%) of overlay connections reach tier-2 The percentage of connections reaching tier-1 ASes is decreasing and perc. of connections reaching tier-2 is increasing over time. Explanation : Increased peering relationships at lower tiers 02/13/20 Amir H Rasti 54 Impact of Overlay on Underlay

Summary We characterized the load imposed by the Gnutella P2P application to the AS-level topology of the Internet. We listed main challenges and problems. We used best common practices and data (Cruiser, RouteViews, CAIDA, C-BGP) Presented results provide a better understanding of the load of overlays on the AS-level underlay. They suggest that the traffic is getting dispersed from the core of the Internet by the increase in ISP peering. 02/13/20 Amir H Rasti 55 Talk Outline Introduction Characterizing P2P Overlays

Measurement Study on Gnutella Overlay (Chapter III) Sampling Large-scale Overlays (Chapter IV) P2P Performance Evaluation: BitTorrent Case (Chapter V) Characterizing AS-level Underlay AS-level Underlay: Geographical Mapping (Chapter VI) Impact of P2P Overlay on AS-level Underlay (Chapter VII) Conclusion & Future Work 02/13/20 Amir H Rasti 56 Conclusion P2P/overlay characterization Studied Gnutella overlay connectivity and peer properties during 15 months Observed steady connectivity structure despite rapid growth Proposed RDS for sampling large-scale overlays

Studied its performance in static and dynamic cases Extreme graph structure & high churn limit sampling accuracy Characterizing AS-level Underlay Performed Geographic Char. of Eyeball ASes Proposed a viable method to captured geo. footprints of ASes and likely PoP locations Characterizing the Impact of P2P overlay on AS-level underlay 02/13/20 Using measurement and analysis, we captured the traffic imposed by the Gnutella overlay on the AS-level underlay The Internet is becoming more decentralized. Amir H Rasti 57 Future Work Global Traffic Modeling

We built a simple model for P2P traffic A large portion of traffic is flowing from CDNs to users Capture IP addresses of CDNs using a large number of vantage points Build an application-level traffic model between CDNs and end-users Simple: Traffic flows from all CDNs to all eyeball ASes Calculate impact on the underlay Effect of P2P traffic on AS connectivity ASes usually plan and build their networks assuming most of the traffic from data centers or providers to users P2P traffic flows between edge ASes Using a basic cost model one could calculate savings of establishing peering relationships between eyeball ASes. Modeling AS connectivity 02/13/20 ASes decide on connectivity based on many factors including traffic demand, size and geographical location of PoPs

A researcher may build a model deciding the predicting connectivity and its type based on geographical and size info Amir H Rasti 58 Publications A. H. Rasti, D. Stutzbach, and R. Rejaie. On the Long-term Evolution of the Two-Tier Gnutella Overlay. In Proc. Global Internet Symposium, Barcelona, Spain, April 2006. A. H. Rasti and R. Rejaie. Understanding Peer-Level Performance in BitTorrent: A Measurement Study. In Proc. International Conference on Computer Communications and Networks, Honolulu, Hawaii, August 2007. A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. Respondent-driven Sampling for Characterizing Unstructured Overlays. In Proc. IEEE INFOCOM Mini-conference, 2009. A. H. Rasti, N. Magharei, R. Rejaie, and W. Willinger. Eyeball ASes: From Geography to Connectivity. In Proc. ACM Internet Measurement Conference, Melbourne, Australia, 2010.

A. H. Rasti, R. Rejaie, and W. Willinger. Characterizing the Global Impact of P2P Overlays on the AS-Level Underlay. In Proc. Passive and Active Measurement Conference, Zurich, Switzerland, April 2010. 02/13/20 Amir H Rasti 59 Acknowledgment Thanks to 02/13/20 My advisor Prof. Reza Rejaie My committee members Prof. Virginia Lo Prof. Art Farley Prof. David Levin

My co-authors Dr. Walter Willinger Dr. Daniel Stutzbach Dr. Nick Duffield Mojtaba Torkjazi Others Prof. Eugene Luks Prof. Matthew Roughan Reza Motamedi CIS Department Faculty and Staff Dr. Nazanin Magharei Amir H Rasti Thank you! 60 BACKUP SLIDES 02/13/20 Amir H Rasti

61 ASGEO: Bias in Collected samples The fraction of collected samples from a city could be disproportional with actual user population per AS Cannot distinguish between market share of an AS in a city and penetration of P2P app in that city Mild bias only affects the density of identified PoPs Significant bias is unlikely with a large number of samples 02/13/20 02/13/20 RezaAmir Rejaie H Rasti 62 62 Sampling Large-scale Overlays Evaluation: Static Graphs : Conclusion Combination of highly skewed degree distribution and highly skewed clustering traps samplers RDS samplers get out of the clusters quickly MRW samplers get stuck in low-degree clusters

Shuffling provides short-cuts out of clusters 02/13/20 Amir H Rasti 63 P2P Performance: BitTorrent BitTorrent Overview Most popular P2P application Scalable one to many peer-to-peer file distribution Overlay: Unstructured, Random, High degree Swarming File is divided into segments Segments are randomly distributed among peers Get rarest seg. first Contribution Peers exchange segments and contribute their outgoing bandwidth Incentive: Tit-for-Tat

Tracker Torrent coordinator Periodic peer status updates Performance: Intuitively depends on Peer properties (BW, Contribution, etc. ) Group properties (Population, Content availability, Churn) Why BitTorrent? 02/13/20 Amir H Rasti 64 P2P Performance: BitTorrent Peer-level Performance in BitTorrent Characterization: Understanding group-level and peer-level properties in a torrent Analysis:

What are the main factors that affect observed performance by individual peers? A. H. Rasti, R. Rejaie. Understanding Peer-level Performance in BitTorrent: A Measurement Study In Proc. International Conference on Computer Communications and Networks (ICCCN), Honolulu, Hawaii, August 2007. 02/13/20 Amir H Rasti 65 P2P Performance: BitTorrent Methodology (Brief) We use tracker logs for three popular torrents and extract all the following properties Peer-level properties Avg. download and upload rates Group-level properties Group population, Avg. content availability, churn rate Define peer performance as avg. download rate over maximum download rate (utilization)

Use a variety of techniques to investigate any correlation between peer performance and other captured properties. 02/13/20 Scatterplot, linear regression, rank correlation Amir H Rasti 66 P2P Performance: BitTorrent Results summary No single factor determines observed performance by peers Outgoing bandwidth seems to have the largest effect Tit-for-tat is working There often appears to be sufficient number of seeds available (non-factor on performance) Performance of the peers in a torrent is rather diverse 02/13/20 Amir H Rasti

67 Geographical Mapping of Eyeball ASes KDE function using a number of samples. n x xi 1 f ( x) K( ) h nh i 1 h Gaussian Kernel: Density Value Estimation of a probability density K ( x) x xi 1 K( ) e h 2

h: kernel bandwidth 02/13/20 X1 ( x xi ) 2 2h2 X2 Example: 500 samples from N(0, 1) X N(0,1) and 500 from N(3.5, 1) X N(0,1) Bivariate KDE with bandwidth=0.36 Amir H Rasti 68 Geographical Mapping of Eyeball ASes Setting KDE Bandwidth Can we use the peaks of the user density function as probable PoPs of each ISP? 02/13/20

To attain the goal of finding PoP locations at city-level, we set the kernel bandwidth to ensure aggregation of high density areas within a citys area. We take the diameters of 10 important metro areas globally and find an average of ~70km. By setting the KDE BW to 40km, high density areas that are within 80km of each other will merge and we will have one peak per city/metro area. Amir H Rasti 69 Geographical Mapping of Eyeball ASes Validation and Comparison A limited number of ISPs provide network maps or PoP lists on their websites. We could find such info. for ~50 ISPs and our PoP list almost completely matched their published lists. Comparison with DIMES project Uses traceroute from participating users Overlap between their dataset and our target AS set includes 226 ASes. Avg. #PoPs/AS : 7.14 from our data and 1.54 from DIMES 02/13/20

Amir H Rasti 70 Geographical Mapping of Eyeball ASes AS-Geo & Connectivity: Case Study The goal is to examine how eyeball ASes are connected to each other according to the geo-footprint and PoP locations Connectivity information derived from: AS relationships data from CAIDA IXP data from IXP mapping project at lip6.fr Refer to the dissertation 02/13/20 Amir H Rasti 71 Geographical Mapping of Eyeball ASes Case Study AS8234 (RAI: Radiotelevisione Italiana) : A city-level AS in Italy 3000 P2P users in Rome

5 providers, 0 customers and 3 peers Country-level providers are: Infostrada, Fastweb Global providers are: Colt, Easynet and BT-Italia Member of Milan IXP (MIX) While RAI is city-level ISP in Rome it is not present in ROME IXP. In MIX peers with 02/13/20 GARR (Italian Academic and Research Network) ASDASD ITGate Peering with ASDASD and ITGate is appealing enough to forgo a cheaper local solution over a more expensive remote peering Amir H Rasti 72 Impact of Overlay on Underlay Methodology: Capturing Overlay Snapshot Gnutella P2P application Uses a two-tier overlay for efficient searching

Top-level peers (UltraPeers) form a highly connected overlay Crawling possibility Cruiser: A tool for crawling P2P overlays High performance parallel crawler infrastructure Captures Gnutella top-level overlay at ~100k peers per minute Outputs the list of peers and their neighbors BitTorrent: More popular Data exchange through overlay No reliable method to capture topology We focus on Gnutella while any overlay can be studied if reliably captured 02/13/20 Amir H Rasti 73

Recently Viewed Presentations

  • AP Chemistry Reactions in Solution solution: a homogeneous

    AP Chemistry Reactions in Solution solution: a homogeneous

    strong electrolyte, weak electrolyte, or . nonelectrolyte. HClO. 3 C. 6 H 12 O. 6 HClO. strong. strong. non-weak. If 0.40 mol of each of the following are dissolved in. 2.5 L of water, rank them from least to greatest....
  • WELCOME! [vite.bellpensionersgroup.ca]

    WELCOME! [vite.bellpensionersgroup.ca]

    WELCOME! Ottawa Chapter ... AND Plan is being wound-up We call this risk "Plan Failure" There is no sign that Bell's plan faces failure but we don't assume that it cannot happen Pension Reform: Getting the Rules Right Legislation and...
  • Space News Update - March 31, 2017 In

    Space News Update - March 31, 2017 In

    NASA Observations Reshape Basic Plasma Wave Physics. In this computer graphic, NASA's Voyager 1 probe, moving toward upper left, nears the edge of the sun's influence, flying through a region of space dominated by a "magnetic highway" that helps mediate...
  • Network Policies and Procedures Presentation for DoE Office

    Network Policies and Procedures Presentation for DoE Office

    Network Policies and Procedures Presentation for DoE Office of Assurance Cybersecurity Review visit to SLAC August 2005
  • Titanium Research in the UK

    Titanium Research in the UK

    Titanium Research in the UK IOM3 Light Metals Division Committee March 2009 Martin Jackson UK Universities with R&D Interests in Titanium Birmingham Imperial Manchester Oxford Sheffield Swansea Warwick University of Birmingham Key people: Mike Loretto, David Hu & Xinhua Wu...
  • Looking forward to the new A levels

    Looking forward to the new A levels

    The continued encroachment of our own subject - e.g 'An Inconvenient Truth' used in English GCSE and Media Studies What of the future? Opportunities Is this the right time to for Exam Boards to present more of the same? ......
  • D&#x27;Accord 1 Leçon 3A.1

    D'Accord 1 Leçon 3A.1

    Descriptive adjectives (irregular adjectives, adjective placement-BAGS, and physical description.) Adjectives change spelling (and sound sometimes) depending on who or what you are describing. Observe how the spellings change for regular adjectives…
  • Chapter 7: Relational Database Design

    Chapter 7: Relational Database Design

    Chapter 7: Relational Database Design