Building Very Large Overlay Networks Jorg Liebeherr University

Building Very Large Overlay Networks Jorg Liebeherr University

Building Very Large Overlay Networks Jorg Liebeherr University of Virginia HyperCast Project HyperCast is a set of protocols for large-scale overlay multicasting and peer-to-peer networking Research problems: Investigate which topologies are best suited for a large number of peers that spuriously join and leave in a network that is dynamically changing Make it easier to write applications for overlays easier by providing suitable abstractions (overlay socket) shielding application programmer from overlay topology issues Acknowledgements Team: Past: Bhupinder Sethi, Tyler Beam, Burton Filstrup, Mike Nahas, Dongwen Wang, Konrad Lorincz, Jean Ablutz, Haiyong Wang, Weisheng Si,

Huafeng Lu Current: Jianping Wang, Guimin Zhang, Guangyu Dong, Josh Zaritsky This work is supported in part by the National Science Foundation: D E N A L I Applications with many receivers 1,000,000 Number of Senders Sensor networks Peer-to-Peer Applications 1,000 Games 10 1

Streaming Collaboration Tools 10 Software Distribution 1,000 1,000,000 Number of Receivers Need for Multicasting ? Maintaining unicast connections is not feasible Infrastructure or services needs to support a send to group Multicast support in the network infrastructure (IP Multicast) Reality Check: Deployment has encountered severe scalability limitations in both the

size and number of groups that can be supported IP Multicast is still plagued with concerns pertaining to scalability, network management, deployment and support for error, flow and congestion control Overlay Multicasting Logical overlay resides on top of the layer-3 network infrastructure (underlay) Data is forwarded between neighbors in the overlay Transparent to underlay network Overlay topology should match the layer-3 infrastructure Overlay-based approaches for multicasting Build an overlay mesh network and embed trees into the mesh: Narada (CMU) RMX/Gossamer (UCB) more Build a shared tree: Yallcast/Yoid (NTT, ACIRI) AMRoute (Telcordia, UMD College Park) Overcast (MIT) Nice (UMD) more Build an overlay using distributed hash tables and embed trees: Chord (UCB, MIT)

CAN (UCB, ACIRI) Pastry/Scribe (Rice, MSR) more HyperCast Overlay Topologies Applications organize themselves to form a logical overlay network with a given topology Data is forwarded along the edges of the overlay network Hypercube Delaunay triangulation Spanning tree (for mobile ad hoc) Programming in HyperCast: Overlay Socket

Application Program Socket-based API Overlay Socket Interface UDP (unicast or multicast) or TCP Supports different semantics for transport of data Supports different overlay protocols Overlay Socket

Application Receive Buffer Overlay Node Interface Forwarding Engine Overlay Node Adapter Interface Adapter Interface Node Adapter Socket Adapter Implementation in Java Messages of the Overlay Protocol Application Messages Underlay Network

HyperCast Project website: Message Store Network of overlay sockets Overlay socket is an endpoint of communication in an overlay network An overlay network is a collection of overlay sockets Overlay sockets in the same overlay network have a common set of attributes

Each overlay network has an overlay ID, which can be used the access attributes of overlay Nodes exchange data with neighbors in the overlay topology Application Application Overlay socket Overlay socket Internet Application Overlay socket Application Overlay socket

Application Application Unicast and Multicast in overlays Unicast and multicast is done along trees that are embedded in the overlay network. Requirement: Overlay node must be able to compute the child nodes and parent node with respect to a given root Multicast Root (sender) Unicast Root (receiver)

Sender Overlay Message Format Loosely modeled after IPv6 minimal header with extensions Common Header: 1 2 3 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 +-------+---------------+---+---+-------------------------------+ |Version|LAS|Dmd| Traffic Class | Flow Label | Next Header | +-------+---------------+---+---+-------------------------------+ | OL Message Length | Hop Limit | +-------------------------------+-------------------------------+ | Src LA |

+--------------------------------------------------------------| Dest LA | +---------------------------------------------------------------+ Version (4 bit): LAS (2 bit): Dmd (4bit) Traffic Class (8 bit): Flow Label (8 bit): Next Header (8 bit) OL Message Length (8 bit) Hop Limit (16 bit): Src LA ((LAS+1)*4 bytes) Dest LA ((LAS+1)*4 bytes Version of the protocol (current Version is 0x0) Size of logical address field Delivery Mode (Multicast, Flood, Unicast, Anycast) Specifies Quality of Service class (default: 0x00) Flow identifier Specifies the type of the next header following this header Specifies the type of the next header following this header. TTL field Logical address of the source Logical address of the destination

Socket Based API Tries to stay close to Socket API for UDP Multicast Program is independent of overlay topology //Generate //Generate the the configuration configuration object object OverlayManager om = new OverlayManager(hypercast.prop); OverlayManager om = new OverlayManager(hypercast.prop); String String overlayID overlayID == om.getDefaultProperty(MyOverlayID") om.getDefaultProperty(MyOverlayID") OverlaySocketConfig OverlaySocketConfig config config == new new

om.getOverlaySocketConfig(overlayID); om.getOverlaySocketConfig(overlayID); //create //create an an overlay overlay socket socket OL_Socket OL_Socket socket socket == config.createOverlaySocket(callback); config.createOverlaySocket(callback); //Join //Join an an overlay overlay socket.joinGroup(); socket.joinGroup(); //Create //Create aa message message OL_Message OL_Message msg msg == socket.createMessage(byte[] socket.createMessage(byte[] data, data, int int length);

length); //Send //Send the the message message to to all all members members in in overlay overlay network network socket.sendToAll(msg); socket.sendToAll(msg); //Receive //Receive aa message message from from the the socket socket OL_Message msg = socket.receive(); OL_Message msg = socket.receive(); //Extract //Extract the

the payload payload byte[] byte[] data data == msg.getPayload(); msg.getPayload(); Property File Stores attributes that configure the overlay socket (overlay protocol, transport protocol and addresses) # This is the Hypercast Configuration File # This is the Hypercast Configuration File # # # # # LOG FILE: # LOG FILE: LogFileName = stderr LogFileName = stderr # ERROR FILE: # ERROR FILE: ErrorFileName = stderr ErrorFileName = stderr # OVERLAY Server: # OVERLAY Server:

OverlayServer = OverlayServer = # OVERLAY ID: # OVERLAY ID: OverlayID = OverlayID = KeyAttributes = Socket,Node,SocketAdapter KeyAttributes = Socket,Node,SocketAdapter # SOCKET: # SOCKET: Socket = HCast2-0 Socket = HCast2-0 HCAST2-0.TTL = 255 HCAST2-0.TTL = 255 HCAST2-0.ReceiveBufferSize = 200 HCAST2-0.ReceiveBufferSize = 200 HCAST2-0.ReadTimeout = 0 HCAST2-0.ReadTimeout = 0 ... ... # SOCKET ADAPTER: # SOCKET ADAPTER: SocketAdapter = TCP SocketAdapter = TCP SocketAdapter.TCP.MaximumPacketLength = 16384

SocketAdapter.TCP.MaximumPacketLength = 16384 SocketAdapter.UDP.MessageBufferSize = 100 SocketAdapter.UDP.MessageBufferSize = 100 # NODE: # NODE: Node = HC2-0 Node = HC2-0 HC2-0.SleepTime = 400 HC2-0.SleepTime = 400 HC2-0.MaxAge = 5 HC2-0.MaxAge = 5 HC2-0.MaxMissingNeighbor = 10 HC2-0.MaxMissingNeighbor = 10 HC2-0.MaxSuppressJoinBeacon = 3 HC2-0.MaxSuppressJoinBeacon = 3 # NODE ADAPTER: # NODE ADAPTER: # # NodeAdapter = UDPMulticast NodeAdapter = UDPMulticast NodeAdapter.UDP.MaximumPacketLength = 8192 NodeAdapter.UDP.MaximumPacketLength = 8192 NodeAdapter.UDP.MessageBufferSize = 18 NodeAdapter.UDP.MessageBufferSize = 18 NodeAdapter.UDPServer.UdpServer0 =

NodeAdapter.UDPServer.UdpServer0 = NodeAdapter.UDPServer.MaxTransmissionTime = 1000 NodeAdapter.UDPServer.MaxTransmissionTime = 1000 NodeAdapter.UDPMulticastAddress = NodeAdapter.UDPMulticastAddress = Hypercast Software: Demo Applications Distributed Whiteboard Multicast file transfer Data aggregation in P2P Net: Homework assignment in CS757 ( Computer Networks) Application: Emergency Response Network for Arlington County, Virginia Project directed by B. Horowitz and S. Patek Application of P2P Technology to Advanced Emergency Response Systems

Dynamic Grouping and Information Management Situational Awareness Priority Messaging Other Features of Overlay Socket Current version (2.0): Statistics: Each part of the overlay socket maintains statistics which are accessed using a common interface Monitor and control: XML based protocol for remote access of statistics and remote control of experiments LoToS: Simulator for testing and visualization of overlay protocols Overlay Manager: Server for storing and downloading overlay configurations Next version (parts are done, release in Summer 2004): MessageStore: Enhanced data services, e.g., end-to-end reliability, persistence, streaming, etc. HyperCast for mobile ad-hoc networks on handheld devices Service differentiation for multi-stream delivery Clustering Integrity and privacy with distributed key management Delaunay Triangulation Overlays

Nodes in a Plane Nodes Nodesare areassigned assigned x-y x-ycoordinates coordinates 4,9 10,8 (e.g., (e.g.,based basedon on geographic geographiclocation) location) 0,6 0,3 0,2 5,2

0,1 0,0 1,0 2,0 3,0 12,0 Voronoi Regions 4,9 10,8 0,6 The TheVoronoi Voronoiregion regionof of aanode nodeisisthe theregion regionof of the theplane planethat thatisiscloser closer

to tothis thisnode nodethan thanto to any anyother othernode. node. 5,2 12,0 Delaunay Triangulation 4,9 10,8 0,6 The TheDelaunay Delaunay triangulation triangulationhas has edges

edgesbetween betweennodes nodes ininneighboring neighboringVoronoi Voronoi regions. regions. 5,2 12,0 Delaunay Triangulation 4,9 10,8 0,6 An Anequivalent equivalent definition: definition: AAtriangulation triangulationsuch such that

thateach each circumscribing circumscribingcircle circle of ofaatriangle triangleformed formedby by three threevertices, vertices,no no vertex vertexof ofisisininthe the interior interiorof ofthe thecircle. circle. 5,2 12,0

Locally Equiangular Property Sibson 1977: Maximize the minimum angle For every convex quadrilateral formed by triangles ACB and ABD that share a common edge AB, the minimum internal angle of triangles ACB and ABD is at least as large as the minimum internal angle of triangles ACD and CBD. C A D B Next-hop routing with Compass Routing A nodes parent in a spanning tree is its neighbor which forms the smallest angle with the root.

A node need only know information on its neighbors no routing protocol is needed for the overlay. A 30 Node Root 15 B is the Nodes Parent B Spanning Spanningtree treewhen whennode node (8,4) (8,4)isisroot. root. The Thetree treecan can be

becalculated calculatedby byboth both parents parentsand andchildren. children. 4,9 10,8 0,6 8,4 5,2 12,0 Evaluation of Delaunay Triangulation overlays Delaunay triangulation can consider location of nodes in an (x,y) plane, but is not aware of the network topology Question: How does Delaunay triangulation compare with other

overlay topologies? Overlay Topologies Delaunay Triangulation and variants DT Hierarchical DT Multipoint DT Hypercube Degree-6 Graph Similar to graphs generated in Narada Degree-3 Tree Similar to graphs generated in Yoid Logical MST Minimum Spanning Tree overlays used by HyperCast overlays that assume knowledge of network topology Transit-Stub Network Transit-Stub

GeorgiaTech topology generator 4 transit domains 416 stub domains 1024 total routers 128 hosts on stub domain Evaluation of Overlays Simulation: Network with 1024 routers (Transit-Stub topology) 2 - 512 hosts Performance measures for trees embedded in an overlay network: Degree of a node in an embedded tree Stretch: Ratio of delay in overlay to shortest path delay Stress: Number of duplicate transmissions over a physical link

Illustration of Stress and Stretch 1 AA 1 1 1 1 Stress = 2 1 1 1 1 BB 1

Stress = 2 Unicast delay AB : Delay AB in overlay: Stretch for AB: 4 6 1.5 Stretch (90th Percentile) Average Stretch Stretch (90th Percentile) 90th Percentile of Stretch Delaunay triangulation Average Stress Delaunay triangulation 90th Percentile of Stress Delaunay triangulation

The DT Protocol Protocol which organizes members of a network in a Delaunay Triangulation Each member only knows its neighbors soft-state protocol Topics: Nodes and Neighbors Example: A node joins State Diagram Rendezvous Measurement Experiments Each Eachnode nodesends sendsHello Hello messages messagesto toits itsneighbors neighbors periodically

periodically 4,9 Hello Hello Hello Hello llo He Hello 10,8 Hello He l l o

5,2 He l l o He llo Hello 0,6 He llo Hello o l l He 12,0

Each EachHello Hellocontains containsthe theclockwise clockwise(CW) (CW)and andcounterclockwise counterclockwise(CCW) (CCW) neighbors neighbors Receiver Receiverof ofaaHello Helloruns runsaaNeighbor Neighbortest test( (locally locallyequiangular equiangularprop.) prop.) CW CWand andCCW

CCWare areused usedto todetect detectnew newneighbors neighbors 4,9 5,2 12,0 CCW 5,2 4,9 12,0 CW CW He CC = llo W 12 = ,0 4,

9 0,6 Neighborhood Table of 10,8 Neighbor 10,8 12,0 5,2 4,9 10,8 AAnode nodethat thatwants wantsto tojoin jointhe the triangulation triangulationcontacts

contactsaanode node that thatisisclose close 4,9 10,8 0,6 New node 8,4 o l l e H 5,2 12,0 Node Node(5,2) (5,2)updates updatesits itsVoronoi Voronoi

region, region,and andthe thetriangulation triangulation 4,9 10,8 0,6 8,4 5,2 12,0 (5,2) (5,2)sends sendsaaHello Hellowhich whichcontains contains info infofor forcontacting contactingits itsclockwise clockwiseand and

counterclockwise counterclockwiseneighbors neighbors 4,9 10,8 0,6 llo e H 8,4 5,2 12,0 (8,4) (8,4)contacts contactsthese theseneighbors neighbors... ... 4,9 10,8

llo He 0,6 8,4 lo el H 5,2 12,0 which whichupdate updatetheir theirrespective respective Voronoi Voronoiregions. regions. 4,9 10,8

0,6 8,4 5,2 12,0 (4,9) (4,9)and and(12,0) (12,0)send sendHellos Hellosand and provide provideinfo infofor forcontacting contactingtheir their respective respectiveclockwise clockwiseand and counterclockwise counterclockwiseneighbors. neighbors.

4,9 10,8 llo He 0,6 8,4 5,2 H el lo 12,0 (8,4) (8,4)contacts contactsthe thenew newneighbor neighbor (10,8) (10,8)... ...

4,9 He llo 10,8 0,6 8,4 5,2 12,0 which whichupdates updatesits itsVoronoi Voronoiregion... region... 4,9 10,8 0,6 8,4 5,2 12,0

and andresponds respondswith withaaHello Hello 4,9 He llo 10,8 0,6 8,4 5,2 12,0 This Thiscompletes completesthe theupdate updateof ofthe the Voronoi Voronoiregions

regionsand andthe theDelaunay Delaunay Triangulation Triangulation 4,9 10,8 0,6 8,4 5,2 12,0 Rendezvous Methods Rendezvous Problems: How does a new node detect a member of the overlay? How does the overlay repair a partition? Three solutions: 1. Announcement via broadcast 2. Use of a rendezvous server

3. Use `likely members (Buddy List) Rendezvous RendezvousMethod Method1: 1: Announcement Announcementvia viabroadcast broadcast (e.g., (e.g., using usingIP IPMulticast) Multicast) 4,9 10,8 0,6 New node llo e H 8,4

5,2 12,0 Rendezvous Rendezvous Method Method1: 1: Leader 4,9 10,8 AALeader Leaderisisaanode node with withaaY-coordinate Y-coordinate higher higherthan thanany anyof ofits

its neighbors. neighbors. 0,6 5,2 12,0 Rendezvous RendezvousMethod Method2: 2: New Newnode nodeand andleader leadercontact contactaa rendezvous rendezvousserver. server.Server Serverkeeps keepsaacache cacheof ofsome someother

othernodes nodes 4,9 10,8 Serv Req er uest 0,6 Server Reply (12,0) Server llo e H 5,2 New node 8,4 N ew

N ew N N od od e e 12,0 Rendezvous RendezvousMethod Method3: 3:Each Eachnode nodehas hasaalist listof oflikely likelymembers members of

ofthe theoverlay overlaynetwork network 4,9 10,8 0,6 llo e H 5,2 8,4 N ew N ew N N od New node with Buddy

List: (12,0) (4,9) od e e 12,0 State Diagram of a Node Stopped Leader without Neighbor Application starts Neighbor added (with larger coordinates) Neighbor added (with smaller coordinates) All neighbors leave or timeout

All neighbors leave or timeout A new neighbor with greater coordinates is added Leader with Neighbor Not Leader After removing some neighbor, this node has largest coordinates Application exits Send Goodbye Send Goodbye Leaving Send Goodbye Sub-states of a Node Stable Without

Candidate Neighbor After neighborhood updating, node becomes not stable Node contained in NewNode passes neighbor test After handling the candidate neighbor, node remains stable Stable With Candidate Neighbor After neighborhood updating, node becomes stable. After neighborhood updating, node becomes not stable Not Stable

AAnode nodeisisstable stablewhen whenall allnodes nodesthat thatappear appearininthe theCW CWand and CCW CCWneighbor neighborcolumns columnsof ofthe theneighborhood neighborhoodtable tablealso alsoappear appear ininthe theneighbor neighborcolumn column

Measurement Experiments Experimental Platform: Centurion cluster at UVA (cluster of 300 Linux PCs) 2 to 10,000 overlay members 1100 members per PC Random coordinate assignments Gigabit Ethernet Internet Switch 3 centurion183 centurion253-255 centurion246 centurion250 centurion251 centurion249 centurion252 centurion149-167 Switch 4 Switch 5 Switch 8 Switch 9

centurion168-182 centurion164-187 centurion188-211 Switch 6 Switch 11 Switch 7 Switch 10 centurion128-147 centurion228-247 Experiment: Adding Members Time to Complete (sec) How long does it take to add M members to an overlay network of N members ? M+N members Experiment: Throughput of Multicasting

Average throughput (Mbps) 100 MB bulk transfer for N=2-100 members (1 node per PC) 10 MB bulk transfer for N=20-1000 members (10 nodes per PC) Bandwidth bounds (due to stress) Measured values Number of Members N Experiment: Delay Delay of a packet (msec) 100 MB bulk transfer for N=2-100 members (1 node per PC) 10 MB bulk transfer for N=20-1000 members (10 nodes per PC) Number of Nodes N Wide-area experiments PlanetLab: worldwide network of Linux PCs

Obtain coordinates via triangulation Delay measurements to 3 PCs Global Network Positioning (Eugene Ng, CMU) Planetlab Video Streaming Demo Summary Overlay socket is general API for programming P2P systems Several proof-of-concept applications Intensive experimental testing: Local PC cluster (on 100 PCs) 10,000 node overlay in < 1 minute Throughput of 2 Mbps for 1,000 receivers PlanetLab (on 60 machines) Up to 2,000 nodes Video streaming experiment with 80 receivers at 800 kbps had few losses HyperCast (Version 2.0) site: Design documents, download software, user manual

Recently Viewed Presentations

  • Geography and Early India The Big Idea Indian

    Geography and Early India The Big Idea Indian

    Summary Geography of Indian Subcontinent Harappan Civilization Aryan Civilization India is a very large and separated from the continent of Asia by mountains Subcontinent =smaller than a continent Ganges delta in east Triangular shaped Bay of Bengal to the east...
  • House of the Scorpion Chapter Review Questions

    House of the Scorpion Chapter Review Questions

    What did Celia show Tam Lin? What does Tam Lin mean when saying he can't escape the "moral consequences?" What does he intend to do? What is the "filthy lie" about clones? What is meant by "El Patron kept Opium...
  • Part 2 CHM1C3 Organic Acids and Bases Acid

    Part 2 CHM1C3 Organic Acids and Bases Acid

    Part 2 CHM1C3 Organic Acids and Bases Content of Part 2 Definition of Bronsted acids and bases Definition of conjugate acids and bases Ka pKa Typical pKa values Eplaining differences in acidity: Resonance Effects Eplaining differences in acidity: Inductive Effects...
  • Wide Area Networks (WANs)

    Wide Area Networks (WANs)

    Continued…The Threat Environment:Attackers and Their Attacks. Reference: Corporate & Network Security, Chapter 1. Raymond R. Panko:
  • The Protestant Reformation Through Maps

    The Protestant Reformation Through Maps

    The Protestant Reformation Through Maps Susan M. Pojer Horace Greeley HS Chappaqua, NY R. H. Bainton The Reformation of the 16c Thus, the papacy emerged as something between an Italian city-state and European power, without forgetting at the same time...
  • Chapter 18 Cancer Genetics and Genomics

    Chapter 18 Cancer Genetics and Genomics

    Challenges of Diagnosing and Treating Cancer (2 of 3) Ability to sequence cancer exomes and genomes is providing a new specificity for diagnosing and treating cancer. Cell-free tumor DNA (ctDNA) can reveal response to a treatment. Liquid biopsy—A less invasive...
  • Spinal cord circuit and motor control  Overview of

    Spinal cord circuit and motor control Overview of

    Motor neuron pool In a mature animal, each muscle fiber receives input from a single motor neuron Each motor neuron innervates multiple muscle fibers All motor neurons innervating a muscle is called a motor neuron pool All motor neurons in...
  • Ron Taverner, OPP Commissioner Date Appointed: November 29,

    Ron Taverner, OPP Commissioner Date Appointed: November 29,

    Ron Taverner, OPP Commissioner. Date Appointed: November 29, 2018. Taverner is a long-time friend of Premier Doug Ford. He did not qualify for the $275,000/yr. position until the government lowered the required qualifications.