Virtual Private Machines: De-stressing systems

IP Network Management
Richard Mortier
Digital Communications II, CUCL
06/02/20

Overview
  Introduction
  Abstractions
  IP network components
  IP network management protocols
  Pulling it all together
  An alternative approach

Overview
  Introduction
    What's it all about then?
  Abstractions
  IP network components
  IP network management protocols
  Pulling it all together
  An alternative approach

What is network management?
  One point of view: a large field full of acronyms
    EMS, TMN, NE, CMIP, CMISE, OSS, ASN.1, TL1, EML, FCAPS, ITU, ...
    (Don't ask me what all of those mean, I don't care!)
  From question.com:
    In 1989, a random of the journalistic persuasion asked hacker Paul Boutin "What do you think will be the biggest problem in computing in the 90s?" Paul's straight-faced response: "There are only 17,000 three-letter acronyms."
  We will ignore most of them

What is network management?
  Computer networks are considered to have three operating timescales
    Data: packet forwarding [μs, ms]
    Control: flows/connections [secs, mins]
    Management: aggregates, networks [hours, days]
  ...so we're concerned with the network rather than particular devices or protocols
  Standardization is key!

Overview
  Introduction
  Abstractions
    ISO FCAPS, TMN EMS, ATM
  IP network components
  IP network management protocols
  Pulling it all together
  An alternative approach

ISO FCAPS: functional separation
  Fault: recognize, isolate, correct, log faults
  Configuration: collect, store, track configurations
  Accounting: collect statistics, bill users, enforce quotas
  Performance: monitor trends, set thresholds, trigger alarms
  Security: identify, secure, manage risks

TMN EMS: administrative separation
  Telecommunications Management Network, Element Management System
    "...simple but elegant..." (!) (my emphasis)
    (often the two go together)
  NEL: network elements (switches, transmission systems)
  EML: element management (devices, links)
  NML: network management (capacity, congestion)
  SML: service management (SLAs, time-to-market)
  BML: business management (RoI, market share, blah)

The B-ISDN reference model
  Asynchronous Transfer Mode "cube"
    See IAP lectures, maybe
  [Figure: B-ISDN reference model cube. User plane and control plane, each with higher layers, above the ATM adaptation layer, ATM layer and physical layer; management plane divided into plane management and layer management]
  Plane management vs layer management
    Plane management: the whole network
      Topology, configuration, fault, operations, accounting, performance
    Layer management: specific layers

Network management
  Models of general communication networks
    Tend to be quite abstract and exceedingly tedious!
  Many practitioners still seem excited about OO programming, WIMP interfaces, etc
    ...probably because implementation is hard due to so many excessively long and complex standards!
  My view: basic need-to-know requirements are
    1. What should be happening? [c]
    2. What is happening? [f, p, a]
    3. What shouldn't be happening? [f, s]
    4. What will be happening? [p, a]

Network management
  We'll concentrate on IP networks
    Still acronym city: ICMP, SNMP, MIB, RFC
    Sample size: 10^2 routers, 10^5 hosts
  We'll concentrate on the network core
    Routers, not hosts
  We'll ignore service management
    DNS, AD, file stores, etc

Overview
  Introduction
  Abstractions
  IP network components
    IP, networks, routers
  IP network management protocols
  Pulling it all together
  An alternative approach

IP primer (you probably know all this)
  Destination-routed packets, no connections
  Time-to-live field: allows removal of looping packets
  Routers forward packets based on routeing tables
    Tables populated by routeing protocols
  Routers and protocols operate independently
    ...although protocols aim to build consistent state
  RFCs ~= standards
    Often much looser semantics than e.g. ISO, ITU standards
    Compare for example OSPF [RFC2328] and IS-IS [RFC1142, RFC1195], two link-state routeing protocols

So, how do you build an IP network?
  1. Buy (lease) routers
       $1m? $2m? for a new, populated, backbone router!
  2. Buy (lease) fibre
       Wayleaves = $$$. Be a landowner!
  3. Connect them all together
       Correctly. For now.
  4. Configure routers
       Mwuhahaha.
  5. Configure end-systems
       Someone else's can of worms.

Multiple router flavours
  A sample taxonomy
    Core
      OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps)
      Big, fat, fast, expensive
      E.g. Cisco HFR, Juniper T-640
        HFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at $450k
    Transit/Peering-facing
      OC-3 and up, good GigE density
      ACLs, full-on BGP, uRPF, accounting
    Customer-facing
      FR/ATM/...
      Feature set as above, plus fancy queues, etc
    Broadband aggregator
      High scalability: sessions, ports, reconnections
      Feature set as above
    Customer-premises (CPE)
      100Mbps, maybe NAT, DHCP, firewall, wireless, VoIP, ...
      Low cost, low-end, perhaps just software on a PC
  [Figure: Cisco CRS-1 multi-shelf system]

Network design
  Whose network?
    ISPs, IXs, enterprise, campus
    POPs, DCs
  Many designs: flat, hierarchical, hybrids, multiple scales
  Many constraints
    Business
      Backwards compatibility. Who to connect. Peering.
    Technology
      Power, directly (24x7 operation) and indirectly (cooling)
      Port density vs. raw bandwidth
      Software reliability
      Hardware/software capability
      Addressing schemes for scalability, summarization
      "Can't run feature X with feature Y on vendor C in network size N"
    Connectivity/resiliency
      All core routers connect to at least 2 other core routers
      All edge routers connect to at least 2 core routers

Router configuration
  Initialization
    Name the router, set up boot options, set up authentication options
  Configure interfaces
    Loopback, ethernet, fibre, ATM
    Subnet/mask, filters, static routes
    Shutdown (or not), queueing options, full/half duplex
  Configure routeing protocols (OSPF, BGP, IS-IS, ...)
    Process number, addresses to accept routes from, networks to advertise
  Access lists, filters, ...
    Numeric id, permit/deny, subnet/mask, protocol, port
    Route-maps, matching routes rather than data traffic
  Other configuration aspects: traps, syslog, etc
  (Oh, and switch configuration is about as painful)

Router configuration fragments
  hostname FOOBAR
  !
  boot system flash slot0:a-boot-image.bin
  boot system flash bootflash:
  logging buffered 100000 debugging
  logging console informational
  aaa new-model
  aaa authentication login default tacacs local
  aaa authentication login consoleport none
  aaa authentication ppp default if-needed tacacs
  aaa authorization network tacacs
  !
  ip tftp source-interface Loopback0
  no ip domain-lookup
  ip name-server 10.34.56.78
  !
  ip multicast-routing
  ip dvmrp route-limit 7000
  ip cef distributed
  !
  interface Loopback0
   description router-1.network.corp.com
   ip address 10.65.21.43 255.255.255.255
  !
  interface FastEthernet0/0/0
   description Link to New York
   ip address 10.65.43.21 255.255.255.128
   ip access-group 175 in
   ip helper-address 10.65.12.34
   ip pim sparse-mode
   ip cgmp
   ip dvmrp accept-filter 98 neighbor-list 99
   full-duplex
  !
  interface FastEthernet4/0/0
   no ip address
   ip access-group 183 in
   ip pim sparse-mode
   ip cgmp
   shutdown
   full-duplex
  !
  router ospf 2
   log-adjacency-changes
   passive-interface FastEthernet0/0/0
   passive-interface FastEthernet0/1/0
   passive-interface FastEthernet1/0/0
   passive-interface FastEthernet1/1/0
   passive-interface FastEthernet2/0/0
   passive-interface FastEthernet2/1/0
   passive-interface FastEthernet3/0/0
   network 10.65.23.45 0.0.0.255 area 1.0.0.0
   network 10.65.34.56 0.0.0.255 area 1.0.0.0
   network 10.65.43.0 0.0.0.127 area 1.0.0.0
  !
  access-list 24 remark Mcast ACL
  access-list 24 permit 239.255.255.254
  access-list 24 permit 224.0.1.111
  access-list 24 permit 239.192.0.0 0.3.255.255
  access-list 24 permit 232.192.0.0 0.3.255.255
  access-list 24 permit 224.0.0.0 0.0.0.255
  access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42
  access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff
  !
  tftp-server slot1:some-other-image.bin
  tacacs-server host 10.65.0.2
  tacacs-server key xxxxxxxx
  rmon event 1 trap Trap1 description "CPU Utilization>75%" owner config
  rmon event 2 trap Trap2 description "CPU Utilization>95%" owner config
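
The access-list 24 entries in the fragment use wildcard (inverse) masks, where a 1 bit means "don't care". The Python sketch below shows how such an entry could be evaluated. The function names are invented for illustration, and the list is assumed to behave like a standard IP access list: an entry with no wildcard matches a single host, and anything unmatched is denied.

```python
import ipaddress

def acl_matches(address: str, base: str, wildcard: str = "0.0.0.0") -> bool:
    """Return True if address equals base on every bit the wildcard cares about.

    Bits set to 1 in the wildcard are ignored; all other bits must match base.
    """
    addr = int(ipaddress.IPv4Address(address))
    want = int(ipaddress.IPv4Address(base))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (want & care)

# The permit entries of access-list 24 from the fragment, as (base, wildcard).
ACL_24 = [
    ("239.255.255.254", "0.0.0.0"),
    ("224.0.1.111", "0.0.0.0"),
    ("239.192.0.0", "0.3.255.255"),
    ("232.192.0.0", "0.3.255.255"),
    ("224.0.0.0", "0.0.0.255"),
]

def permitted(address: str) -> bool:
    """Every entry is a permit, so any match permits; otherwise implicit deny."""
    return any(acl_matches(address, base, wc) for base, wc in ACL_24)
```

For example, permitted("239.193.4.5") holds because 0.3.255.255 ignores the low two bits of the second octet, whereas 239.196.0.1 falls outside that range and is denied.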

Router configuration
  Lots of large, fragile text files
    100s/1000s of routers, 100s/1000s of lines per config
    Errors are hard to find and have non-obvious results
    Router configuration also editable on-line
      Order matters!
  How to keep track of them all?
    Naming schemes, directory trees, CVS
    ssh upload and atomic commit to router
    Perhaps even a proper database
      This counts as advanced!
  State of the art is pretty basic
    Few tools to check consistency, design goals
    Generally generate configurations from templates and have a human-intensive process to control access to running configs
    Topic of current research [Feamster et al]

Overview
  Introduction
  Abstractions
  IP network components
  IP network management protocols
    ICMP, SNMP, NetFlow
  Pulling it all together
  An alternative approach

ICMP
  Internet Control Message Protocol [RFC792]
    IP protocol #1
    In-band control
  Variety of message types
    echo/echo reply [PING (packet internet groper)]
    time exceeded [TRACEROUTE]
    destination unreachable, redirect
    source quench

Ping (Packet INternet Groper)
  Test for liveness
    ...also used to measure (round-trip) latency
  Send ICMP echo
    Valid IP host [RFC1122, RFC1123] must reply with ICMP echo response
  Subnet PING?
    Useful but often not available/deprecated
    ACK implosion could be a problem
  RFCs ~= standards
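
As a rough illustration of what "send ICMP echo" involves, the Python sketch below builds the bytes of an echo request: type 8, code 0, a 16-bit Internet checksum, then an identifier and sequence number. The function names are made up for this example. It only constructs the message; actually sending it needs a raw socket and usually elevated privileges, which is left out here.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Return an ICMP echo request (type 8, code 0) with a valid checksum."""
    # Checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload
```

A receiver can validate the message by running the same checksum over the whole packet: a correct packet yields zero.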

Traceroute
  Which route do my packets take to their destination?
    Send UDP packets with increasing time-to-live values
    Compliant IP host must respond with ICMP time exceeded
    Triggers each host along path to so respond
  Not quite that simple
    One router, many IP addresses: which source address?
      Router control processor, inbound or outbound interface
    Asymmetric routes (return path != outbound path)
    Routes change
  Do we want full-mesh host-host routes anyway?!
    Size of data set, amount of probe traffic
  This is topology, what about load on links?

SNMP
  Protocol to manage information tables at devices
  Provides get, set, trap, notify operations
    get, set: read, write values
    trap: signal a condition (e.g. threshold exceeded)
    notify: reliable trap
  Complexity mostly in the MIB design
    Some standard tables, but many vendor specific
    Non-critical, so often tables populated incorrectly
    Many tens of MIBs (thousands of lines) per device
    Different versions, different data, different semantics
      Yet another configuration tracking problem
    Inter-relationships between MIBs

IPFIX
  IETF working group
    Export of flow-based data out of IP network devices
    Developing suitable protocol from Cisco NetFlow v9 [RFC3954, RFC3955]
  Statistics reporting
    Set up template
    Send data records matching template
  Many variables
    Packet/flow counters, rule matches, quite flexible

Overview
  Introduction
  Abstractions
  IP network components
  IP network management protocols
  Pulling it all together
    Network mapping, statistics gathering, control
  An alternative approach

An hypothetical NMS
  GUI around ICMP (ping, traceroute), SNMP, etc
  Recursive host discovery
    Broadcast ping, ARP, default gateway: start somewhere
    Recursively SNMP query for known hosts/connected networks
    Ping known hosts to test liveness
    Iterate
  Display topology: allow drill-down to particular devices
  Configure and monitor known devices
    Trap, NetFlow, syslog message destinations
    Counter thresholds, CPU utilization threshold, fault reporting
    Particular faults or fault patterns
    Interface statistics and graphs (MRTG)
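
The recursive discovery loop on this slide can be sketched as a breadth-first search. In the Python below, the callbacks neighbours_of and is_alive are stand-ins for SNMP queries and ping, and the device names and tables are invented for illustration. A real NMS would also have to deal with the corner cases discussed later in the deck.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set

def discover(seed: str,
             neighbours_of: Callable[[str], Iterable[str]],
             is_alive: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Start somewhere, query each live device for what it knows, iterate."""
    topology: Dict[str, List[str]] = {}
    seen: Set[str] = {seed}
    queue = deque([seed])
    while queue:
        device = queue.popleft()
        if not is_alive(device):               # stands in for a ping
            continue
        found = sorted(neighbours_of(device))  # stands in for SNMP queries
        topology[device] = found
        for neighbour in found:
            if neighbour not in seen:          # visit each device only once
                seen.add(neighbour)
                queue.append(neighbour)
    return topology

# Hypothetical four-device network; r3 does not answer pings.
FAKE_TABLES = {
    "gw": ["r1", "r2"],
    "r1": ["gw", "r2"],
    "r2": ["gw", "r1", "r3"],
    "r3": ["r2"],
}
DOWN = {"r3"}

topology = discover("gw",
                    lambda d: FAKE_TABLES.get(d, []),
                    lambda d: d not in DOWN)
```

Here r3 is learned about from r2 but never queried, because it fails the liveness test; it appears only as a neighbour in the resulting map.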

NOC, NOC. Calling AT&T

What are they all looking at?
  http://www.stat.ee.ethz.ch/mrtg/

An hypothetical NMS
  All very straightforward?
  No, not really
    A lot of software engineering: corner cases, traceroute interpretation, NATs, etc
  Correctness
    MIBs may contain rubbish
    Can only view inside your network anyway
    Tunnelled, encrypted protocols becoming prevalent
  Efficiency
    Rate pacing discovery traffic: ping implosion/explosion
    SNMP overloading router CPUs
  Using NMSs also not straightforward
    How to set up correct thresholds?
    How to decide when something bad has happened?
    How to present (or even interpret) reams and reams of data?
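
One common way to rate-pace probe traffic is a token bucket. The slide does not say which mechanism an NMS would use, so the Python below is only an illustrative sketch. The class name and parameters are invented, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class Pacer:
    """Token bucket: about `rate` probes per second, bursts of up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a probe may be sent at time `now` (in seconds)."""
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With rate=2 and burst=2, two probes go out immediately, a third at the same instant is held back, and another is allowed half a second later.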

Overview
  Introduction
  Abstractions
  IP network components
  IP network management protocols
  Pulling it all together
  An alternative approach
    From the edges

Anemone
  An endsystem network management platform
    Collect flow information from endsystems, and
    Combine with topology information from routeing protocols
  Endsystems have more information about their traffic than network devices do
    No router support required
  A platform to support many applications
  Currently concentrating on managed networks
    E.g. governments, enterprises, etc
    High complexity, high value
    High degree of endsystem control

Applications
  Real-time and historical analysis
    Current topology + ingress, egress flows gives global picture of network behaviour
    Capacity planning, anomaly detection
  Modelling "what if" scenarios
    Plug into a simulator back-end
    "What happens to the network if we move all our Exchange servers to a single data centre?"
  Automatic configuration
    Close the loop: enable network to meet dynamic SLAs
    Reconfigure network to track e.g. time-of-day load changes

Challenges
  1. How do you instrument an entire network? Do you need to instrument all endsystems?
       1% coverage gives 99.999% bytes and flows
  2. What network information should be captured and stored?
       Flow data augmented with application and user context
  3. How do you access data stores distributed across a large network (ca. 300,000 nodes)?
       Use distributed query system

Summary
  Introduction
    What is network management?
  Abstractions
    ISO FCAPS, TMN EMS, ATM
  IP network components
    IP, networks, routers
  IP network management protocols
    ICMP, SNMP, etc
  Pulling it all together
    Outline of a network management system
  An alternative approach: from the edges

The end
  Questions
  Answers?
    http://www.cisco.com/
    http://www.routergod.com/
    http://www.ietf.org/
    http://ipmon.sprintlabs.com/pyrt/
    http://www.nanog.org/

Backup slides
  Internet routeing
  OSPF
  BGP

Internet routeing
  Q: how to get a packet from node to destination?
  A1: advertise all reachable destinations and apply a consistent cost function (distance vector)
  A2: learn network topology and compute consistent shortest paths (link state)
    Each node (1) discovers and advertises adjacencies; (2) builds link state database; (3) computes shortest paths
  A1, A2: forward to next-hop using longest-prefix-match
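
As a small worked example of the link-state answer (A2) followed by longest-prefix-match forwarding, the Python sketch below runs Dijkstra over a toy link state database and then looks up next hops. The topology, costs, prefixes and function names are all made up for illustration; real routers use far more elaborate data structures.

```python
import heapq
import ipaddress

# Hypothetical link state database: node -> {neighbour: link cost}.
LSDB = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 1},
    "C": {"A": 1, "D": 5},
    "D": {"B": 1, "C": 5},
}
# Hypothetical prefixes advertised by each node.
PREFIXES = {"C": ["10.1.0.0/16"], "D": ["10.1.2.0/24"]}

def shortest_paths(lsdb, source):
    """Dijkstra: return {node: (cost, first hop on the path from source)}."""
    best = {source: (0, None)}
    heap = [(0, source, None)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if cost > best[node][0]:
            continue                          # stale heap entry
        for nbr, weight in lsdb[node].items():
            hop = nbr if node == source else first
            if nbr not in best or cost + weight < best[nbr][0]:
                best[nbr] = (cost + weight, hop)
                heapq.heappush(heap, (cost + weight, nbr, hop))
    return best

def build_fib(lsdb, prefixes, source):
    """Map each advertised prefix to the next hop towards its advertiser."""
    paths = shortest_paths(lsdb, source)
    return {ipaddress.ip_network(p): paths[node][1]
            for node, plist in prefixes.items() if node in paths
            for p in plist}

def lookup(fib, address):
    """Longest-prefix match: the most specific covering prefix wins."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]

fib = build_fib(LSDB, PREFIXES, "A")
```

From node A, 10.1.2.3 is covered by both prefixes; the /24 advertised by D is more specific, so the packet goes to next hop B, while 10.1.9.9 matches only the /16 and goes to C.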

OSPF (~link state routeing)
  Q: how to route given packet from any node to destination?
  A: learn network topology; compute shortest paths
  For each node
    Discover adjacencies (~immediate neighbours); advertise
    Build link state database (~network topology)
    Compute shortest paths to all destination prefixes
    Forward to next-hop using longest-prefix-match (~most specific route)

BGP (~path vector routeing)
  Q: how to route given packet from any node to destination?
  A: neighbours tell you destinations they can reach; pick cheapest option
  For each node
    Receive (destination, cost, next-hop) for all destinations known to neighbour
    Longest-prefix-match among next-hops for given destination
    Advertise selected (destination, cost+, next-hop') for all known destinations
  Selection process is complicated
  Routes can be modified/hidden at all three stages
    General mechanism for application of policy

Anemone: where are we now?
  Three major components
    Flow collection
    Route collection
    Distributed database
  Building prototypes, simulating system

Data collection
  Flow collection
    Hosts track active flows
      Using low overhead event posting infrastructure, ETW
      Built prototype device driver provider & user-space consumer
    Used packet traces for feasibility study on (client, server)
      Peaks at (165, 5667) live and (39, 567) active flows per sec
  Route collection
    OSPF is link-state: passively collect link state adverts
    Extension of my work at Sprint (for IS-IS and BGP); also been done at AT&T (NSDI'04 paper)

The distributed database
  Logically contains
    1. Traffic flow matrix (bandwidths), {srcs} × {dsts}
    2. ...each entry annotated with current route from src to dst
    N.B. src/dst might be e.g. (IP end-point, application)
  Large dynamic data set suggests aggregation
  Related work
    {distributed, continuous query, temporal} databases
    Sensor networks
  Potential starting points: Astrolabe or SDIMS (SIGCOMM'04)
  Where/what/how much to aggregate?
    Is data read- or write-dominated?
    Which is more dynamic, flow or topology data?
    Can the system successfully self-tune?

The distributed database
  Construct traffic matrix from flow monitoring
    Hosts can supply flows they source and sink
    Only need a subset of this data to get complete traffic matrix
  Construct topology from route collection
    OSPF supplies topology, routes
  Wish to be able to answer queries like
    "Who are the top-10 traffic generators?"
      Easy to aggregate, don't care about topology
    "What is the load on link l?"
      Can aggregate from hosts, but need to know routes
    "What happens if we remove links {l...m}?"
      Interaction between traffic matrix, topology, even flow control

The distributed database
  Building simulation model
    OSPF data gives topology, event list, routes
    Simple load model to start with (load ~ # subnets)
    Precedence matrix (from SPF) reduces flow-data query set
  Can we do as well/better than e.g. NetFlow?
    Accuracy/coverage trade-off
  How should we distribute the DB?
    Just OSPF data? Just flow data? A mixture?
    How many levels of aggregation? How many nodes do queries touch?
  What sort of API is suitable?
    Example queries for sample applications
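
To make two of the example queries concrete (top traffic generators, and load on a link), here is a minimal single-process Python sketch over a traffic matrix whose entries are annotated with routes. The hosts, rates and function names are invented, and nothing here reflects how Anemone actually stores or distributes its data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical matrix: (src, dst) -> (bytes/sec, route as a list of nodes).
MATRIX: Dict[Tuple[str, str], Tuple[int, List[str]]] = {
    ("h1", "h3"): (500, ["h1", "r1", "r2", "h3"]),
    ("h2", "h3"): (300, ["h2", "r1", "r2", "h3"]),
    ("h3", "h1"): (100, ["h3", "r2", "r1", "h1"]),
}

def top_generators(matrix, n: int):
    """Sum traffic per source; needs no topology information."""
    totals = defaultdict(int)
    for (src, _dst), (rate, _route) in matrix.items():
        totals[src] += rate
    # Highest rate first; ties broken by name for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def link_load(matrix, link: Tuple[str, str]) -> int:
    """Sum traffic of every entry whose route crosses the directed link."""
    return sum(rate for rate, route in matrix.values()
               if link in zip(route, route[1:]))
```

The first query aggregates on sources alone, while the second has to consult each entry's route, which matches the slide's point that link load can be aggregated from hosts only if routes are known.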
