Fast Down Detection

Fast down detection, or sub-second link failure detection, is needed in modern networks. Operators of these networks need to detect failures in under a second and react to both soft and hard failures as quickly as possible.

The following are the two categories of fast down detection.

1. Polling

One of the methods that polling uses is routing protocol hellos.

By default, EIGRP sends hello packets every 5 seconds on high-bandwidth links and every 60 seconds on low-bandwidth multipoint links.
The rate at which EIGRP sends hello packets is called the hello interval.
The hello interval is configurable using the command ip hello-interval eigrp.

By default, the hold time is three times the hello interval. The hold time is how long a router considers a neighbour to be up without receiving a hello packet from it.
The hold time is configurable using the command ip hold-time eigrp.

EIGRP neighbours can establish an adjacency even if their hello intervals and hold times are different.

Unlike OSPF and IS-IS, EIGRP does not support fast (sub-second) hellos.

For 5-second hello:
broadcast media, such as Ethernet, Token Ring, and FDDI
point-to-point serial links, such as PPP or HDLC leased circuits, Frame Relay point-to-point subinterfaces, and ATM point-to-point subinterfaces
high-bandwidth (greater than T1) multipoint circuits, such as ISDN PRI and Frame Relay

For 60-second hello:
multipoint circuits at T1 bandwidth or slower, such as Frame Relay multipoint interfaces, ATM multipoint interfaces, ATM switched virtual circuits, and ISDN BRIs
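
The two EIGRP defaults can be summed up in a few lines of Python. This is only a sketch of the rule described in this post: the function name, its parameters and the T1 constant are made up for illustration and are not part of any Cisco tool.

```python
# Hypothetical helper: default EIGRP timers for a link, per the rule above.
T1_KBPS = 1544  # T1 line rate in kbit/s


def eigrp_default_timers(multipoint: bool, bandwidth_kbps: int) -> tuple[int, int]:
    """Return (hello_interval, hold_time) in seconds."""
    # Only multipoint circuits at T1 speed or slower use the 60-second hello.
    hello = 60 if multipoint and bandwidth_kbps <= T1_KBPS else 5
    # The default hold time is three times the hello interval.
    return hello, 3 * hello


print(eigrp_default_timers(multipoint=False, bandwidth_kbps=100_000))  # (5, 15)
print(eigrp_default_timers(multipoint=True, bandwidth_kbps=1544))      # (60, 180)
```

Everything that is not a slow multipoint circuit falls into the 5-second group, so the sketch only has to test for that one case.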

OSPF sends a hello packet every 10 seconds on broadcast media (e.g. Ethernet) and every 30 seconds on non-broadcast media (e.g. Frame Relay).
By default, the dead interval (similar to the hold time in EIGRP) is four times the hello interval.
OSPF neighbours must have the same hello and dead intervals; otherwise, the adjacency does not come up.
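
The difference between the two protocols on this point can be shown with a small Python sketch. It looks at the timers only; a real adjacency also depends on other parameters that are not modelled here, and the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timers:
    hello: int  # hello interval in seconds
    dead: int   # dead interval (OSPF) or hold time (EIGRP) in seconds


def ospf_timers_compatible(local: Timers, remote: Timers) -> bool:
    # OSPF requires both the hello and the dead interval to match.
    return local.hello == remote.hello and local.dead == remote.dead


def eigrp_timers_compatible(local: Timers, remote: Timers) -> bool:
    # EIGRP neighbours may use different hello intervals and hold times.
    return True


a = Timers(hello=10, dead=40)
b = Timers(hello=5, dead=20)
print(ospf_timers_compatible(a, a))   # True
print(ospf_timers_compatible(a, b))   # False
print(eigrp_timers_compatible(a, b))  # True
```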

Unlike EIGRP, OSPF supports fast (sub-second) hellos. Fast hellos speed up detection of a failed neighbour, which is particularly useful on broadcast media (e.g. Ethernet).
Fast hello is configurable using the command ip ospf dead-interval minimal hello-multiplier multiplier.
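
With the minimal keyword the dead interval is set to 1 second, and the multiplier is the number of hellos sent within that second. The resulting hello interval is simple arithmetic, sketched below in Python; the function name is made up for this example.

```python
def ospf_fast_hello_interval_ms(multiplier: int) -> float:
    """Hello interval in milliseconds when the dead interval is 1 second."""
    if multiplier < 1:
        raise ValueError("multiplier must be a positive integer")
    # The 1-second dead interval is divided evenly between the hellos.
    return 1000.0 / multiplier


for m in (3, 4, 5, 10):
    print(f"hello-multiplier {m}: one hello every {ospf_fast_hello_interval_ms(m):.1f} ms")
```

A multiplier of 3 gives a hello roughly every 333 ms, so a neighbour is declared down after about three missed hellos.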

IS-IS sends a hello packet every 10 seconds. The hello interval can be set differently for Level 1 and Level 2, except on point-to-point interfaces.
The hello interval is configurable using the command isis hello-interval {seconds} [level-1 | level-2].
Like OSPF, IS-IS supports fast hellos for faster convergence. To configure fast hellos, use the command isis hello-interval minimal together with isis hello-multiplier multiplier [level-1 | level-2].

BGP uses keepalives for down detection. By default, BGP sends a keepalive every 60 seconds, with a hold time of three times the keepalive interval, which is 180 seconds.
These parameters are configurable using the command neighbor {ip-address | peer-group-name} timers keepalive holdtime [min-holdtime]
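
To put the defaults side by side, the short Python sketch below multiplies each default hello or keepalive interval by its default multiplier. The result is the hold or dead time, which is roughly the longest a router waits before declaring a silent neighbour down. The table holds only the values quoted in this post, and the names are chosen for this example.

```python
# (hello or keepalive interval in seconds, default multiplier), from the text above.
DEFAULTS = {
    "EIGRP (5-second hello)": (5, 3),
    "EIGRP (60-second hello)": (60, 3),
    "OSPF (broadcast)": (10, 4),
    "OSPF (non-broadcast)": (30, 4),
    "BGP": (60, 3),
}


def detection_bound_s(interval_s: int, multiplier: int) -> int:
    """Hold or dead time: an upper bound on the default detection time."""
    return interval_s * multiplier


# Print the protocols from fastest to slowest default detection.
for name, timers in sorted(DEFAULTS.items(), key=lambda kv: detection_bound_s(*kv[1])):
    print(f"{name:<24} {detection_bound_s(*timers):>4} s")
```

Even the fastest default here is 15 seconds, which is why timers have to be tuned before any of these protocols gets close to sub-second detection.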

Fine-tuning the hello timer, hold timer and keepalive makes link failure detection faster, which in turn gives faster network convergence.
However, tuning timers and keepalives must be considered carefully in big networks, because processing more frequent hellos is CPU intensive.

Consider a router with ten point-to-point links to ten neighbours, using OSPF as its routing protocol. The hello and dead timers are at their defaults, so a hello is sent every 10 seconds on each link.

1 OSPF hello every 10 seconds x 10 point-to-point links = 1 hello packet per second

What happens if we enable OSPF fast hello? In this example, let us assume that a hello is sent roughly every 330 ms, that is, three hellos per second.

3 OSPF hellos per second x 10 point-to-point links = 30 hello packets per second

These numbers are not that big, but what if you had more than 50 neighbours? Or 75? Or 100?
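
The same arithmetic can be run for larger neighbour counts with a short Python sketch. It assumes one neighbour per point-to-point link and counts only the hellos a router sends, not the ones it receives. The default case uses the 10-second OSPF hello interval quoted earlier, which works out to 0.1 hello per second per link. The function name is made up for this example.

```python
def hellos_per_second(neighbours: int, hello_interval_s: float) -> float:
    """Hello packets sent per second, one neighbour per point-to-point link."""
    return neighbours / hello_interval_s


for n in (10, 50, 75, 100):
    default = hellos_per_second(n, 10.0)   # default 10-second hello
    fast = hellos_per_second(n, 1.0 / 3)   # fast hello, three per second
    print(f"{n:>3} neighbours: default {default:.1f}/s, fast hello {fast:.0f}/s")
```

At 100 neighbours the fast hello case comes to about 300 hellos sent every second, plus a similar number received and processed.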

The second polling method relies on the hellos built into other protocols for fast down detection.

Unidirectional Link Detection protocol, or UDLD for short, is a proprietary protocol developed by Cisco to determine the physical status of a link. UDLD is good at detecting these scenarios:

1. Links are up on both sides, but packets are received by only one side.
2. A miswire, where the receive and transmit fibers are not connected to the same port on the remote side.

Fast UDLD is the latest enhancement of UDLD. It was created for sub-second fast down detection.

To enable UDLD and Fast UDLD, all switches must support these two protocols.

Spanning Tree Protocol Bridge Assurance (STP BA) is used to protect against problems that can cause bridging loops in the network, specifically a unidirectional link failure (for example, a wiring mistake) or a software failure in which a switch continues to forward data traffic when it is no longer running the spanning tree algorithm.

To enable STP BA, all switches must support this protocol.

EtherChannel is a port aggregation technology that bundles two or more physical ports into one logical port. It is used to increase bandwidth, load balance traffic and provide link redundancy. PAgP and LACP are the two EtherChannel protocols. Port Aggregation Protocol (PAgP) is Cisco proprietary and Link Aggregation Control Protocol (LACP) is an open standard.
These protocols have built-in timers, and if one of the physical links goes down for whatever reason, that port is automatically taken out of the EtherChannel bundle.

2. Event driven

Link OAM is composed of different tools that help network operators monitor and troubleshoot link problems, particularly in Ethernet-based networks.

Connectivity Fault Management (also known as 802.1ag-2007) provides the capabilities to detect, verify, isolate and report end-to-end Ethernet connectivity issues.

Another option for fast down detection is relying on a lower-layer device, for example a Frame Relay (FR) switch in an FR network. Another example is an add/drop multiplexer (ADM) in a SONET network.
