Redundancy for small sites.


We are working with a lot of customers that have many small "sites", where each site has anywhere from 1 to 20 devices. A device can be a user workstation or some sort of automated equipment with a VPN tunnel back to the main headquarters.

As the importance of each site grows, these customers are asking for a failover method for their sites, and with the rise of cheap 3G connections it seems natural to go down that route.

Cisco makes some small branch routers with built-in 3G. Routers such as the Cisco 819 suit the customers asking for this type of solution.

With these routers you basically get a normal WAN interface (Gigabit or FastEthernet) and a Cellular interface, which you have to tie to a dialer interface.

To create the automatic failover, I have built a small topology consisting of 4 routers:

Topology

The CE router is the site router (think of an 819-class router), but instead of a Cellular interface I have used a Serial interface, since I don't have the option of 3G in my home lab.

ISP1 and ISP2 are two different ISPs, but they could also be the same ISP providing service through different delivery methods.

What we want to accomplish is the ability to use a different default route if our primary provider fails. The failure can be a link failure, but also a routing failure. How "far out" you want to measure this routing failure is up to you, but choose a sensible and stable point.

In our example, the 43.43.43.43/32 is our “stable” point.

This is how we are going to accomplish the task:

First, we use IP SLA to give us an up/down status toward the stable point. We will use a regular ICMP echo, sent at a 5-second interval.

We then create a tracking object that references the IP SLA operation, and use that tracking object in our static routing commands.

For this to work, we must make sure that the echoes are always sent over our primary interface/path. Otherwise we would get continuous flapping: when the primary path fails, the router switches to the secondary path, but since the stable point is probably reachable through the secondary path as well, the probe succeeds again, the primary default route is reinstalled, and the cycle repeats.

To pull this off, we use a local policy (a policy that applies only to traffic generated by the router itself). In this policy we match ICMP traffic to our stable point and set the next hop to ISP1; if that next hop is not usable, the ICMP traffic is sent to Null0 and dropped.

Let's check out the configuration on the CE router:

Define the IP SLA:

ip sla 20
  icmp-echo 43.43.43.43 source-interface FastEthernet0/0
  frequency 5

Schedule the IP SLA to start now and run forever:

ip sla schedule 20 life forever start-time now
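The post does not show the tracking object that the static route refers to. Judging from the `show track` output further down ("Response Time Reporter 20 state"), it was most likely defined along the lines below; treat this as an assumption rather than the exact command used in the lab. On newer IOS releases the `rtr` keyword was replaced by `ip sla`.

```text
! Assumed definition of tracking object 10, which follows IP SLA operation 20.
! "state" keeps the object Up while the latest probe returns OK.
track 10 rtr 20 state
!
! Equivalent syntax on newer IOS releases:
! track 10 ip sla 20 state
```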

Create an access list that matches the locally generated probe traffic:

access-list 110 permit icmp any host 43.43.43.43

Create a route-map that controls the path the ICMP traffic will take:

route-map ROUTE-PBR permit 10
  match ip address 110
  set ip next-hop 10.10.10.2
  set interface Null0

Apply the local policy:

ip local policy route-map ROUTE-PBR
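To confirm that the local policy is applied and is actually matching the probes, a few standard IOS commands can help. This is a sketch; the exact output differs between releases, and `debug ip policy` is noisy, so use it with care on a production router.

```text
! Lists where PBR route-maps are applied, including the local policy
show ip policy
! Shows the ROUTE-PBR clauses together with their policy routing match counters
show route-map ROUTE-PBR
! Logs each policy routing decision; turn it off again with "undebug all"
debug ip policy
```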

Set up the static routing: use the primary path while the tracking object is up; otherwise use the secondary path, a floating static route with an administrative distance (AD) of 253:

ip route 0.0.0.0 0.0.0.0 10.10.10.2 track 10
ip route 0.0.0.0 0.0.0.0 11.11.11.2 253

So let's verify our solution. Under normal circumstances:

CE#sh ip route | beg Gateway
Gateway of last resort is 10.10.10.2 to network 0.0.0.0
     1.0.0.0/32 is subnetted, 1 subnets
C       1.1.1.1 is directly connected, Loopback0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.10.10.0 is directly connected, FastEthernet0/0
     11.0.0.0/30 is subnetted, 1 subnets
C       11.11.11.0 is directly connected, Serial0/0
S*   0.0.0.0/0 [1/0] via 10.10.10.2

We can see the static default route via our primary path, which means our tracking object must be up:

CE#sh track
Track 10
  Response Time Reporter 20 state
  State is Up
    4 changes, last change 00:28:57
  Latest operation return code: OK
  Latest RTT (millisecs) 36
  Tracked by:
    STATIC-IP-ROUTING 0

Everything good here.
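The tracking object only reflects the result of the IP SLA operation, so if it ever misbehaves, the probe itself is the next thing to check. On IOS releases that use the `ip sla` syntax shown above, commands along these lines should work; the output format varies by release.

```text
! Latest return code, RTT and success/failure counters for operation 20
show ip sla statistics 20
! Configured target, source interface and frequency of operation 20
show ip sla configuration 20
```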

Let's verify that we have end-to-end connectivity:

CE#ping 100.100.100.100 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.100.100.100, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/49/60 ms

Superb!

Let's simulate an interface-down scenario on ISP1:

ISP1(config)#int f0/0
ISP1(config-if)#sh

And now on CE:

*Mar  1 00:48:35.859: %TRACKING-5-STATE: 10 rtr 20 state Up->Down
CE#sh ip route | beg Gateway
Gateway of last resort is 11.11.11.2 to network 0.0.0.0
     1.0.0.0/32 is subnetted, 1 subnets
C       1.1.1.1 is directly connected, Loopback0
     10.0.0.0/30 is subnetted, 1 subnets
C       10.10.10.0 is directly connected, FastEthernet0/0
     11.0.0.0/30 is subnetted, 1 subnets
C       11.11.11.0 is directly connected, Serial0/0
S*   0.0.0.0/0 [253/0] via 11.11.11.2

Do we still have reachability?

CE#ping 100.100.100.100 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.100.100.100, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/23/28 ms

Awesome.

Now let's bring the ISP1 link back up, but simulate a routing failure by denying ICMP inbound on ISP1:

ISP1(config-if)#do sh access-list 110
Extended IP access list 110
    10 deny icmp any any (110 matches)
    20 permit ip any any
ISP1(config-if)#ip access-group 110 in

Again on CE:

*Mar  1 00:50:20.899: %TRACKING-5-STATE: 10 rtr 20 state Up->Down
CE#
CE#ping 100.100.100.100 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 100.100.100.100, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 24/31/40 ms

Again, our tracking object goes down and we still have reachability. Let's make sure we now go through ISP2:

CE#traceroute 100.100.100.100 source loo0
Type escape sequence to abort.
Tracing the route to 100.100.100.100
  1 11.11.11.2 20 msec 4 msec 4 msec
  2 172.16.100.4 40 msec *  44 msec

And when ISP1 comes back online:

*Mar  1 00:51:30.915: %TRACKING-5-STATE: 10 rtr 20 state Down->Up
CE#traceroute 100.100.100.100 source loo0
Type escape sequence to abort.
Tracing the route to 100.100.100.100
  1 10.10.10.2 60 msec 44 msec 28 msec
  2 172.16.100.4 52 msec *  48 msec
CE#

Nice and very useful.

I hope this is something you can use as well.