As the Internet marches ever deeper into our lives, it becomes more important to ensure that we don’t lose access to it. For that reason many companies, large and small alike, maintain redundant Internet connections. Many of these sites use private addresses on their LAN and therefore need NAT to translate to Internet-routable public IPs. This article focuses on how a simple SOHO network can be configured to fail over to a backup Internet link without the complication of BGP or PfR.

Let’s start with a picture showing just such a network.

[Figure: backup Internet access link]

As you can see, we have dual links, one to each ISP.  To add flavor to the discussion I added two local LAN networks, mostly to show some of the pros and cons of this design.

Note: The IPs in this discussion were chosen for demonstration purposes only and do not reflect real-world assignments.  I neither own nor use the 1.1.1.0/24 or 2.2.2.0/24 public IP space.  These IPs were chosen only for their simplicity of use in this discussion.

NAT

Network Address Translation in this scenario is tricky because most ISPs won’t route traffic sourced from an IP that doesn’t belong to them.  In other words, if ISP1 fails and I keep using the NAT statements that translate 192.168.1.0/24 and 192.168.2.0/24 to an ISP1-assigned IP address BUT send the packets to ISP2 for routing, it is more than likely that ISP2 will drop the traffic.  Spoofing source IPs for malicious purposes is so common (and easy) that many service providers simply won’t route traffic sourced from IPs outside their assigned range.  (Obviously, if you have a BGP peering relationship with them and prearrange the routing of certain subnets through their network, that is a completely different story.)

To resolve this situation it is necessary to set up parallel NAT statements for each ISP’s assigned public IPs AND to ensure that the NAT statement in use matches the ISP currently in use.  This is done with route-maps (which are discussed in the next section).  For now we need to tell NAT which interfaces are “outside” and which are “inside”, and set up some NAT pools of IPs.
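To make the source-address problem concrete, here is a minimal Python sketch of the kind of check an ISP might apply. It is only a model: the function name `isp_accepts` and the `ISP_RANGES` table are made up for illustration, and the ranges are simply the example addresses used in this article.

```python
import ipaddress

# Hypothetical assigned ranges, taken from the example addressing in this article.
ISP_RANGES = {
    "ISP1": ipaddress.ip_network("1.1.1.0/24"),
    "ISP2": ipaddress.ip_network("2.2.2.0/24"),
}


def isp_accepts(isp: str, source_ip: str) -> bool:
    """Return True if the packet's source address falls inside the ISP's own range."""
    return ipaddress.ip_address(source_ip) in ISP_RANGES[isp]


# A packet translated to an ISP1 address is fine when sent to ISP1 ...
print(isp_accepts("ISP1", "1.1.1.100"))
# ... but the same translated source sent to ISP2 falls outside ISP2's range.
print(isp_accepts("ISP2", "1.1.1.100"))
```

The second call is the failure case described above: the translation still uses an ISP1 address while the packet leaves via ISP2.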

interface ethernet 0/0.100
description UserVLAN
ip address 192.168.1.1 255.255.255.0
ip nat inside
!
interface ethernet 0/0.200
description ServerVLAN
ip address 192.168.2.1 255.255.255.0
ip nat inside
!
interface ethernet 0/1
description ISP1
ip address 1.1.1.2 255.255.255.0
ip nat outside
!
interface ethernet 0/2
description ISP2
ip address 2.2.2.2 255.255.255.0
ip nat outside
!
ip nat pool ISP1 1.1.1.100 1.1.1.110 netmask 255.255.255.0
ip nat pool ISP2 2.2.2.200 2.2.2.220 netmask 255.255.255.0

ROUTE-MAPS and ACCESS-LISTS

Route-maps and access-lists (ACLs) go hand in hand in this configuration.  The goal is to identify the source IPs and match them to an “outside” interface as well as a next-hop IP that belongs to that outside interface.  First, the ACLs:

ip access-list standard UserVLAN
permit 192.168.1.0 0.0.0.255
!
ip access-list standard ServerVLAN
permit 192.168.2.0 0.0.0.255

Now for the route-maps.  In each route-map the first statement identifies what we want to match and the second statement denies everything else.  Technically you don’t need the second statement, but I prefer to put it in there for clarity.  Notice how we are matching the source IP ranges from within the enterprise network (either ACL listed on the match ip address line may match) as well as the outbound interface in use.  If both of those match statements are satisfied, the ip next-hop is set to the IP of the ISP’s upstream router.

route-map NAT-ISP1 permit 10
match ip address ServerVLAN UserVLAN
match interface ethernet 0/1
set ip next-hop 1.1.1.1
!
route-map NAT-ISP1 deny 1000
!
route-map NAT-ISP2 permit 10
match ip address ServerVLAN UserVLAN
match interface ethernet0/2
set ip next-hop 2.2.2.1
!
route-map NAT-ISP2 deny 1000

Time to create NAT translation statements using the route-maps and the NAT pools created earlier.

ip nat inside source route-map NAT-ISP1 pool ISP1 overload
ip nat inside source route-map NAT-ISP2 pool ISP2 overload

These last two lines are the bomb.  They describe when to NAT and what pool of IPs to use for NAT’ing.  So if a host from the UserVLAN, say 192.168.1.2, attempts to access, say, google.com, the router will need to use the routing table to determine which egress interface to use to get to google.com.  Whether the router chooses the interface to ISP1 or ISP2, both are configured with the ip nat outside statement.  Thus, before sending the packet out the egress interface, the router must first perform whatever the NAT commands tell it to do.

In this case let’s say that the egress interface is the one going to ISP1 (ethernet 0/1).  The router will attempt to match one of the ip nat inside source statements.  The first of these statements refers to route-map NAT-ISP1, which the router will match since the source IP matches an ACL in that route-map AND the egress interface is ethernet 0/1.  The second half of the ip nat inside source statement says to use an IP from the ISP1 NAT pool.  So the router will create a NAT table entry (to handle the return traffic), change the source IP to one from the ISP1 pool, and then route the packet out ethernet 0/1 to 1.1.1.1.
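The decision just described can be sketched in a few lines of Python. This is only a model of the logic, not of how IOS implements it: `select_pool`, `INSIDE_NETS` and `NAT_RULES` are names I made up, and the interface strings are just labels.

```python
import ipaddress

# Inside networks permitted by the UserVLAN and ServerVLAN ACLs.
INSIDE_NETS = [
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("192.168.2.0/24"),
]

# (route-map name, egress interface it matches, NAT pool it selects)
NAT_RULES = [
    ("NAT-ISP1", "Ethernet0/1", "ISP1"),
    ("NAT-ISP2", "Ethernet0/2", "ISP2"),
]


def select_pool(source_ip, egress_interface):
    """Return the NAT pool name for this packet, or None if no statement matches."""
    src = ipaddress.ip_address(source_ip)
    # First condition: the source must be permitted by one of the ACLs.
    if not any(src in net for net in INSIDE_NETS):
        return None
    # Second condition: the egress interface chosen by routing must match.
    for _name, interface, pool in NAT_RULES:
        if interface == egress_interface:
            return pool
    return None


print(select_pool("192.168.1.2", "Ethernet0/1"))  # routed toward ISP1
print(select_pool("192.168.1.2", "Ethernet0/2"))  # routed toward ISP2
```

The point to take away is that the pool follows the egress interface, and the egress interface is decided by the routing table, not by NAT.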

Note:  You can have a NAT pool with only 1 IP in it.  Just put the same IP as the beginning and ending IP in the statement.

ROUTING

With the NAT statements and their associated route-map and ACL statements set up, are we good to go?  Well, not quite.  How do we get the router to choose one of the ISP-facing interfaces to route traffic?  The router consults the routing table to make these decisions, so we need to populate the routing table with routes.  In our case we need default routes (routes of last resort) since any destination prefix that isn’t in our local network will need to be routed to the Internet.  The trick here is that we have two default gateways.  One will be the primary and the other will be the backup path to the Internet.

ip route 0.0.0.0 0.0.0.0 1.1.1.1 track 600
ip route 0.0.0.0 0.0.0.0 2.2.2.1 4

Which of these static routes will be used?  The one with the lowest administrative distance.  By default the administrative distance of a static route to an IP is 1.  Notice how the second routing statement has a 4 at the end?  That is the administrative distance for that static route.  Since it is higher than the default distance (which is what the first statement is using) it will NOT be put into the routing table while the first statement is valid.  When will the first statement NOT be valid?  This is where the track 600 part of the first statement comes into play.
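As a rough illustration of that selection rule, the Python sketch below picks the usable static route with the lowest administrative distance. The `StaticRoute` class and its `usable` flag are my own simplification; `usable` stands in for whatever makes a route valid, such as the tracked object being up.

```python
from dataclasses import dataclass


@dataclass
class StaticRoute:
    next_hop: str
    distance: int = 1    # default administrative distance of a static route
    usable: bool = True  # stand-in for "the route is currently valid"


def best_route(routes):
    """Return the usable route with the lowest administrative distance, or None."""
    candidates = [route for route in routes if route.usable]
    return min(candidates, key=lambda route: route.distance, default=None)


primary = StaticRoute("1.1.1.1")             # distance defaults to 1
backup = StaticRoute("2.2.2.1", distance=4)  # floating static route

print(best_route([primary, backup]).next_hop)  # primary wins while it is valid

primary.usable = False                         # e.g. the tracked object goes down
print(best_route([primary, backup]).next_hop)  # backup takes over
```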

TRACK and IP SLA

Much like how the route-map statements added conditional logic to the NAT statements, the track feature, in our case, adds conditional logic to the static route statement.  Why do we need conditional logic on this static route?  Well, how will the router know to fail over to the backup static route (and thus the backup NAT statements for ISP2) if the primary static route never leaves the routing table?

Without the track feature there are two ways (that I can think of) in which the static route going to ISP1 can be removed from the router’s routing table.  However, both require the network administrator to manually do something to the router.  Either he/she could unplug the cable going to ISP1 from the router (thus invalidating the static route pointing out that interface) or he/she could reconfigure the static route statements and give the ISP1-facing static route a higher administrative distance than that of the static route going to ISP2.  Neither of these solutions can be done automatically and both require someone with enough know-how to perform the task correctly.  Additionally, how would the administrator know when to revert those changes?  Obviously, allowing the router to automatically figure out when to change the default route is far superior to either manual option.

track 600 ip sla 600 reachability
!
ip sla 600
icmp-echo 8.8.8.8 source-interface ethernet0/1
threshold 2500
frequency 30
ip sla schedule 600 life forever start-time now

Another daisy-chain of statements to weed through.  Remember the static route going to ISP1 referring to track 600?  Well, here is that track 600 statement.  The track 600 ip sla 600 reachability statement in turn refers to the reachability reported by ip sla 600.  Finally, the ip sla 600 section specifies a periodic ping of 8.8.8.8 using source interface ethernet 0/1, and the schedule statement tells it to start now and run forever.

Note:  Anything special about the number 600?  Nope.  Just a random number.  I prefer to NOT use common numbers, such as 0, 1, or 2, as they get all confused together when I read an IOS config.  (track 1 vs. access-list 1 vs. ip sla 1 etc.)  That said, I like to match numbers that go together.  So, in this use case, 600 is used to identify both the IP SLA and TRACK instances (though their respective 600s are independent of each other).

What does it all mean?  Track 600 reports “good” while the ping to 8.8.8.8 via ISP1 succeeds and “bad” when that ping fails, and the static route associated with it is installed or withdrawn accordingly.  This test is done every 30 seconds (frequency 30).  Note that threshold 2500 sets a round-trip threshold of 2.5 seconds (2500 ms) for the probe rather than its timeout; the timeout is set with the separate timeout command, which this configuration leaves at its default.

We need one more statement to make this work.  The above-mentioned ip sla statement only sources traffic from the interface facing ISP1; it doesn’t make the router use ISP1 as its upstream neighbor.  Thus, when ISP1 is down the ping will attempt to route through ISP2.  (Hopefully ISP2 will drop the packet because the source IP is not from their range of IPs, but not every ISP is that security conscious.)  So we need one more static route to ensure that the router will always attempt to ping 8.8.8.8 via ISP1 (whether ISP1 is up or not).

ip route 8.8.8.8 255.255.255.255 1.1.1.1

The downside of doing this is that whatever is at 8.8.8.8 will be unreachable from this enterprise’s network if ISP1 is down.  Be sure to use an IP that isn’t used in your day-to-day business functions.
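The reason this host route pins the probe to ISP1 is longest-prefix matching: a /32 always beats a default route. Here is a small Python sketch of that lookup, using a hypothetical `lookup` helper and a routing table as it might look after failing over to ISP2.

```python
import ipaddress


def lookup(routes, destination):
    """Return the next hop of the most specific route covering the destination."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix wins, regardless of the order the routes were entered.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


# Routing table after failover: the default now points at ISP2,
# but the host route for the probe target still points at ISP1.
routes_after_failover = [
    ("0.0.0.0/0", "2.2.2.1"),
    ("8.8.8.8/32", "1.1.1.1"),
]

print(lookup(routes_after_failover, "8.8.8.8"))  # probe target stays pinned to ISP1
print(lookup(routes_after_failover, "8.8.4.4"))  # everything else follows the default
```

This also shows the downside mentioned above: while ISP1 is down, 8.8.8.8 itself is unreachable because its only route still points at ISP1.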

EEM

Now for the final step.  Technically the above configuration will fail over the traffic from the failed ISP1 connection to the ISP2 connection and NAT any new traffic from the correct NAT pool.  The above configuration will also fail back automatically to ISP1 once the ip sla succeeds in pinging 8.8.8.8 again.  This will also set up any new NAT translations with the NAT pool associated with ISP1.  We are done… right?  Well, almost.

Notice I made it clear, in the previous paragraph, that the router would select the correct NAT pool for any NEW translation that needs to be set up.  What about existing translations?  Remember that the router sets up a NAT translation table to match returning traffic so that it will get “un”NATted to the correct internal IP address.  The problem is that we have no way of telling the router to automatically update all the existing translations in the NAT translation table.  Even if we could, that wouldn’t help with any returning traffic.  It isn’t like the Internet as a whole knows you’ve changed from one set of outside IPs to a different set.  The best we can do is clear the existing NAT translation table and allow existing connections to reset and attempt to reestablish a connection to their outside service via the new NAT pool of IPs.  There is no built-in service or process within the router, that I know of, that will do this by attaching a command, such as a track statement, to the NAT translation statements.  So we have to make the router do it as if it were the network administrator.  Enter Cisco’s Embedded Event Manager (EEM).

event manager applet Clear-NAT-Trans-600
event track 600 state any
action a5 syslog msg "EEM applet Clear-NAT-Trans-600 clearing NAT translations"
action b5 cli command "enable"
action c5 cli command "clear ip nat trans *"

What does this block of code do?  Well, it monitors the state of track 600.  (The same track that is being used on the static route via ISP1.)  When track 600 changes state, this EEM script takes the actions shown.  First it sends out a syslog message indicating it is doing something, then it goes into enable mode and issues the command clear ip nat trans * just like a network administrator would do.
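Because the applet is triggered on state any, it should run on both the failover and the failback. As a rough model (my own simplification, not EEM’s internals), the hypothetical helper below counts how often such an applet would run for a given sequence of track states.

```python
def applet_runs(track_states):
    """Count state changes in a sequence of track states.

    Each change (up -> down or down -> up) models one run of an applet
    triggered on "state any", i.e. one clearing of the NAT table.
    """
    runs = 0
    for previous, current in zip(track_states, track_states[1:]):
        if previous != current:
            runs += 1
    return runs


# ISP1 is fine, fails for two polling intervals, then recovers.
history = ["up", "up", "down", "down", "up"]
print(applet_runs(history))  # one run for the failure plus one for the recovery
```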

NOTE:  A word about the “a5, b5, c5” labels in the EEM block above.  EEM applets work a lot like the old-fashioned programming language BASIC: each line has a label that determines its place in the order of execution.  In EEM, however, that ordering is alphanumeric, not numeric.  This means that if you had actions labeled 1, 2, and 10 they would be executed in the order 1, 10, 2, since that is how those three labels sort alphanumerically.  That is why I prefer to start my EEM action labels with a letter followed by a number.  It reminds me that EEM uses alphanumeric ordering AND it leaves room to add intervening statements, should I need to.  With my current three (a5, b5, c5) I can add something before or after each one (such as a1 or a9 to put something before or after a5).
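If you want to check how a set of labels will order before typing them into the router, a quick Python sketch helps. It assumes that EEM’s alphanumeric ordering behaves like a plain string sort, which is my reading of the behavior rather than something taken from the IOS source; `execution_order` is a made-up helper name.

```python
def execution_order(labels):
    """Return action labels in plain string (alphanumeric) sort order."""
    return sorted(labels)


# Purely numeric labels sort as text, not as numbers.
print(execution_order(["1", "2", "10"]))

# Letter-plus-number labels keep their intended order and leave gaps for later inserts.
print(execution_order(["c5", "a5", "b5", "a1", "a9"]))
```

The second list shows why a1 and a9 slot neatly around a5 without renumbering anything else.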

Conclusion

Here are a few important show commands to help you see or troubleshoot what is going on in the router:

show ip route
show ip nat translations
show track 600
show ip sla summary

That should cover it.  If you find this informative and/or you need help with configuring this or other enterprise networking features feel free to contact us!