Category Archives: CCIE

ISIS Authentication types (packet captures)

In this post I would like to highlight a couple of features of ISIS: specifically, the authentication mechanisms available and what they look like on the wire.

I will do this by configuring a couple of routers with the two authentication types available. I will then look at packet captures taken from the link between them to illustrate how each type is used by the ISIS process.

The two types of authentication are link-level authentication of the hello messages used to establish an adjacency, and authentication of the LSPs (Link State Packets) themselves.
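As a quick orientation before we dive in: the two types live in different parts of the configuration. Hello authentication is applied per interface, while LSP authentication sits under the ISIS process. A minimal sketch (the key-chain name and key string are just placeholders):

key chain DEMO-CHAIN
 key 1
  key-string SECRET
!
interface GigabitEthernet1
 ! Type 1: authenticates the IIH (hello) packets on this link only
 isis authentication mode md5
 isis authentication key-chain DEMO-CHAIN
!
router isis 1
 ! Type 2: authenticates the LSPs flooded through the level/domain
 authentication mode md5
 authentication key-chain DEMO-CHAIN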

First off, here is the topology. It is extremely simple, but it's all that's required for this purpose:

Simple, right? Two routers with one link between them on Gig1. Both are running in ISIS level-2-only mode, which means they will only try to establish an L2 adjacency with their neighbors. Each router has a loopback interface, which is also advertised into ISIS.

Let's start by looking at the relevant configuration of CSR-02 for the link-level authentication:

key chain MY-CHAIN
 key 1
  key-string WIPPIE
!
interface GigabitEthernet1
 ip address 10.1.2.2 255.255.255.0
 ip router isis 1
 negotiation auto
 no mop enabled
 no mop sysid
 isis authentication mode md5
 isis authentication key-chain MY-CHAIN

Without the same configuration on CSR-01, this is what we see in the data path (captured on CSR-02’s G1 interface):

We also see that we don't have a full adjacency on CSR-01:

CSR-01#sh isis nei

Tag 1:
System Id       Type Interface     IP Address      State Holdtime Circuit Id
CSR-02          L2   Gi1           10.1.2.2        INIT  26       CSR-02.01

Let's apply the same authentication configuration on CSR-01 and see the result:

key chain MY-CHAIN
 key 1
  key-string WIPPIE
!
interface GigabitEthernet1
 ip address 10.1.2.1 255.255.255.0
 ip router isis 1
 negotiation auto
 no mop enabled
 no mop sysid
 isis authentication mode md5
 isis authentication key-chain MY-CHAIN

We now have a full adjacency:

CSR-01#sh isis neighbors 

Tag 1:
System Id       Type Interface     IP Address      State Holdtime Circuit Id
CSR-02          L2   Gi1           10.1.2.2        UP    8        CSR-02.01     

And we have routes from CSR-02:

CSR-01#sh ip route isis | beg Gate
Gateway of last resort is not set

      2.0.0.0/32 is subnetted, 1 subnets
i L2     2.2.2.2 [115/20] via 10.1.2.2, 00:01:07, GigabitEthernet1

This is what we now see from CSR-02's perspective:

A link-level authentication mismatch is fairly easy to spot, because you simply won't get a stable adjacency formed.
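If a packet capture isn't handy, the same mismatch is usually visible on the router itself. For example (both are standard IOS commands; output omitted here):

CSR-01#debug isis adj-packets
! Look for incoming hellos being rejected due to authentication failure
CSR-01#show isis neighbors detail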

The second type is LSP authentication. Let's look at the configuration of CSR-02 for this type of authentication:

CSR-02#sh run | sec router isis
 ip router isis 1
 ip router isis 1
router isis 1
 net 49.0000.0000.0002.00
 is-type level-2-only
 authentication mode text
 authentication key-chain MY-CHAIN

In this example, I have selected plain-text authentication, which I certainly don't recommend in production, but it's great for demonstration purposes.
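For production you would typically use MD5 here as well. The only change from the example is the mode (sketch, same key chain assumed):

router isis 1
 authentication mode md5
 authentication key-chain MY-CHAIN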

Again, this is what it looks like on the wire (from CSR-01 to CSR-02) without authentication enabled on CSR-01:

As you can see, we have the LSP that contains CSR-01's prefixes, but no authentication TLV is present in the packet.

Let's enable it on CSR-01 and see the result:

CSR-01#sh run | sec router isis
 ip router isis 1
 ip router isis 1
router isis 1
 net 49.0000.0000.0001.00
 is-type level-2-only
 authentication mode text
 authentication key-chain MY-CHAIN

The result on the wire:

Here we clearly have the authentication TLV (type = 10, cleartext), and because cleartext was selected, the password (WIPPIE) is visible in the capture.

The result is a validated ISIS database on both routers.

That's all folks; I hope this helps clarify the difference between the two types of authentication in ISIS.

Take care!

Progress update – 10/07-2017

Hello folks,

I'm currently going through the INE DC videos and learning a lot about fabrics and how they work, along with a fair bit of UCS information on top of that!

I'm spending an average of 2.5 hours on weekdays for study and a bit more on weekends when time permits.

I still have no firm commitment to the CCIE DC track, but at some point I need to commit to it and really get behind it. One of these days 😉

I mentioned it to the wife-to-be a couple of days ago and while she didn't applaud the idea, at least she wasn't firmly against it, which is always something I guess! It's very important for me to have my family behind me in these endeavours!

I'm still a bit concerned about the lack of rack rentals for DCv2 from INE, which is something I need to have in place before I order a bootcamp or more training materials from them. As people know by now, I really do my best learning in front of the "system", trying out what works and what doesn't.

Now to spin up a few N9Ks in the lab and play around with NX-OS unicast and multicast routing!

Take care.

A look at Auto-Tunnel Mesh Groups

In this post I would like to give a demonstration of using the Auto-Tunnel Mesh group feature.

As you may know, manual MPLS-TE tunnels are first and foremost unidirectional, meaning that if you do them between two PE nodes, you have to do a tunnel in each direction with the local PE node being the headend.

Now imagine your network had 10 PE routers and you wanted a full mesh between them; with 90 unidirectional tunnels to define, this becomes pretty burdensome and error-prone.
Thankfully there's a way to avoid this manual configuration and instead have each node use the IGP to signal its willingness to become part of a TE mesh. That's what the Auto-Tunnel Mesh Group feature is all about!

Topology

In my small SP setup, I only have 3 PE devices, namely PE-1, PE-2 and PE-3. I also only have one P node, called P-1.
However small this setup is, it's enough to demonstrate the power of the Auto-Tunnel mesh functionality.

Beyond that, I have set up a small MPLS L3 VPN service for customer CUST-A, which has a presence on all 3 PE nodes. The VPNv4 address-family peers through an RR, which for this purpose is P-1.

We are running OSPF as the IGP of choice. This means that our mesh membership will be signaled using Opaque LSAs, which I will show you later on.

The goal of the lab is to use the Auto-Tunnel mesh functionality to create a full mesh of tunnels between my PE nodes and use this exclusively for label switching and to do so with a general template that would scale to many more PE devices than just the 3 in this lab.

The very first thing you want to do is to enable MPLS-TE both globally and on your interfaces. We can verify this on PE-2:

PE-2:

mpls traffic-eng tunnels
!
interface GigabitEthernet2
ip address 10.2.100.2 255.255.255.0
negotiation auto
mpls traffic-eng tunnels
!

The second thing you want to do is to enable the mesh-feature globally using the following command as configured on PE-2 as well:

PE-2:

mpls traffic-eng auto-tunnel mesh

Starting off with MPLS-TE, we need to make sure our IGP is actually signaling this to begin with. I have configured MPLS-TE on the area 0 which is the only area in use in our topology:

PE-2:

router ospf 1
network 0.0.0.0 255.255.255.255 area 0
mpls traffic-eng router-id Loopback0
mpls traffic-eng area 0
mpls traffic-eng mesh-group 100 Loopback0 area 0

Don't get hung up on the last configuration line; I will explain it shortly. However, notice the "mpls traffic-eng area 0" and "mpls traffic-eng router-id Loopback0" commands. After those two lines are configured, you should be able to retrieve information on the MPLS-TE topology as seen by your IGP:

PE-2:

PE-2#sh mpls traffic-eng topology brief
My_System_id: 2.2.2.2 (ospf 1 area 0)

Signalling error holddown: 10 sec Global Link Generation 22

IGP Id: 1.1.1.1, MPLS TE Id:1.1.1.1 Router Node (ospf 1 area 0)
Area mg-id's:
: mg-id 100 1.1.1.1 :
link[0]: Broadcast, DR: 10.1.100.100, nbr_node_id:8, gen:14
frag_id: 2, Intf Address: 10.1.100.1
TE metric: 1, IGP metric: 1, attribute flags: 0x0
SRLGs: None

IGP Id: 2.2.2.2, MPLS TE Id:2.2.2.2 Router Node (ospf 1 area 0)
link[0]: Broadcast, DR: 10.2.100.100, nbr_node_id:9, gen:19
frag_id: 2, Intf Address: 10.2.100.2
TE metric: 1, IGP metric: 1, attribute flags: 0x0
SRLGs: None

IGP Id: 3.3.3.3, MPLS TE Id:3.3.3.3 Router Node (ospf 1 area 0)
Area mg-id's:
: mg-id 100 3.3.3.3 :
link[0]: Broadcast, DR: 10.3.100.100, nbr_node_id:11, gen:22
frag_id: 2, Intf Address: 10.3.100.3
TE metric: 1, IGP metric: 1, attribute flags: 0x0
SRLGs: None

IGP Id: 10.1.2.2, MPLS TE Id:22.22.22.22 Router Node (ospf 1 area 0)
link[0]: Broadcast, DR: 10.1.100.100, nbr_node_id:8, gen:17
frag_id: 3, Intf Address: 10.1.100.100
TE metric: 10, IGP metric: 10, attribute flags: 0x0
SRLGs: None

link[1]: Broadcast, DR: 10.2.100.100, nbr_node_id:9, gen:17
frag_id: 4, Intf Address: 10.2.100.100
TE metric: 10, IGP metric: 10, attribute flags: 0x0
SRLGs: None

link[2]: Broadcast, DR: 10.3.100.100, nbr_node_id:11, gen:17
frag_id: 5, Intf Address: 10.3.100.100
TE metric: 10, IGP metric: 10, attribute flags: 0x0
SRLGs: None

IGP Id: 10.1.100.100, Network Node (ospf 1 area 0)
link[0]: Broadcast, Nbr IGP Id: 10.1.2.2, nbr_node_id:5, gen:13

link[1]: Broadcast, Nbr IGP Id: 1.1.1.1, nbr_node_id:6, gen:13

IGP Id: 10.2.100.100, Network Node (ospf 1 area 0)
link[0]: Broadcast, Nbr IGP Id: 10.1.2.2, nbr_node_id:5, gen:18

link[1]: Broadcast, Nbr IGP Id: 2.2.2.2, nbr_node_id:1, gen:18

IGP Id: 10.3.100.100, Network Node (ospf 1 area 0)
link[0]: Broadcast, Nbr IGP Id: 10.1.2.2, nbr_node_id:5, gen:21

link[1]: Broadcast, Nbr IGP Id: 3.3.3.3, nbr_node_id:7, gen:21

The important thing to notice here is that we are indeed seeing the other routers in the network, all the PE devices as well as the P device.

Now to the last line of configuration under the router ospf process:

PE-2:

"mpls traffic-eng mesh-group 100 Loopback0 area 0"

What this states is that we would like to use the Auto-Tunnel Mesh group feature, with this PE node being a member of group 100, using Loopback0 as the tunnel address, within area 0.

This by itself only handles the signaling, but we also want to deploy a template in order to create the individual tunnel interfaces. This is done in the following manner:

PE-2:

interface Auto-Template100
ip unnumbered Loopback0
tunnel mode mpls traffic-eng
tunnel destination mesh-group 100
tunnel mpls traffic-eng autoroute announce
tunnel mpls traffic-eng path-option 10 dynamic

In the Auto-Template100 interface we specify, just as we would for a manual TE tunnel, our loopback address, the tunnel mode and the path option. Note that the dynamic path option simply follows the IGP, which sort of defeats the purpose of many MPLS-TE deployments; but with our topology there is no path diversity, so it wouldn't matter anyway.
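For reference, if the topology did offer path diversity, the dynamic path option could be preceded by an explicit one. A sketch, with a hypothetical path name and next-hop (the hop shown is P-1's address from this topology):

ip explicit-path name VIA-P1 enable
 next-address 10.2.100.100
!
interface Auto-Template100
 ! Prefer the explicit path; fall back to dynamic if it cannot be signaled
 tunnel mpls traffic-eng path-option 5 explicit name VIA-P1
 tunnel mpls traffic-eng path-option 10 dynamic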

Also, the autoroute announce command is used to force traffic into the tunnels.

The important thing is the “tunnel destination mesh-group 100” which ties this configuration snippet into the OSPF one.

After everything is setup, you should see some dynamic tunnels being created on each PE node:

PE-2:

PE-2#sh ip int b | incl up
GigabitEthernet1 100.100.101.100 YES manual up up
GigabitEthernet2 10.2.100.2 YES manual up up
Auto-Template100 2.2.2.2 YES TFTP up up
Loopback0 2.2.2.2 YES manual up up
Tunnel64336 2.2.2.2 YES TFTP up up
Tunnel64337 2.2.2.2 YES TFTP up up

Lets verify the current RIB configuration after this step:

PE-2:

PE-2#sh ip route | beg Gateway
Gateway of last resort is not set

1.0.0.0/32 is subnetted, 1 subnets
O 1.1.1.1 [110/12] via 1.1.1.1, 00:29:13, Tunnel64336
2.0.0.0/32 is subnetted, 1 subnets
C 2.2.2.2 is directly connected, Loopback0
3.0.0.0/32 is subnetted, 1 subnets
O 3.3.3.3 [110/12] via 3.3.3.3, 00:28:48, Tunnel64337
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O 10.1.100.0/24 [110/11] via 10.2.100.100, 00:29:13, GigabitEthernet2
C 10.2.100.0/24 is directly connected, GigabitEthernet2
L 10.2.100.2/32 is directly connected, GigabitEthernet2
O 10.3.100.0/24 [110/11] via 10.2.100.100, 00:29:13, GigabitEthernet2
22.0.0.0/32 is subnetted, 1 subnets
O 22.22.22.22 [110/2] via 10.2.100.100, 00:29:13, GigabitEthernet2

Very good. We can see that in order to reach 1.1.1.1/32, which is PE-1's loopback, we are indeed routing through one of the dynamic tunnels.
The same goes for 3.3.3.3/32 towards PE-3's loopback.

PE-2:

PE-2#traceroute 1.1.1.1 so loo0
Type escape sequence to abort.
Tracing the route to 1.1.1.1
VRF info: (vrf in name/id, vrf out name/id)
1 10.2.100.100 [MPLS: Label 17 Exp 0] 16 msec 22 msec 22 msec
2 10.1.100.1 25 msec * 19 msec

We can see that traffic towards that loopback is indeed being label-switched. And just to make it obvious, let me make sure we are not using LDP 🙂

PE-2:

PE-2#sh mpls ldp neighbor
PE-2#

On P-1, being the midpoint of our LSPs, we would expect 6 unidirectional tunnels in total:

P-1:

P-1#sh mpls for
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
16 Pop Label 3.3.3.3 64336 [6853] \
472 Et2/0 10.1.100.1
17 Pop Label 2.2.2.2 64336 [2231] \
2880 Et2/0 10.1.100.1
18 Pop Label 1.1.1.1 64336 [4312] \
2924 Et2/1 10.2.100.2
19 Pop Label 1.1.1.1 64337 [4962] \
472 Et2/2 10.3.100.3
20 Pop Label 2.2.2.2 64337 [6013] \
562 Et2/2 10.3.100.3
21 Pop Label 3.3.3.3 64337 [4815] \
0 Et2/1 10.2.100.2

Exactly what we expected.
The following is the output of "show ip ospf database opaque-area" on PE-2, cut down to the relevant Opaque LSAs (two types are in use: one for general MPLS-TE and one for the mesh-group feature):

LS age: 529
Options: (No TOS-capability, DC)
LS Type: Opaque Area Link
Link State ID: 4.0.0.0
Opaque Type: 4
Opaque ID: 0
Advertising Router: 1.1.1.1
LS Seq Number: 80000002
Checksum: 0x5364
Length: 32

Capability Type: Mesh-group
Length: 8
Value:

0000 0064 0101 0101

LS age: 734
Options: (No TOS-capability, DC)
LS Type: Opaque Area Link
Link State ID: 4.0.0.0
Opaque Type: 4
Opaque ID: 0
Advertising Router: 2.2.2.2
LS Seq Number: 80000002
Checksum: 0x6748
Length: 32

Capability Type: Mesh-group
Length: 8
Value:

0000 0064 0202 0202

LS age: 701
Options: (No TOS-capability, DC)
LS Type: Opaque Area Link
Link State ID: 4.0.0.0
Opaque Type: 4
Opaque ID: 0
Advertising Router: 3.3.3.3
LS Seq Number: 80000002
Checksum: 0x7B2C
Length: 32

Capability Type: Mesh-group
Length: 8
Value:

0000 0064 0303 0303

The interesting parts are the Advertising Router and the value of the mesh-group TLV. Each value starts with 0000 0064 (0x64 = 100 in decimal), which is the membership of group 100 being signaled across the IGP area, followed by the member's TE router ID (for example 0101 0101 = 1.1.1.1).
Okay, all good I hear you say, but let's do an end-to-end test from the CE devices in customer CUST-A's domain:

R1:

R1#sh ip route | beg Gateway
Gateway of last resort is not set

10.0.0.0/32 is subnetted, 3 subnets
C 10.1.1.1 is directly connected, Loopback0
B 10.2.2.2 [20/0] via 100.100.100.100, 00:37:46
B 10.3.3.3 [20/0] via 100.100.100.100, 00:37:36
100.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 100.100.100.0/24 is directly connected, FastEthernet0/0
L 100.100.100.1/32 is directly connected, FastEthernet0/0

So we are learning the routes on the customer side (through standard IPv4 BGP).

R1:

R1#ping 10.2.2.2 so loo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/72/176 ms
R1#ping 10.3.3.3 so loo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.3.3.3, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/32/48 ms

We have reachability! – What about traceroute:

R1:

R1#traceroute 10.2.2.2 so loo0

Type escape sequence to abort.
Tracing the route to 10.2.2.2

1 100.100.100.100 28 msec 20 msec 12 msec
2 10.1.100.100 [MPLS: Labels 18/17 Exp 0] 44 msec 136 msec 60 msec
3 100.100.101.100 [MPLS: Label 17 Exp 0] 28 msec 32 msec 12 msec
4 100.100.101.3 28 msec 32 msec 24 msec
R1#traceroute 10.3.3.3 so loo0

Type escape sequence to abort.
Tracing the route to 10.3.3.3

1 100.100.100.100 48 msec 16 msec 8 msec
2 10.1.100.100 [MPLS: Labels 19/17 Exp 0] 48 msec 12 msec 52 msec
3 100.100.102.100 [MPLS: Label 17 Exp 0] 16 msec 28 msec 36 msec
4 100.100.102.4 68 msec 56 msec 48 msec

Just what we would expect from our L3 MPLS VPN service. A transport label (this time through MPLS-TE) and a VPN label as signaled through MP-BGP.

To round it off, I have attached the following from a packet capture on P-1's interface toward PE-1, taken while re-issuing the ICMP echo from R1's loopback toward R2's loopback address:

wireshark-output

With that, I hope it's been informative for you. Thanks for reading!

References:

http://www.cisco.com/c/en/us/td/docs/ios/12_0s/feature/guide/gsmeshgr.html

Configurations:

configurations

Practical DMVPN Example

In this post, I will put together a variety of different technologies involved in a real-life DMVPN deployment.

This includes the tunnel configuration itself, routing using BGP as the protocol of choice, NAT toward an upstream provider, front-door VRFs in order to support a default route on both the hub and the spokes, and last but not least a newer feature, namely Per-Tunnel QoS using NHRP.

So I hope you will find the information relevant to your DMVPN deployments.

First off, lets take a look at the topology I will be using for this example:
Topology

As can be seen, we have a hub router which is connected to two different ISPs. One link goes to a general-purpose internet provider (the internet cloud in the topology), which is used as transport for our DMVPN setup; the other goes to a router in the TeleCom network (AS 59701), which provides a single route for demonstration purposes (8.8.8.8/32). TeleCom has also assigned us the 70.0.0.0/24 network to use for internet access.

Then we have two spoke sites, each with a single router (Spoke-01 and Spoke-02 respectively).
Each one has a loopback interface which is being announced.

The first "trick" here is the so-called front-door VRF feature, which places the transport interface in a separate VRF. This allows us to have two default (0.0.0.0/0) routes: one for the transport network and one in the "global" table, which is used by the clients behind each router.
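To make the mechanics concrete, here is a sketch of how a spoke's transport interface ends up in the VRF (addresses taken from Spoke-01's RIB shown further down; "vrf forwarding" is the IOS-XE syntax):

vrf definition Inet_VRF
 address-family ipv4
 exit-address-family
!
interface GigabitEthernet1
 ! Placing the interface in the VRF; note this clears any configured IP address
 vrf forwarding Inet_VRF
 ip address 140.0.0.2 255.255.255.252
!
! Transport default route, living only in the Inet_VRF table
ip route vrf Inet_VRF 0.0.0.0 0.0.0.0 140.0.0.1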

I have created a VRF called "Inet_VRF" on the 3 routers in our network (the hub, Spoke-01 and Spoke-02). Let's take a look at the configuration for this VRF along with its routing information (RIB):


HUB#sh run | beg vrf defi
vrf definition Inet_VRF
!
address-family ipv4
exit-address-family

HUB#sh ip route vrf Inet_VRF | beg Ga
Gateway of last resort is 130.0.0.1 to network 0.0.0.0

S* 0.0.0.0/0 [1/0] via 130.0.0.1
130.0.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 130.0.0.0/30 is directly connected, GigabitEthernet1
L 130.0.0.2/32 is directly connected, GigabitEthernet1

Very simple indeed. We are just using the IPv4 address-family for this VRF and we have a static default route pointing toward the Internet Cloud.

The spokes are very similar:

Spoke-01:

Spoke-01#sh ip route vrf Inet_VRF | beg Gat
Gateway of last resort is 140.0.0.1 to network 0.0.0.0

S* 0.0.0.0/0 [1/0] via 140.0.0.1
140.0.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 140.0.0.0/30 is directly connected, GigabitEthernet1
L 140.0.0.2/32 is directly connected, GigabitEthernet1



Spoke-02:




Spoke-02#sh ip route vrf Inet_VRF | beg Gat
Gateway of last resort is 150.0.0.1 to network 0.0.0.0

S* 0.0.0.0/0 [1/0] via 150.0.0.1
150.0.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 150.0.0.0/30 is directly connected, GigabitEthernet1
L 150.0.0.2/32 is directly connected, GigabitEthernet1



With this in place, we should have full reachability to the internet interface address of each router in the Inet_VRF:


HUB#ping vrf Inet_VRF 140.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 140.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 6/22/90 ms

HUB#ping vrf Inet_VRF 150.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 150.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/16/65 ms

With this crucial piece of configuration for the transport network, we can now start building our DMVPN network, starting with the Hub configuration:

HUB#sh run int tun100
Building configuration...



Current configuration : 452 bytes
!
interface Tunnel100
ip address 172.16.0.100 255.255.255.0
no ip redirects
ip mtu 1400
ip nat inside
ip nhrp network-id 100
ip nhrp redirect
ip tcp adjust-mss 1360
load-interval 30
nhrp map group 10MB-Group service-policy output 10MB-Parent
nhrp map group 30MB-Group service-policy output 30MB-Parent
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE shared
end



There is a fair bit of configuration here. Pay particular attention to the "tunnel vrf Inet_VRF" command, as this is what ties the tunnel's transport to the VRF: the tunnel is sourced from G1, which is located in the Inet_VRF.

Also notice that we are running crypto on top of our tunnel to protect it from prying eyes. The relevant configuration is here:

crypto keyring MY-KEYRING vrf Inet_VRF
pre-shared-key address 0.0.0.0 0.0.0.0 key SUPER-SECRET
!
!
!
!
!
crypto isakmp policy 1
encr aes 256
hash sha256
authentication pre-share
group 2
!
!
crypto ipsec transform-set TRANSFORM-SET esp-aes 256 esp-sha256-hmac
mode transport
!
crypto ipsec profile DMVPN-PROFILE
set transform-set TRANSFORM-SET



Pretty straightforward, with a static pre-shared key in place for all nodes.

With the crypto in place, you should have a SA for it installed:


HUB#sh crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst src state conn-id status
130.0.0.2 150.0.0.2 QM_IDLE 1002 ACTIVE
130.0.0.2 140.0.0.2 QM_IDLE 1001 ACTIVE



One SA for each spoke is in place and in the correct state (QM_IDLE = good).

So now, let's verify the entire DMVPN solution with a few "show" commands:

HUB#sh dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
T1 - Route Installed, T2 - Nexthop-override
C - CTS Capable
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================

Interface: Tunnel100, IPv4 NHRP Details
Type:Hub, NHRP Peers:2,

# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 140.0.0.2 172.16.0.1 UP 05:24:09 D
1 150.0.0.2 172.16.0.2 UP 05:24:03 D


We have two spokes associated with our Tunnel100: one whose public (NBMA) address is 140.0.0.2 and another with 150.0.0.2. Inside the tunnel, the spokes have IPv4 addresses of 172.16.0.1 and .2 respectively. We can also see that these are being learned dynamically (the D in the Attrb column).

All well and good so far, but we still need to run a routing protocol across the tunnel interface in order to exchange routes. BGP and EIGRP are good candidates, and in this example I have used BGP. Since we are running Phase 3 DMVPN, we can actually get away with just receiving a default route on the spokes! Remember that our earlier default route pointing toward the internet lives in the Inet_VRF table, so the two won't collide.

Take a look at the BGP configuration on the hub router:


HUB#sh run | beg router bgp
router bgp 100
bgp log-neighbor-changes
bgp listen range 172.16.0.0/24 peer-group MYPG
network 70.0.0.0 mask 255.255.255.0
neighbor MYPG peer-group
neighbor MYPG remote-as 100
neighbor MYPG next-hop-self
neighbor MYPG default-originate
neighbor MYPG route-map ONLY-DEFAULT-MAP out
neighbor 133.1.2.2 remote-as 59701
neighbor 133.1.2.2 route-map TO-UPSTREAM-SP out



And the referenced route-maps:


ip prefix-list ONLY-DEFAULT-PFX seq 5 permit 0.0.0.0/0
!
ip prefix-list OUR-SCOPE-PFX seq 5 permit 70.0.0.0/24
!
route-map TO-UPSTREAM-SP permit 5
match ip address prefix-list OUR-SCOPE-PFX
!
route-map TO-UPSTREAM-SP deny 10
!
route-map ONLY-DEFAULT-MAP permit 10
match ip address prefix-list ONLY-DEFAULT-PFX



We are using the BGP listen (dynamic neighbor) feature, which lets peers establish sessions without per-neighbor configuration. We allow anything in the 172.16.0.0/24 network to set up a BGP session and use the peer group MYPG to control the settings. Notice that we are sending only a default route out to the spokes.

Also pay attention to the fact that we are sending the 70.0.0.0/24 upstream to the TeleCom ISP. Since we are going to use this network for NAT’ing purposes only, we have a static route to Null0 installed as well:


HUB#sh run | incl ip route
ip route 70.0.0.0 255.255.255.0 Null0
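The post mentions NAT toward the upstream provider, and "ip nat inside" is already present on Tunnel100, but the rest of the hub's NAT configuration isn't shown. A plausible counterpart would look roughly like this (the interface toward TeleCom, the ACL name and the pool name are all hypothetical):

interface GigabitEthernet2
 ! Hypothetical link toward TeleCom (AS 59701)
 ip nat outside
!
ip access-list standard INSIDE-HOSTS
 permit 172.16.0.0 0.0.0.255
!
! Translate inside hosts to the assigned 70.0.0.0/24 block
ip nat pool TELECOM-POOL 70.0.0.1 70.0.0.254 netmask 255.255.255.0
ip nat inside source list INSIDE-HOSTS pool TELECOM-POOL overload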



For the last part of our BGP configuration, let's take a look at the spoke configuration, which is very simple and straightforward:


Spoke-01#sh run | beg router bgp
router bgp 100
bgp log-neighbor-changes
redistribute connected route-map ONLY-LOOPBACK0
neighbor 172.16.0.100 remote-as 100



And the associated route-map:


route-map ONLY-LOOPBACK0 permit 10
match interface Loopback0

So basically that's a cookie-cutter configuration that's reused on Spoke-02 as well.

So what does the routing table end up looking like on the hub side of things?


HUB# sh ip bgp | beg Networ
Network Next Hop Metric LocPrf Weight Path
0.0.0.0 0.0.0.0 0 i
*>i 1.1.1.1/32 172.16.0.1 0 100 0 ?
*>i 2.2.2.2/32 172.16.0.2 0 100 0 ?
*> 8.8.8.8/32 133.1.2.2 0 0 59701 i
*> 70.0.0.0/24 0.0.0.0 0 32768 i

We have a default route, which we injected using the neighbor default-originate command. We receive the loopback addresses from each of the two spokes. Then there is the 70.0.0.0/24 prefix the hub inserted via its network statement, which is being sent to the upstream TeleCom. And finally we have 8.8.8.8/32 from TeleCom 🙂

Lets look at the spokes:


Spoke-01#sh ip bgp | beg Network
Network Next Hop Metric LocPrf Weight Path
*>i 0.0.0.0 172.16.0.100 0 100 0 i
*> 1.1.1.1/32 0.0.0.0 0 32768 ?

Very straightforward and simple. A default route from the hub and the loopback we ourselves injected.

Right now, we have the correct routing information in place and we are ready to let NHRP do its magic. Let's take a look at what happens when Spoke-01 sends a ping to Spoke-02's loopback address:


Spoke-01#ping 2.2.2.2 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/17/32 ms

Everything works, and if the math is right, we should see an NHRP shortcut being created for the spoke-to-spoke tunnel:


Spoke-01#sh ip nhrp shortcut
2.2.2.2/32 via 172.16.0.2
Tunnel100 created 00:00:06, expire 01:59:53
Type: dynamic, Flags: router rib
NBMA address: 150.0.0.2
Group: 30MB-Group
172.16.0.2/32 via 172.16.0.2
Tunnel100 created 00:00:06, expire 01:59:53
Type: dynamic, Flags: router nhop rib
NBMA address: 150.0.0.2
Group: 30MB-Group

and on Spoke-02:


Spoke-02#sh ip nhrp shortcut
1.1.1.1/32 via 172.16.0.1
Tunnel100 created 00:01:29, expire 01:58:31
Type: dynamic, Flags: router rib
NBMA address: 140.0.0.2
Group: 10MB-Group
172.16.0.1/32 via 172.16.0.1
Tunnel100 created 00:01:29, expire 01:58:31
Type: dynamic, Flags: router nhop rib
NBMA address: 140.0.0.2
Group: 10MB-Group

And the RIB on both routers should reflect this as well:


Gateway of last resort is 172.16.0.100 to network 0.0.0.0



B* 0.0.0.0/0 [200/0] via 172.16.0.100, 06:09:37
1.0.0.0/32 is subnetted, 1 subnets
C 1.1.1.1 is directly connected, Loopback0
2.0.0.0/32 is subnetted, 1 subnets
H 2.2.2.2 [250/255] via 172.16.0.2, 00:02:08, Tunnel100
172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
C 172.16.0.0/24 is directly connected, Tunnel100
L 172.16.0.1/32 is directly connected, Tunnel100
H 172.16.0.2/32 is directly connected, 00:02:08, Tunnel100

The "H" routes are the shortcut routes installed by NHRP.

And just to verify that our crypto is working, here's a Wireshark capture from the internet "cloud" when pinging from Spoke-01 to Spoke-02:

wireshark

Now let's turn our attention to the Quality of Service aspect of our solution.

We have three facts to deal with:

1) The hub router has a line-rate Gigabit Ethernet circuit to the internet.
2) The Spoke-01 site has a Gigabit Ethernet circuit, but it is subrated to a 10 Mbit access rate.
3) The Spoke-02 site has a Gigabit Ethernet circuit, but it is subrated to a 30 Mbit access rate.

We want the spokes to signal these access rates to the hub so that it can respect them. This is where the Per-Tunnel QoS feature comes into play.

Recall the hub's Tunnel100 configuration:


interface Tunnel100
ip address 172.16.0.100 255.255.255.0
no ip redirects
ip mtu 1400
ip nat inside
ip nhrp network-id 100
ip nhrp redirect
ip tcp adjust-mss 1360
load-interval 30
nhrp map group 10MB-Group service-policy output 10MB-Parent
nhrp map group 30MB-Group service-policy output 30MB-Parent
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE shared



Note the two "nhrp map group" statements. In effect, these let a spoke register with one of the two group names, and the hub then applies the corresponding QoS policy toward that individual spoke.

So these are the policy-maps we reference:


HUB#sh policy-map
Policy Map 30MB-Child
Class ICMP
priority 5 (kbps)
Class TCP
bandwidth 50 (%)

Policy Map 10MB-Parent
Class class-default
Average Rate Traffic Shaping
cir 10000000 (bps)
service-policy 10MB-Child

Policy Map 10MB-Child
Class ICMP
priority 10 (%)
Class TCP
bandwidth 80 (%)


Policy Map 30MB-Parent
Class class-default
Average Rate Traffic Shaping
cir 30000000 (bps)
service-policy 30MB-Child



We have a hierarchical policy for both the 10 Mbit and 30 Mbit groups, each with its own child policy.
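The class-maps referenced by the child policies aren't shown above, but judging from the "Match:" lines in the show output further down, they would look something like this (access-list 110 is taken from that output; its contents are not shown in the post):

class-map match-all ICMP
 ! Matches ICMP via NBAR, per "Match: protocol icmp" in the policy output
 match protocol icmp
class-map match-all TCP
 ! Matches traffic permitted by extended ACL 110
 match access-group 110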

On the Spoke side of things, all we have to do is to tell the Hub which group to use:


interface Tunnel100
bandwidth 10000
ip address 172.16.0.1 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map 172.16.0.100 130.0.0.2
ip nhrp map multicast 130.0.0.2
ip nhrp network-id 100
ip nhrp nhs 172.16.0.100
ip nhrp shortcut
ip tcp adjust-mss 1360
load-interval 30
nhrp group 10MB-Group
qos pre-classify
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE



Here on Spoke-01, we request the 10MB-Group QoS group.

And on Spoke-02:


interface Tunnel100
bandwidth 30000
ip address 172.16.0.2 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map 172.16.0.100 130.0.0.2
ip nhrp map multicast 130.0.0.2
ip nhrp network-id 100
ip nhrp nhs 172.16.0.100
ip nhrp shortcut
ip tcp adjust-mss 1360
load-interval 30
nhrp group 30MB-Group
qos pre-classify
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE



We request the 30MB-Group.

So let's verify that the hub understands this and applies it accordingly:


HUB#sh nhrp group-map
Interface: Tunnel100
NHRP group: 10MB-Group
QoS policy: 10MB-Parent
Transport endpoints using the qos policy:
140.0.0.2



NHRP group: 30MB-Group
QoS policy: 30MB-Parent
Transport endpoints using the qos policy:
150.0.0.2



Excellent. And to see that it's actually applied correctly:


HUB#sh policy-map multipoint tunnel 100


Interface Tunnel100 <--> 140.0.0.2

Service-policy output: 10MB-Parent

Class-map: class-default (match-any)
903 packets, 66746 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 900/124744
shape (average) cir 10000000, bc 40000, be 40000
target shape rate 10000000

Service-policy : 10MB-Child

queue stats for all priority classes:
Queueing
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 10/1860

Class-map: ICMP (match-all)
10 packets, 1240 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: protocol icmp
Priority: 10% (1000 kbps), burst bytes 25000, b/w exceed drops: 0
Class-map: TCP (match-all)
890 packets, 65494 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: access-group 110
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 890/122884
bandwidth 80% (8000 kbps)

Class-map: class-default (match-any)
3 packets, 12 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any

queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 0/0

Interface Tunnel100 <--> 150.0.0.2

Service-policy output: 30MB-Parent

Class-map: class-default (match-any)
901 packets, 66817 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 901/124898
shape (average) cir 30000000, bc 120000, be 120000
target shape rate 30000000

Service-policy : 30MB-Child

queue stats for all priority classes:
Queueing
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 10/1860

Class-map: ICMP (match-all)
10 packets, 1240 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: protocol icmp
Priority: 5 kbps, burst bytes 1500, b/w exceed drops: 0
Class-map: TCP (match-all)
891 packets, 65577 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: access-group 110
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 891/123038
bandwidth 50% (15000 kbps)

Class-map: class-default (match-any)
0 packets, 0 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any

queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 0/0
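
The derived numbers in this output are easy to sanity-check: the child-class percentages are taken from the parent shaper's CIR, and the default Bc corresponds to a 4 ms committed interval (Tc = Bc/CIR, with Bc in bits). A quick check in Python (illustrative arithmetic only):

```python
# Sanity-check the derived values from the show output above. Bc/Be are in
# bits; Tc = Bc / CIR is the shaper's committed interval.
def pct_to_kbps(parent_cir_bps, percent):
    return parent_cir_bps * percent // 100 // 1000

def tc_ms(cir_bps, bc_bits):
    return bc_bits / cir_bps * 1000

print(pct_to_kbps(10_000_000, 10))  # 1000  -> "Priority: 10% (1000 kbps)"
print(pct_to_kbps(10_000_000, 80))  # 8000  -> "bandwidth 80% (8000 kbps)"
print(pct_to_kbps(30_000_000, 50))  # 15000 -> "bandwidth 50% (15000 kbps)"
print(tc_ms(10_000_000, 40_000))    # 4.0 ms for the 10 Mbit shaper
print(tc_ms(30_000_000, 120_000))   # 4.0 ms for the 30 Mbit shaper
```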


The last piece of the QoS puzzle is to make sure you have a service-policy applied on the transport interfaces on the spokes as well:


Spoke-01#sh run int g1
Building configuration...



Current configuration : 210 bytes
!
interface GigabitEthernet1
description -= Towards Internet Router =-
bandwidth 30000
vrf forwarding Inet_VRF
ip address 140.0.0.2 255.255.255.252
negotiation auto
service-policy output 10MB-Parent
end



and on Spoke-02:


Spoke-02#sh run int g1
Building configuration...



Current configuration : 193 bytes
!
interface GigabitEthernet1
description -= Towards Internet Router =-
vrf forwarding Inet_VRF
ip address 150.0.0.2 255.255.255.252
negotiation auto
service-policy output 30MB-Parent
end



The last thing I want to mention is the NAT on the hub, which uses the 70.0.0.0/24 network for the outside world. Pretty straightforward NAT (inside on Tunnel100 and outside on the egress interface toward Telecom, G2):


HUB#sh run int g2
Building configuration...



Current configuration : 106 bytes
!
interface GigabitEthernet2
ip address 133.1.2.1 255.255.255.252
ip nat outside
negotiation auto
end



Also the NAT configuration itself:


ip nat pool NAT-POOL 70.0.0.1 70.0.0.253 netmask 255.255.255.0
ip nat inside source list 10 pool NAT-POOL overload
!
HUB#sh access-list 10
Standard IP access list 10
10 permit 1.1.1.1
20 permit 2.2.2.2



We are only NATing the two loopbacks from the spokes in our network.
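
The overload keyword gives us PAT: many inside hosts share one global address and are told apart by the translated port (or ICMP identifier). A toy model of such a translation table, using the addresses from this lab (illustrative only, not IOS behavior):

```python
# Sketch of PAT (NAT overload): inside hosts share one global IP and are
# distinguished by a translated identifier. Addresses are from this lab.
from itertools import count

class PatTable:
    def __init__(self, global_ip):
        self.global_ip = global_ip
        self._ids = count(1)
        self.table = {}  # (inside_ip, inside_id) -> (global_ip, global_id)

    def translate(self, inside_ip, inside_id):
        key = (inside_ip, inside_id)
        if key not in self.table:  # allocate a new translation on first use
            self.table[key] = (self.global_ip, next(self._ids))
        return self.table[key]

nat = PatTable("70.0.0.1")
print(nat.translate("1.1.1.1", 5))  # ('70.0.0.1', 1) - matches the hub output
print(nat.translate("2.2.2.2", 0))  # ('70.0.0.1', 2)
```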

Let's do a final verification from the spokes towards 8.8.8.8:




Spoke-01#ping 8.8.8.8 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/23/56 ms

and Spoke-02:


Spoke-02#ping 8.8.8.8 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/14/27 ms



Let's verify the NAT translation table on the hub:


HUB#sh ip nat trans
Pro Inside global Inside local Outside local Outside global
icmp 70.0.0.1:1 1.1.1.1:5 8.8.8.8:5 8.8.8.8:1
icmp 70.0.0.1:2 2.2.2.2:0 8.8.8.8:0 8.8.8.8:2
Total number of translations: 2



All good!

I hope you have had a chance to look at the fairly simple configuration snippets involved in these techniques and how they fit together in the overall scheme of things.

If you have any questions, please let me know!

Have fun with the lab!

(Configurations will be added shortly!)

GETVPN Example

A couple of weeks ago I had the good fortune of attending Jeremy Filliben’s CCDE Bootcamp.
It was a great experience, which I will elaborate on in another post. But one of the technology areas I had a bit of difficulty with was GETVPN.

So in this post I am going to set up a scenario in which a customer has 3 sites: 2 “normal” sites and a Datacenter site. The customer wants to encrypt traffic from Site 1 to Site 2.

Currently the customer has a regular L3VPN service from a provider (which is beyond the scope of this post). There is full connectivity between the 3 sites through this service.

The topology is as follows:

Topology

GETVPN consists of a few components, namely the Key Server (KS) and the Group Members (GMs), which is where it derives its name: Group Encrypted Transport. A single SA (Security Association) is shared for the encryption. The Key Server distributes the policy to the Group Members through a secure transport, and the Group Members then use this information (basically an ACL plus keying material) to encrypt/decrypt the data packets.
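
The registration flow can be sketched as follows: every Group Member that registers receives the same policy from the Key Server, which is why one shared SA suffices and no pairwise tunnels are needed. A toy Python model (names and structure are mine, purely illustrative):

```python
# Toy model of GETVPN registration: the Key Server holds one policy
# (crypto ACL + keys) and hands the very same policy to every registering
# Group Member, so all members share a single SA. Names are illustrative.

class KeyServer:
    def __init__(self, group_id, crypto_acl, key):
        self.group_id = group_id
        self.policy = {"acl": crypto_acl, "key": key}

    def register(self, member):
        member.policy = self.policy  # same object for every GM
        return member.policy

class GroupMember:
    def __init__(self, name):
        self.name = name
        self.policy = None

ks = KeyServer(100, ["192.168.12.0/24 <-> 192.168.34.0/24"], key="TEK-1")
ce1, ce3 = GroupMember("CE1"), GroupMember("CE3")
ks.register(ce1)
ks.register(ce3)
print(ce1.policy is ce3.policy)  # True: one shared SA, no pairwise tunnels
```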

The routing for the topology is fairly simple (see the routing diagram). Each client as well as the KeyServer uses a default route to reach the rest of the topology. Each CE router runs eBGP with the provider and redistributes its connected interfaces into BGP for full reachability between the sites.

Routing-Topology

At this point, let's verify that we have full connectivity through the L3VPN SP.

On CE-1:

CE1#sh ip bgp
BGP table version is 7, local router ID is 192.168.12.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  10.10.1.0/24     0.0.0.0                  0         32768 ?
 *>  10.10.2.0/24     10.10.1.2                              0 100 100 ?
 *>  10.10.3.0/24     10.10.1.2                              0 100 100 ?
 *>  192.168.12.0     0.0.0.0                  0         32768 ?
 *>  192.168.23.0     10.10.1.2                              0 100 100 ?
 *>  192.168.34.0     10.10.1.2                              0 100 100 ?

We are learning the routes to the other sites.

And connectivity from Client-1:

Client-1#ping 192.168.34.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.34.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/10/25 ms

The interesting part takes place on the KeyServer along with CE1 and CE3.

Let's take a look at the configuration on the KeyServer.

First off, we have a regular extended ACL that defines what traffic we want to encrypt. This ACL is the one that gets “downloaded” to CE1 and CE3:

ip access-list extended CRYPTO_ACL
 permit ip 192.168.12.0 0.0.0.255 192.168.34.0 0.0.0.255
 permit ip 192.168.34.0 0.0.0.255 192.168.12.0 0.0.0.255

Register-ACL-Download
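
Since the downloaded ACL drives encryption in both directions, the crypto ACL should be symmetric, as it is above. A quick sanity check one could run on the ACE pairs (illustrative parsing only; real ACLs need full source/destination/protocol handling):

```python
# Check that each (src, dst) ACE has a matching (dst, src) ACE, which is
# what GETVPN expects of its crypto ACL. Illustrative sketch.
def is_symmetric(aces):
    pairs = set(aces)
    return all((dst, src) in pairs for src, dst in pairs)

crypto_acl = [
    ("192.168.12.0/24", "192.168.34.0/24"),
    ("192.168.34.0/24", "192.168.12.0/24"),
]
print(is_symmetric(crypto_acl))  # True
```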

Next up we have an ISAKMP policy, which is used during the initial communication with the KeyServer. This policy is present on all the Group Members (GMs) and on the KeyServer:

crypto isakmp policy 10
 encr aes 256
 hash sha256
 authentication pre-share
 group 2
crypto isakmp key SUPERSECRET address 0.0.0.0        

In this example we use a simple pre-shared key matching any address (0.0.0.0). This could (and probably should) be certificate based instead; however, that complicates matters, so I skipped it.

Next is the transform set for IPsec which will be used. Notice that we use tunnel mode.

crypto ipsec transform-set GET-VPN-TRANSFORM-SET esp-aes esp-sha256-hmac 
 mode tunnel

This transform set is being referenced in a IPsec profile configuration:

crypto ipsec profile GETVPN-PROFILE
 set transform-set GET-VPN-TRANSFORM-SET 

This is necessary for the next piece of configuration, which is the entire GDOI aspect:

crypto gdoi group GDOI-GROUP
 identity number 100
 server local
  rekey authentication mypubkey rsa GETVPN-KEY
  rekey transport unicast
  sa ipsec 1
   profile GETVPN-PROFILE
   match address ipv4 CRYPTO_ACL
   replay counter window-size 64
   no tag
  address ipv4 192.168.23.1

Here we are creating a GDOI configuration with a unique identifier for this group (100). We are telling the router that it is the server. Next is the RSA key pair we generated with a label (“crypto key generate rsa label GETVPN-KEY” in this lab), which is used for rekeying purposes. Also notice that we are using unicast for the key material. This could just as well have been multicast, but that requires your infrastructure to be multicast capable and ready.

We then reference our previous IPsec profile and specify our crypto ACL. Lastly we specify which address should be used for this server (the address the GMs will use to communicate with it).

If we then match this to what is configured on CE1 and CE3:

crypto isakmp policy 10
 encr aes 256
 hash sha256
 authentication pre-share
 group 2
crypto isakmp key SUPERSECRET address 0.0.0.0        
crypto gdoi group GDOI-GROUP
 identity number 100
 server address ipv4 192.168.23.1
crypto map MYMAP 10 gdoi 
 set group GDOI-GROUP

And on the interface towards the SP we apply the crypto map:

CE1#sh run int g1.10
Building configuration...

Current configuration : 115 bytes
!
interface GigabitEthernet1.10
 encapsulation dot1Q 10
 ip address 10.10.1.1 255.255.255.0
 crypto map MYMAP
end

 

Crypto Map Topology

We can see the ISAKMP configuration I mentioned, which is used for the secure communication channel. Next we simply have the location of our KeyServer and the group identifier, and that's pretty much all. Everything else is learned from the Key Server.

After everything has been configured, you can see the log showing the registration process:

*May 15 10:37:53.245: %CRYPTO-5-GM_REGSTER: Start registration to KS 192.168.23.1 for group GDOI-GROUP using address 10.10.3.1 fvrf default ivrf default
*May 15 10:38:23.356: %GDOI-5-SA_TEK_UPDATED: SA TEK was updated
*May 15 10:38:23.395: %GDOI-5-SA_KEK_UPDATED: SA KEK was updated 0x5DB57E80F97A9A1DC16B9DBBCF7CB169
*May 15 10:38:23.395: %GDOI-5-GM_REGS_COMPL: Registration to KS 192.168.23.1 complete for group GDOI-GROUP using address 10.10.3.1 fvrf default ivrf default
*May 15 10:38:23.668: %GDOI-5-GM_INSTALL_POLICIES_SUCCESS: SUCCESS: Installation of Reg/Rekey policies from KS 192.168.23.1 for group GDOI-GROUP & gm identity 10.10.3.1 fvrf default ivrf default

Another form of verification is the “show crypto gdoi” command structure, which gives you a lot of information about the process:

CE1#sh crypto gdoi 
GROUP INFORMATION

    Group Name               : GDOI-GROUP
    Group Identity           : 100
    Group Type               : GDOI (ISAKMP)
    Crypto Path              : ipv4
    Key Management Path      : ipv4
    Rekeys received          : 0
    IPSec SA Direction       : Both

     Group Server list       : 192.168.23.1
                               
Group Member Information For Group GDOI-GROUP:
    IPSec SA Direction       : Both
    ACL Received From KS     : gdoi_group_GDOI-GROUP_temp_acl

    Group member             : 10.10.1.1       vrf: None
       Local addr/port       : 10.10.1.1/848
       Remote addr/port      : 192.168.23.1/848
       fvrf/ivrf             : None/None
       Version               : 1.0.16
       Registration status   : Registered
       Registered with       : 192.168.23.1
       Re-registers in       : 1580 sec
       Succeeded registration: 1
       Attempted registration: 3
       Last rekey from       : 0.0.0.0
       Last rekey seq num    : 0
       Unicast rekey received: 0
       Rekey ACKs sent       : 0
       Rekey Received        : never
       DP Error Monitoring   : OFF
       IPSEC init reg executed    : 0
       IPSEC init reg postponed   : 0
       Active TEK Number     : 1
       SA Track (OID/status) : disabled

       allowable rekey cipher: any
       allowable rekey hash  : any
       allowable transformtag: any ESP

    Rekeys cumulative
       Total received        : 0
       After latest register : 0
       Rekey Acks sents      : 0

 ACL Downloaded From KS 192.168.23.1:
   access-list   permit ip 192.168.12.0 0.0.0.255 192.168.34.0 0.0.0.255
   access-list   permit ip 192.168.34.0 0.0.0.255 192.168.12.0 0.0.0.255

KEK POLICY:
    Rekey Transport Type     : Unicast
    Lifetime (secs)          : 84613
    Encrypt Algorithm        : 3DES
    Key Size                 : 192     
    Sig Hash Algorithm       : HMAC_AUTH_SHA
    Sig Key Length (bits)    : 2352    

TEK POLICY for the current KS-Policy ACEs Downloaded:
  GigabitEthernet1.10:
    IPsec SA:
        spi: 0xA3D6592E(2748733742)
        KGS: Disabled
        transform: esp-aes esp-sha256-hmac 
        sa timing:remaining key lifetime (sec): (1815)
        Anti-Replay(Counter Based) : 64
        tag method : disabled
        alg key size: 16 (bytes)
        sig key size: 32 (bytes)
        encaps: ENCAPS_TUNNEL

Among the most interesting parts are the KEK policy and the ACL that is in place.

If we then verify from Client-1, we can see that a couple of packets time out while the encryption is being set up, and from there we have connectivity:

Client-1#ping 192.168.34.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.34.1, timeout is 2 seconds:
..!!!
Success rate is 60 percent (3/5), round-trip min/avg/max = 2/2/2 ms

So for something that is in theory very complex, this is very efficient from both a configuration and a control-plane point of view. Creating this lab certainly helped me understand the steps involved in setting up GETVPN, so I hope it has been relevant for you as well!

MPLS VPN’s over mGRE

This blog post outlines what “MPLS VPNs over mGRE” is all about and provides an example of such a configuration.

So what is “MPLS VPNs over mGRE”? Well, basically it is taking regular MPLS VPNs and running them over an IP-only core network. Since VPNs are one of the primary drivers for implementing an MPLS network in the first place, getting the same functionality over an IP-only core might be very compelling for those not willing or able to run MPLS label switching in the core.

Instead of using labels to switch the traffic from one PE to another, mGRE (Multipoint GRE) is used as the encapsulation technology.

Be advised that one label is still being used, however. This is the VPN label, which identifies which VRF to switch the traffic to when it is received by a PE. This label is, just as in regular MPLS VPNs, assigned by the PE through MP-BGP.
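
Compared with a label-switched core, only the transport label changes: an outer IP/GRE header takes its place, while the MP-BGP VPN label is kept. A sketch of the two encapsulation stacks (descriptive strings only, outermost header first):

```python
# Encapsulation stacks, outermost header first. With mGRE, the outer IP/GRE
# headers take over the role of the transport label; the VPN label stays.
plain_mpls_vpn = [
    "transport label (LDP)",
    "VPN label (MP-BGP)",
    "customer IP packet",
]
mpls_vpn_over_mgre = [
    "outer IP (PE loopback to PE loopback)",
    "GRE (protocol 0x8847)",
    "VPN label (MP-BGP)",
    "customer IP packet",
]

# The VPN label and payload are common to both; only the transport differs.
print([h for h in mpls_vpn_over_mgre if h in plain_mpls_vpn])
```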

So how is this actually performed? Well, let's take a look at an example.

The topology I will be using is as follows:

Topology for MPLS VPN's over mGRE


** Note: I ran into an issue with VIRL causing the EIGRP adjacency between CSR-3 and R3 to fail. So I will not be using these routers in the examples to come. I have noted this behavior on the VIRL community forums in case you are interested.

In this topology we have a core network consisting of CSR-1 to CSR-5. They are all running OSPF in area 0. No MPLS is configured, so it's pure IP routing end-to-end.

Let's take a look at CSR-5's RIB:

CSR-5#sh ip route | beg Gateway
Gateway of last resort is not set

      1.0.0.0/32 is subnetted, 1 subnets
O        1.1.1.1 [110/2] via 192.168.15.1, 00:39:00, GigabitEthernet2
      2.0.0.0/32 is subnetted, 1 subnets
O        2.2.2.2 [110/2] via 192.168.25.2, 00:38:50, GigabitEthernet3
      3.0.0.0/32 is subnetted, 1 subnets
O        3.3.3.3 [110/2] via 192.168.35.3, 00:38:50, GigabitEthernet4
      4.0.0.0/32 is subnetted, 1 subnets
O        4.4.4.4 [110/2] via 192.168.45.4, 00:39:10, GigabitEthernet5
      5.0.0.0/32 is subnetted, 1 subnets
C        5.5.5.5 is directly connected, Loopback0
      192.168.15.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.15.0/24 is directly connected, GigabitEthernet2
L        192.168.15.5/32 is directly connected, GigabitEthernet2
      192.168.25.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.25.0/24 is directly connected, GigabitEthernet3
L        192.168.25.5/32 is directly connected, GigabitEthernet3
      192.168.35.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.35.0/24 is directly connected, GigabitEthernet4
L        192.168.35.5/32 is directly connected, GigabitEthernet4
      192.168.45.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.45.0/24 is directly connected, GigabitEthernet5
L        192.168.45.5/32 is directly connected, GigabitEthernet5

And to verify that we are not running any MPLS switching:

CSR-5#sh mpls for
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface

So we have our connected interfaces along with the loopbacks of all the routers in the core network.

Let's take a look at CSR-1's configuration, the VRF configuration in particular:

vrf definition CUST-A
 rd 100:100
 !
 address-family ipv4
  route-target export 100:100
  route-target import 100:100
 exit-address-family
!
!
interface GigabitEthernet2
 vrf forwarding CUST-A
 ip address 10.0.1.1 255.255.255.0
 negotiation auto
!
router eigrp 1
!
address-family ipv4 vrf CUST-A autonomous-system 100
  redistribute bgp 1 metric 1 1 1 1 1
  network 0.0.0.0
 exit-address-family

We have our VRF CUST-A configured with an RD of 100:100, along with 100:100 as both the import and export Route-Target. Just as we would configure for a regular MPLS L3VPN.

We use our GigabitEthernet2 interface as the attachment circuit to CUST-A. In addition we have EIGRP 100 running as the VRF-aware IGP towards R1. And finally we are redistributing BGP into EIGRP within the VRF.

Let's make sure we are receiving routes from R1 into the VRF RIB:

CSR-1#sh ip route vrf CUST-A eigrp | beg Gateway
Gateway of last resort is not set

      100.0.0.0/32 is subnetted, 3 subnets
D        100.100.100.1 [90/130816] via 10.0.1.100, 00:45:37, GigabitEthernet2

Looks good, we are receiving the loopback prefix from R1. This is as we would expect.

A similar configuration exists on CSR-2, CSR-3 and CSR-4. Nothing different from a regular MPLS L3 VPN service.

Now for the core configuration utilizing MP-BGP.
We are using CSR-5 as a VPNv4 route-reflector in order to avoid a full mesh of iBGP sessions.

So the configuration on CSR-5 looks like this:

CSR-5#sh run | sec router bgp
router bgp 1
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 1.1.1.1 remote-as 1
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 2.2.2.2 remote-as 1
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 1
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 4.4.4.4 remote-as 1
 neighbor 4.4.4.4 update-source Loopback0
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 send-community extended
  neighbor 1.1.1.1 route-reflector-client
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 send-community extended
  neighbor 2.2.2.2 route-reflector-client
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-community extended
  neighbor 3.3.3.3 route-reflector-client
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
  neighbor 4.4.4.4 route-reflector-client
 exit-address-family

Pretty straightforward really.

Then on CSR-1:

 CSR-1#sh run | sec router bgp
 router bgp 1
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
  neighbor 5.5.5.5 remote-as 1
  neighbor 5.5.5.5 update-source Loopback0
  !
  address-family ipv4
  exit-address-family
  !
  address-family vpnv4
   neighbor 5.5.5.5 activate
   neighbor 5.5.5.5 send-community extended
   neighbor 5.5.5.5 route-map MODIFY-INBOUND in
  exit-address-family
  !
  address-family ipv4 vrf CUST-A
   redistribute eigrp 100
  exit-address-family

Here we have a single neighbor configured (CSR-5 being the RR), using our loopback address. We are also redistributing routes from the VRF into BGP for VPNv4 announcements to the other PEs. What's really important (and differs from regular MPLS L3VPNs) is the route-map we apply inbound (MODIFY-INBOUND). Let's take a closer look at that:

CSR-1#sh route-map
route-map MODIFY-INBOUND, permit, sequence 10
  Match clauses:
  Set clauses:
    ip next-hop encapsulate l3vpn L3VPN-PROFILE
  Policy routing matches: 0 packets, 0 bytes

So all this does is set the next-hop according to an l3vpn profile called L3VPN-PROFILE. This is really the heart of the technology. Let's look at the profile in more detail:

CSR-1#sh run | beg L3VPN
l3vpn encapsulation ip L3VPN-PROFILE
 !

Well, that wasn't very informative. It simply defines a standard profile (which means mGRE) with our desired name.
You can get more detail by using the show commands:

CSR-1#sh l3vpn encapsulation ip 

 Profile: L3VPN-PROFILE
  transport ipv4 source Auto: Loopback0
  protocol gre
  payload mpls
   mtu default
  Tunnel Tunnel0 Created [OK]
  Tunnel Linestate [OK]
  Tunnel Transport Source (Auto) Loopback0 [OK]

So this tells us that, by default, Loopback0 was chosen as the tunnel source and that Tunnel0 was created automatically. Let's take a look at Tunnel0 in more detail:

 CSR-1#sh interface Tunnel0
 Tunnel0 is up, line protocol is up
   Hardware is Tunnel
   Interface is unnumbered. Using address of Loopback0 (1.1.1.1)
   MTU 9976 bytes, BW 10000 Kbit/sec, DLY 50000 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation TUNNEL, loopback not set
   Keepalive not set
   Tunnel linestate evaluation up
   Tunnel source 1.1.1.1 (Loopback0)
    Tunnel Subblocks:
       src-track:
          Tunnel0 source tracking subblock associated with Loopback0
           Set of tunnels with source Loopback0, 1 member (includes iterators), on interface <OK>
   Tunnel protocol/transport multi-GRE/IP
     Key disabled, sequencing disabled
     Checksumming of packets disabled
   Tunnel TTL 255, Fast tunneling enabled
   Tunnel transport MTU 1476 bytes
   Tunnel transmit bandwidth 8000 (kbps)
   Tunnel receive bandwidth 8000 (kbps)
   Last input never, output never, output hang never
   Last clearing of "show interface" counters 00:54:16
   Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 3
   Queueing strategy: fifo
   Output queue: 0/0 (size/max)
   5 minute input rate 0 bits/sec, 0 packets/sec
   5 minute output rate 0 bits/sec, 0 packets/sec
      0 packets input, 0 bytes, 0 no buffer
      Received 0 broadcasts (0 IP multicasts)
      0 runts, 0 giants, 0 throttles
      0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
      0 packets output, 0 bytes, 0 underruns
      0 output errors, 0 collisions, 0 interface resets
      0 unknown protocol drops
      0 output buffer failures, 0 output buffers swapped out

What's important here is that the tunnel protocol/transport is multi-GRE/IP, which is the whole point of it all.

So to recap: when we receive prefixes reflected by our RR (the RR is beside the point, it could just as well be a full mesh), we set the IP next-hop to the other PE's loopback address and tell the router to do mGRE encapsulation when traffic is routed to these prefixes.

Let's take a look at our BGP table on CSR-1:

CSR-1#sh bgp vpnv4 uni vrf CUST-A | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:100 (default for vrf CUST-A)
 *>  10.0.1.0/24      0.0.0.0                  0         32768 ?
 *>i 10.0.2.0/24      2.2.2.2                  0    100      0 ?
 *>i 10.0.3.0/24      3.3.3.3                  0    100      0 ?
 *>i 10.0.4.0/24      4.4.4.4                  0    100      0 ?
 *>  100.100.100.1/32 10.0.1.100          130816         32768 ?
 *>i 100.100.100.2/32 2.2.2.2             130816    100      0 ?
 *>i 100.100.100.4/32 4.4.4.4             130816    100      0 ?

(**Note: Remember CSR-3 is broken because of VIRL)

Let's take a look at what information is present for 100.100.100.2/32:

 CSR-1#sh bgp vpnv4 uni vrf CUST-A 100.100.100.2/32
 BGP routing table entry for 100:100:100.100.100.2/32, version 19
 Paths: (1 available, best #1, table CUST-A)
   Not advertised to any peer
   Refresh Epoch 1
   Local
     2.2.2.2 (metric 3) (via default) (via Tunnel0) from 5.5.5.5 (5.5.5.5)
       Origin incomplete, metric 130816, localpref 100, valid, internal, best
       Extended Community: RT:100:100 Cost:pre-bestpath:128:130816
         0x8800:32768:0 0x8801:100:128256 0x8802:65281:2560 0x8803:65281:1500
         0x8806:0:1684300802
       Originator: 2.2.2.2, Cluster list: 5.5.5.5
       mpls labels in/out nolabel/17
       rx pathid: 0, tx pathid: 0x0

Important to note here is that we are being told to use label 17 as the VPN label for this prefix when sending traffic to 2.2.2.2 (CSR-2).

And finally, let's take a look at what CEF thinks about it all:

CSR-1#sh ip cef vrf CUST-A 100.100.100.2 detail
100.100.100.2/32, epoch 0, flags [rib defined all labels]
  nexthop 2.2.2.2 Tunnel0 label 17

So CEF will assign label 17 to the packet and then use Tunnel0 to reach CSR-2. Just as we would expect.

As a final verification I've done an Embedded Packet Capture on CSR-5 while pinging from R1's loopback to R2's loopback, and this is the result:

 6  142   29.028990   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800000 0000FF2F B5490101 01010202   ......./.I......
  0020:  02020000 88470001 11FE4500 00640000   .....G....E..d..
  0030:  0000FE01 2BCD6464 64016464 64020800   ....+.ddd.ddd...

   7  142   29.106989   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800001 0000FF2F B5480101 01010202   ......./.H......
  0020:  02020000 88470001 11FE4500 00640001   .....G....E..d..
  0030:  0000FE01 2BCC6464 64016464 64020800   ....+.ddd.ddd...

   8  142   29.184988   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800002 0000FF2F B5470101 01010202   ......./.G......
  0020:  02020000 88470001 11FE4500 00640002   .....G....E..d..
  0030:  0000FE01 2BCB6464 64016464 64020800   ....+.ddd.ddd...

   9  142   29.241037   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800003 0000FF2F B5460101 01010202   ......./.F......
  0020:  02020000 88470001 11FE4500 00640003   .....G....E..d..
  0030:  0000FE01 2BCA6464 64016464 64020800   ....+.ddd.ddd...

  10  142   29.287024   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800004 0000FF2F B5450101 01010202   ......./.E......
  0020:  02020000 88470001 11FE4500 00640004   .....G....E..d..
  0030:  0000FE01 2BC96464 64016464 64020800   ....+.ddd.ddd...

As you can see, the encapsulation is GRE, just as expected.
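
We can go one step further and decode the MPLS label stack entry riding inside the GRE payload: in each frame, the GRE protocol field is 0x8847 (MPLS unicast) and the following four bytes are 00 01 11 FE. Decoding them in Python confirms VPN label 17 from the BGP and CEF output:

```python
# Decode the 4-byte MPLS label stack entry seen right after the GRE header
# (bytes 00 01 11 FE in the capture above). Field layout per the MPLS
# label stack encoding: 20-bit label, 3 EXP bits, 1 S bit, 8-bit TTL.
entry = int.from_bytes(bytes.fromhex("000111FE"), "big")

label  = entry >> 12          # top 20 bits: the VPN label
exp    = (entry >> 9) & 0x7   # 3 EXP/TC bits
bottom = (entry >> 8) & 0x1   # bottom-of-stack bit
ttl    = entry & 0xFF         # label TTL

print(label, exp, bottom, ttl)  # 17 0 1 254
```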

So that's all there is to this technology. Very useful if you have an IP-only core network.

I hope it's been useful, and I will soon attach all the router configurations in case you want to take a closer look.

Thanks for reading!

Update: Link to configs here

Unified/Seamless MPLS

In this post I would like to highlight a relatively new (to me) application of MPLS called Unified MPLS.
The goal of Unified MPLS is to separate your network into individual IGP segments in order to keep your core network as simple as possible, while still maintaining an end-to-end LSP for regular MPLS applications such as L3VPNs.

What we are doing is simply putting Route Reflectors into the forwarding path and changing the next-hops along the way, essentially stitching together the final LSP.
Along with that, we are using BGP to signal a label value to maintain the LSP from one end of the network to the other without running LDP between the IGP domains.
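
The stitching idea can be sketched as follows: each segment is an island of IGP + LDP, and the border routers (the RRs) appear in two segments, gluing the per-segment LSPs into one end-to-end path. A toy Python model (router names match this post's topology; the logic is illustrative):

```python
# Toy model of LSP stitching in Unified MPLS: border routers (R3, R5)
# belong to two segments each and act as the stitch points.
segments = {
    "left":  ["R2", "R3"],        # EIGRP AS 1 + LDP
    "core":  ["R3", "R4", "R5"],  # OSPF 100 + LDP
    "right": ["R5", "R6"],        # EIGRP AS 2 + LDP
}

def end_to_end_path(segs):
    """Concatenate per-segment paths, deduplicating the shared border router."""
    path = []
    for seg in segs.values():
        path += seg if not path else seg[1:]
    return path

print(end_to_end_path(segments))  # ['R2', 'R3', 'R4', 'R5', 'R6']
```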

Take a look at the topology that we will be using to demonstrate this feature:

Unified-MPLS-Topology

In this topology we have a simplified layout of a service provider. We have a core network consisting of R3, R4 and R5, along with distribution networks to the left and right of the core. R2 and R3 are in the left distribution network, and R5 and R6 are in the right-hand one.

We have an MPLS L3VPN customer connected consisting of R1 in one site and R7 in another.

As is visible in the topology, we are running 3 separate IGPs to make a point about this feature: EIGRP AS 1, OSPF 100 and EIGRP AS 2. However, we are only running one autonomous system as seen from BGP, so it's a pure iBGP network.

Now in order to make the L3VPN work, we need an end-to-end LSP going from R2 all the way to R6.
What is key here is that we have contained IGP areas, each of which runs LDP for labels. Between the areas, all we are doing is leaking a couple of loopback addresses from the core into the distribution sections. These are used exclusively for the iBGP sessions.

On top of that, we need R3 and R5 to act as route-reflectors, to be in the data path, and to allocate labels. This is done through the “send-label” command along with modifying the next-hop (the “next-hop-self all” command).

This is illustrated in the following:

Unified-MPLS-iBGP-Topology

Enough theory, let's take a look at the configuration necessary to pull this off. Let's start with R2's IGP and LDP configuration:

R2#sh run | sec router eigrp
router eigrp 1
 network 2.0.0.0
 network 10.0.0.0
 passive-interface default
 no passive-interface GigabitEthernet3

R2#sh run int g3
interface GigabitEthernet3
 ip address 10.2.3.2 255.255.255.0
 negotiation auto
 mpls ip
end

Pretty vanilla configuration of IGP + LDP.

The same for R3:

R3#sh run | sec router eigrp 1
router eigrp 1
 network 10.0.0.0
 redistribute ospf 100 metric 1 1 1 1 1 route-map REDIST-LOOPBACK-MAP
 passive-interface default
 no passive-interface GigabitEthernet2

R3#sh run int g2
interface GigabitEthernet2
 ip address 10.2.3.3 255.255.255.0
 negotiation auto
 mpls ip
end

R3#sh route-map REDIST-LOOPBACK-MAP
route-map REDIST-LOOPBACK-MAP, permit, sequence 10
  Match clauses:
    ip address prefix-lists: REDIST-LOOPBACK-PREFIX-LIST
  Set clauses:
  Policy routing matches: 0 packets, 0 bytes

R3#sh ip prefix-list
ip prefix-list REDIST-LOOPBACK-PREFIX-LIST: 1 entries
   seq 5 permit 3.3.3.3/32

Apart from the redistribution, it's simply establishing an EIGRP adjacency with R2. On top of that we are redistributing R3's Loopback0 interface, which is in the core area, into EIGRP. Again, this step is necessary for the iBGP session establishment.
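
The effect of the route-map shown above is simply a prefix filter: only the RR's loopback leaks from the core into the distribution IGP. Modeled as a quick sketch (illustrative only, not IOS code):

```python
# Model of REDIST-LOOPBACK-MAP: only R3's loopback passes the prefix-list
# and is redistributed from OSPF 100 into EIGRP AS 1.
ALLOWED = {"3.3.3.3/32"}  # REDIST-LOOPBACK-PREFIX-LIST, seq 5

def leak(prefixes):
    return [p for p in prefixes if p in ALLOWED]

print(leak(["3.3.3.3/32", "4.4.4.4/32", "192.168.35.0/24"]))  # ['3.3.3.3/32']
```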

An almost identical setup is present in the other distribution site, consisting of R5 and R6. Again we redistribute R5’s loopback0 address into the IGP (EIGRP AS 2), so we can have iBGP connectivity, which is our next step.

So let's take a look at the BGP configuration on R2 all the way to R6. I'm leaving out the VPNv4 configuration for now, to make it clearer what we are trying to accomplish first:

R2:
---
router bgp 1000
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 1000
 neighbor 3.3.3.3 update-source Loopback0
 !
 address-family ipv4
  network 2.2.2.2 mask 255.255.255.255
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-label

R3:
---
router bgp 1000
 bgp router-id 3.3.3.3
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 1000
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 2.2.2.2 route-reflector-client
 neighbor 2.2.2.2 next-hop-self all
 neighbor 2.2.2.2 send-label
 neighbor 5.5.5.5 remote-as 1000
 neighbor 5.5.5.5 update-source Loopback0
 neighbor 5.5.5.5 route-reflector-client
 neighbor 5.5.5.5 next-hop-self all
 neighbor 5.5.5.5 send-label

R5:
---
router bgp 1000
 bgp router-id 5.5.5.5
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 1000
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 3.3.3.3 route-reflector-client
 neighbor 3.3.3.3 next-hop-self all
 neighbor 3.3.3.3 send-label
 neighbor 6.6.6.6 remote-as 1000
 neighbor 6.6.6.6 update-source Loopback0
 neighbor 6.6.6.6 route-reflector-client
 neighbor 6.6.6.6 next-hop-self all
 neighbor 6.6.6.6 send-label

R6:
---
router bgp 1000
 bgp router-id 6.6.6.6
 bgp log-neighbor-changes
 neighbor 5.5.5.5 remote-as 1000
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family ipv4
  network 6.6.6.6 mask 255.255.255.255
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-label

As is visible from the configuration, we have two IPv4 route-reflectors (R3 and R5), both of which put themselves into the data path by using the next-hop-self command. On top of that, we are allocating labels for all prefixes via BGP as well. Let's verify this on the routers:

R2#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       0.0.0.0         imp-null/nolabel
   6.6.6.6/32       3.3.3.3         nolabel/305

R3#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       2.2.2.2         300/imp-null
   6.6.6.6/32       5.5.5.5         305/500

R5#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       3.3.3.3         505/300
   6.6.6.6/32       6.6.6.6         500/imp-null

R6#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       5.5.5.5         nolabel/505
   6.6.6.6/32       0.0.0.0         imp-null/nolabel

Since we are only injecting two prefixes (the loopbacks of R2 and R6) into BGP, that's all we have allocated labels for.
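
As a sanity check, the labels must chain hop by hop: a router's "Out label" for a prefix has to match what its BGP next hop shows as its "In label". A toy sketch of that check for 6.6.6.6/32, using the labels from the outputs above (an illustration only, not anything running on the routers):

```python
# Toy consistency check of the labeled-unicast tables above: a router's
# out-label for a prefix must equal its next hop's in-label for it.
in_label = {("R3", "6.6.6.6/32"): 305, ("R5", "6.6.6.6/32"): 500}
out_label = {("R2", "6.6.6.6/32"): 305, ("R3", "6.6.6.6/32"): 500}

# R2 sends toward R3, R3 sends toward R5:
assert out_label[("R2", "6.6.6.6/32")] == in_label[("R3", "6.6.6.6/32")]
assert out_label[("R3", "6.6.6.6/32")] == in_label[("R5", "6.6.6.6/32")]
```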

Doing a traceroute from R2 to R6 (between loopbacks) will reveal whether we truly have an LSP between them:

R2#traceroute 6.6.6.6 so loo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 10.2.3.3 [MPLS: Label 305 Exp 0] 26 msec 15 msec 18 msec
  2 10.3.4.4 [MPLS: Labels 401/500 Exp 0] 10 msec 24 msec 34 msec
  3 10.4.5.5 [MPLS: Label 500 Exp 0] 7 msec 23 msec 24 msec
  4 10.5.6.6 20 msec *  16 msec

This looks exactly like we wanted it to (note that label 401 belongs to a pure P router in the core).
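
The label operations behind this traceroute can be walked through in a small sketch (an illustration only, not IOS; 305 and 500 are the BGP labels from the tables above, 401 the core LDP label; the last list element is the top of the stack):

```python
# Toy walk of the label stack from R2 toward 6.6.6.6/32.
stack = [305]               # R2 pushes R3's BGP label for 6.6.6.6/32
stack[-1] = 500             # R3 swaps the BGP label 305 -> 500...
stack.append(401)           # ...and pushes an LDP label toward 5.5.5.5
assert stack == [500, 401]  # hop 2 shows "Labels 401/500" (top first)
stack.pop()                 # R4 (pure P): penultimate-hop pop of the LDP label
assert stack == [500]       # hop 3 shows "Label 500"
stack.pop()                 # R5: its label 500 resolves to implicit-null
assert stack == []          # hop 4: plain IP toward R6
```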
This also means we can set up our VPNv4 configuration on R2 and R6:

R2#sh run | sec router bgp
router bgp 1000
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 1000
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 6.6.6.6 remote-as 1000
 neighbor 6.6.6.6 update-source Loopback0
 !
 address-family ipv4
  network 2.2.2.2 mask 255.255.255.255
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-label
  no neighbor 6.6.6.6 activate
 exit-address-family
 !
 address-family vpnv4
  neighbor 6.6.6.6 activate
  neighbor 6.6.6.6 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf CUSTOMER-A
  redistribute connected
  redistribute static
 exit-address-family
R2#

R6#sh run | sec router bgp
router bgp 1000
 bgp router-id 6.6.6.6
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 1000
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 5.5.5.5 remote-as 1000
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family ipv4
  network 6.6.6.6 mask 255.255.255.255
  no neighbor 2.2.2.2 activate
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-label
 exit-address-family
 !
 address-family vpnv4
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf CUSTOMER-A
  redistribute connected
  redistribute static
 exit-address-family

Let's verify that the iBGP VPNv4 peering is up and running:

R2#sh bgp vpnv4 uni all sum
..
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
6.6.6.6         4         1000      16      16       11    0    0 00:09:31        2

R6#sh bgp vpnv4 uni all sum
..
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4         1000      17      17       11    0    0 00:10:26        2

We do have the prefixes and we should also have reachability from R1 to R7 (by way of their individual static default routes):

R1#ping 7.7.7.7 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 7.7.7.7, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 17/27/54 ms

Looks good. Let's check the label path:

R1#traceroute 7.7.7.7 so loo0
Type escape sequence to abort.
Tracing the route to 7.7.7.7
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.2.2 19 msec 13 msec 12 msec
  2 10.2.3.3 [MPLS: Labels 305/600 Exp 0] 18 msec 19 msec 15 msec
  3 10.3.4.4 [MPLS: Labels 401/500/600 Exp 0] 12 msec 32 msec 34 msec
  4 10.4.5.5 [MPLS: Labels 500/600 Exp 0] 20 msec 27 msec 27 msec
  5 10.6.7.6 [MPLS: Label 600 Exp 0] 23 msec 15 msec 13 msec
  6 10.6.7.7 25 msec *  16 msec

What we are seeing here is basically the same path, but with the “VPN” label (600) added at the bottom of the stack.
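
The stack at each hop of this traceroute can be sketched as follows (an illustration only, not IOS; 600 is the VPN label, 305/500 the BGP labels, 401 the core LDP label; the last list element is the top of the stack):

```python
# Toy walk of the full label stack for VRF traffic from R1 toward R7.
stack = [600]                    # R2: VPN label at the bottom...
stack.append(305)                # ...then R3's BGP label for 6.6.6.6/32
assert stack == [600, 305]       # hop 2 shows "Labels 305/600"
stack[-1] = 500                  # R3 swaps the BGP label 305 -> 500...
stack.append(401)                # ...and pushes the LDP transport label
assert stack == [600, 500, 401]  # hop 3 shows "Labels 401/500/600"
stack.pop()                      # R4 (P): penultimate-hop pop of the LDP label
assert stack == [600, 500]       # hop 4 shows "Labels 500/600"
stack.pop()                      # R5: BGP label 500 is implicit-null
assert stack == [600]            # hop 5 shows "Label 600" only
stack.pop()                      # R6 pops the VPN label, forwards in the VRF
assert stack == []
```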

So what have we really accomplished here? – Well, let's take a look at the RIB on R2 and look for the IGP (EIGRP AS 1) routes:

R2#sh ip route eigrp
..
      3.0.0.0/32 is subnetted, 1 subnets
D EX     3.3.3.3 [170/2560000512] via 10.2.3.3, 00:16:02, GigabitEthernet3
      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
D        10.3.4.0/24 [90/3072] via 10.2.3.3, 00:16:02, GigabitEthernet3

A very small table indeed. And if we include what's being learned by BGP:

R2#sh ip route bgp
..
      6.0.0.0/32 is subnetted, 1 subnets
B        6.6.6.6 [200/0] via 3.3.3.3, 00:17:02

R2#sh ip route 6.6.6.6
Routing entry for 6.6.6.6/32
  Known via "bgp 1000", distance 200, metric 0, type internal
  Last update from 3.3.3.3 00:17:43 ago
  Routing Descriptor Blocks:
  * 3.3.3.3, from 3.3.3.3, 00:17:43 ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
      MPLS label: 305

Only one prefix is needed to communicate with the remote distribution site's PE router (and it's the one we need the label for).

This means you can scale your distribution sites to very large sizes, keep your core as efficient as possible, and eliminate the use of areas and the like in your IGPs.

I hope this quick walkthrough of unified/seamless MPLS has been useful.

EIGRP OTP example

In this post I'd like to provide an example of a fairly new addition to EIGRP called EIGRP Over The Top (OTP).

In all its simplicity, it establishes a multihop EIGRP adjacency, using LISP as the encapsulation method for transport through the WAN network.

One of the applications of this would be to avoid relying on the SP in an MPLS L3 VPN. You could simply use the L3 VPN for transport between the interfaces directly connected to the service provider and run your own adjacency directly between your CPE routers (without the use of a GRE tunnel, which would be another way to do it).

The topology used for this example consists of 4 routers. All 4 of them are using OSPF to provide connectivity (as an exercise, you could take this example and do an L3 VPN using MPLS instead). I'm simply taking the lazy path and doing it this way 🙂

EIGRP-OTP-Topology

R1 and R4 are running EIGRP in a named process, “test”. This process is in autonomous system 100, and the Loopback0 interfaces are advertised into the IPv4 address-family.

Let's verify that we have connectivity between R1's g1.102 interface and R4's g1.304 interface:

R1#ping 172.3.4.4 so g1.102
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.3.4.4, timeout is 2 seconds:
Packet sent with a source address of 172.1.2.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/19 ms

All looks good.

Now let's take a look at the configuration that ties R1 and R4 together with an EIGRP adjacency:

on R1:

R1#sh run | sec router eigrp
router eigrp test
 !
 address-family ipv4 unicast autonomous-system 100
  !
  topology base
  exit-af-topology
  neighbor 172.3.4.4 GigabitEthernet1.102 remote 10 lisp-encap 1
  network 1.0.0.0
  network 172.1.0.0
 exit-address-family

What's important here are the neighbor statement and the last network statement.

With the neighbor statement, we specify that we have a remote neighbor (172.3.4.4) that can be reached through g1.102, that the maximum number of hops to reach this neighbor is 10, and that we should use LISP encapsulation with an ID of 1.

It's also important to add the outgoing interface's network (“network 172.1.0.0”) to the EIGRP process. I've found that without doing this, the adjacency won't come up. It's not enough to specify the interface in the neighbor command.

Let's verify which interfaces we are running EIGRP on at R1:

R1#sh ip eigrp interfaces
EIGRP-IPv4 VR(test) Address-Family Interfaces for AS(100)
                              Xmit Queue   PeerQ        Mean   Pacing Time   Multicast    Pending
Interface              Peers  Un/Reliable  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Lo0                      0        0/0       0/0           0       0/0            0           0
Gi1.102                  1        0/0       0/0           1       0/0           50           0

On the reverse path, on R4:

R4#sh run | sec router eigrp
router eigrp test
 !
 address-family ipv4 unicast autonomous-system 100
  !
  topology base
  exit-af-topology
  neighbor 172.1.2.1 GigabitEthernet1.304 remote 10 lisp-encap 1
  network 4.0.0.0
  network 172.3.0.0
 exit-address-family

Same deal, just in the opposite direction.

That's about it. Let's take a look at whether we have the desired adjacency up and running:

R1#sh ip ei nei
EIGRP-IPv4 VR(test) Address-Family Neighbors for AS(100)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
0   172.3.4.4               Gi1.102                  12 01:14:16    1   100  0  3

Excellent! And the routing tables:

R1#sh ip route eigrp | beg Gateway
Gateway of last resort is not set

      4.0.0.0/32 is subnetted, 1 subnets
D        4.4.4.4 [90/93994331] via 172.3.4.4, 01:14:50, LISP1

Pay attention to the fact that LISP1 is used as the outgoing interface.

And finally the data plane verification:

R1#ping 4.4.4.4 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 4.4.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/10/25 ms

Great! – And that's about all there is to a simple EIGRP OTP scenario. (Look into EIGRP “route reflectors” if you want more information on hub-and-spoke topologies.)

Take care!

Trying out IPv6 Prefix Delegation

In this post I will show how and why to use a feature called IPv6 Prefix Delegation (PD).

IPv6 prefix delegation is a feature that provides the capability to delegate or hand out IPv6 prefixes to other routers without the need to hardcode these prefixes into the routers.

Why would you want to do this? – Well, for one, there is the administrative overhead associated with manual configuration. If the end customer only cares about the number of prefixes he or she receives, then they might as well be handed out automatically from a preconfigured pool, just like DHCP works today on end-user systems.

On top of that, by configuring a redistribution into BGP just once, you will automatically have reachability to the prefixes that have been handed out, from the rest of your SP network.

So how do you go about configuring this? – Well, let's take a look at the topology we'll be using to demonstrate IPv6 Prefix Delegation.

PD-Post-Topology

First off, we have the SP core network which consists of R1, R2 and R3. They are running in AS 64512 with R1 being a BGP route-reflector for the IPv6 unicast address-family. As an IGP we are running OSPFv3 to provide reachability within the core. No IPv4 is configured on any device.

The SP has been allocated a /32 IPv6 prefix which is 2001:1111::/32, from which it will “carve” out IPv6 prefixes to both its internal network as well as customer networks.

We are using /125 for the links between the core routers, just to make it simple when looking at the routing tables and the topology.

R2 is really where all the magic is taking place. R2 is a PE for two customers, Customer A and Customer B. Customer A is being reached through Gigabit2 and Customer B through Gigabit3. The customer’s respective CE routers are R4 and R7.

There is a link-net between R2 and R4 as well as R2 and R7. These are respectively 2001:1111:101::/64 and 2001:1111:102::/64.

So Lab-ISP has decided to use a /48 network from which to hand out prefixes to its customers. This /48 is 2001:1111:2222::/48. Lab-ISP also decided to hand out /56 prefixes, which gives the customers 8 bits (bits 56 to 64) to use for subnetting. This is a typical deployment.

Also, since we are using a /48 as the block to “carve” out from, this gives us 8 bits (bits 48 to 56) of assignable subnets, which of course equals 256 /56 prefixes we can hand out.

All of this can be a bit confusing, so lets look at it from a different perspective.

We start out with 2001:1111:2222::/48. Let's look at what the first /56 looks like:

The first /56 is 2001:1111:2222:0000::/56, which spans
2001:1111:2222:0000::
through
2001:1111:2222:00FF::

That last byte (remember, this is all in hex) is what gives the customer 256 subnets to play around with.

The next /56 is 2001:1111:2222:0100::/56, spanning
2001:1111:2222:0100::
through
2001:1111:2222:01FF::

We can do this 256 times in all, as mentioned earlier.

So in summary, with two customers, each receiving a /56 prefix, we would expect to see the bindings show up on R2 as:

2001:1111:2222::/56
2001:1111:2222:100::/56
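
This carving can be sanity-checked with Python's ipaddress module (just an illustration of the math, not anything running on the routers):

```python
import ipaddress

# Carve the /48 into /56 delegations, as the local pool on R2 will do.
block = ipaddress.ip_network("2001:1111:2222::/48")
delegations = list(block.subnets(new_prefix=56))

assert len(delegations) == 256  # 2**(56-48) assignable /56 prefixes
assert str(delegations[0]) == "2001:1111:2222::/56"
assert str(delegations[1]) == "2001:1111:2222:100::/56"
```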

So with all this theory in place, let's take a look at the configuration that makes it all work.

First off, we create a local IPv6 pool on R2:

ipv6 local pool IPv6-Local-Pool 2001:1111:2222::/48 56

This is in accordance with the requirements we stated earlier.

Next up, we tie this local pool into a DHCPv6 pool used specifically for Prefix Delegation:

ipv6 dhcp pool PD-DHCP-POOL
 prefix-delegation pool IPv6-Local-Pool

Finally, we attach the DHCPv6 pool to the interfaces facing Customer A and Customer B:

R2#sh run int g2
Building configuration...

Current configuration : 132 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 ipv6 address 2001:1111:101::2/64
 ipv6 dhcp server PD-DHCP-POOL
end

R2#sh run int g3
Building configuration...

Current configuration : 132 bytes
!
interface GigabitEthernet3
 no ip address
 negotiation auto
 ipv6 address 2001:1111:102::2/64
 ipv6 dhcp server PD-DHCP-POOL
end

That's pretty much all that's required from the SP's point of view in order to hand out the prefixes.

Now, let's take a look at what's required on the CE routers.

Starting off with R4’s interface to the SP:

R4#sh run int g2
Building configuration...

Current configuration : 156 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 ipv6 address 2001:1111:101::3/64
 ipv6 address autoconfig
 ipv6 dhcp client pd LOCAL-CE
end

Note that “LOCAL-CE” is a local label we will use in the next step. It can be anything you desire.

Only when the “inside” interfaces request an IPv6 prefix will a request be sent to the SP for it to hand something out. This is done on R4's g1.405 and g1.406 interfaces:

R4#sh run int g1.405
Building configuration...

Current configuration : 126 bytes
!
interface GigabitEthernet1.405
 encapsulation dot1Q 405
 ipv6 address LOCAL-CE ::1:0:0:0:1/64
 ipv6 address autoconfig
end

R4#sh run int g1.406
Building configuration...

Current configuration : 126 bytes
!
interface GigabitEthernet1.406
 encapsulation dot1Q 406
 ipv6 address LOCAL-CE ::2:0:0:0:1/64
 ipv6 address autoconfig
end

Here we reference the previous local label, “LOCAL-CE”. Most interesting is the fact that we are now subnetting the /56 prefix we have received, by specifying “::1:0:0:0:1/64” and “::2:0:0:0:1/64” respectively.

What this does is append the specified bits to the prefix being handed out. To repeat, for Customer A this is 2001:1111:2222::/56, which results in a final address of 2001:1111:2222:1:0:0:0:1/64 on interface g1.405 and 2001:1111:2222:2:0:0:0:1/64 on g1.406.
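
Combining the delegated prefix with the configured suffix is effectively a bitwise OR, which can be illustrated with Python's ipaddress module (an illustration of the math only, not router code):

```python
import ipaddress

# The CE combines the delegated /56 with the interface-specific bits.
delegated = ipaddress.ip_network("2001:1111:2222::/56")  # from the SP
suffix = ipaddress.ip_address("::1:0:0:0:1")             # "LOCAL-CE ::1:0:0:0:1/64"

addr = ipaddress.ip_address(int(delegated.network_address) | int(suffix))
assert str(addr) == "2001:1111:2222:1::1"  # g1.405's resulting address
```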

Let's turn our attention to Customer B on R7.

The same thing has been configured, just using a different “label” for the assigned pool to show that it's arbitrary:

R7#sh run int g3
Building configuration...

Current configuration : 155 bytes
!
interface GigabitEthernet3
 no ip address
 negotiation auto
 ipv6 address 2001:1111:102::7/64
 ipv6 address autoconfig
 ipv6 dhcp client pd CE-POOL
end

And the inside interface g1.100:

R7#sh run int g1.100
Building configuration...

Current configuration : 100 bytes
!
interface GigabitEthernet1.100
 encapsulation dot1Q 100
 ipv6 address CE-POOL ::1:0:0:0:7/64
end

Again, we are subnetting the received /56 into a /64 and applying it on the inside interface.

Going back to the SP's point of view, let's verify that we are handing out some prefixes:


R2#sh ipv6 local pool
Pool                  Prefix                                       Free  In use
IPv6-Local-Pool       2001:1111:2222::/48                            254      2

We can see that our local pool has handed out two prefixes, and we can dig further down into the bindings:


R2#sh ipv6 dhcp binding
Client: FE80::250:56FF:FEBE:93CC
  DUID: 00030001001EF6767600
  Username : unassigned
  VRF : default
  Interface : GigabitEthernet3
  IA PD: IA ID 0x00080001, T1 302400, T2 483840
    Prefix: 2001:1111:2222:100::/56
            preferred lifetime 604800, valid lifetime 2592000
            expires at Oct 16 2014 03:11 PM (2416581 seconds)
Client: FE80::250:56FF:FEBE:4754
  DUID: 00030001001EE5DF8700
  Username : unassigned
  VRF : default
  Interface : GigabitEthernet2
  IA PD: IA ID 0x00070001, T1 302400, T2 483840
    Prefix: 2001:1111:2222::/56
            preferred lifetime 604800, valid lifetime 2592000
            expires at Oct 16 2014 03:11 PM (2416575 seconds)

We see that we do indeed have some bindings in place. What's of more interest, though, is the fact that static routes have been created:


R2#sh ipv6 route static | beg a - Ap
       a - Application
S   2001:1111:2222::/56 [1/0]
     via FE80::250:56FF:FEBE:4754, GigabitEthernet2
S   2001:1111:2222:100::/56 [1/0]
     via FE80::250:56FF:FEBE:93CC, GigabitEthernet3

So we have two static routes that point to the CE routers. This makes it extremely simple to propagate them further into the SP core:


R2#sh run | sec router bgp
router bgp 64512
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 2001:1111::12:1 remote-as 64512
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv6
  redistribute static
  neighbor 2001:1111::12:1 activate
 exit-address-family

Of course some sort of filtering should be used instead of just redistributing every static route on the PE, but you get the point. So let's check it out on R3, for example:


R3#sh bgp ipv6 uni | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
 *>i 2001:1111:2222::/56
                       2001:1111::12:2          0    100      0 ?
 *>i 2001:1111:2222:100::/56
                       2001:1111::12:2          0    100      0 ?

We do indeed have the two routes installed.

So how could the customer set up their routers to learn these prefixes automatically and use them actively?
Well, one solution would be stateless autoconfiguration, which I have opted to use here along with setting the default route, on R5:


R5#sh run int g1.405
Building configuration...

Current configuration : 96 bytes
!
interface GigabitEthernet1.405
 encapsulation dot1Q 405
 ipv6 address autoconfig default
end

R5#sh ipv6 route | beg a - Ap
       a - Application
ND  ::/0 [2/0]
     via FE80::250:56FF:FEBE:49F3, GigabitEthernet1.405
NDp 2001:1111:2222:1::/64 [2/0]
     via GigabitEthernet1.405, directly connected
L   2001:1111:2222:1:250:56FF:FEBE:3DFB/128 [0/0]
     via GigabitEthernet1.405, receive
L   FF00::/8 [0/0]
     via Null0, receive

and R6:


R6#sh run int g1.406
Building configuration...

Current configuration : 96 bytes
!
interface GigabitEthernet1.406
 encapsulation dot1Q 406
 ipv6 address autoconfig default
end

R6#sh ipv6 route | beg a - App
       a - Application
ND  ::/0 [2/0]
     via FE80::250:56FF:FEBE:49F3, GigabitEthernet1.406
NDp 2001:1111:2222:2::/64 [2/0]
     via GigabitEthernet1.406, directly connected
L   2001:1111:2222:2:250:56FF:FEBE:D054/128 [0/0]
     via GigabitEthernet1.406, receive
L   FF00::/8 [0/0]
     via Null0, receive

So now we have the SP core in place and the internal customer network in place. All that's really required now is some sort of routing on the CE routers toward the SP. I have chosen the simplest solution, a static default route:


R4#sh run | incl ipv6 route
ipv6 route ::/0 2001:1111:101::2

and on R7:


R7#sh run | incl ipv6 route
ipv6 route ::/0 2001:1111:102::2

Finally, it's time to test all this out in the data plane.

Let's ping from R3 to R5 and R6:


R3#ping 2001:1111:2222:1:250:56FF:FEBE:3DFB
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:1:250:56FF:FEBE:3DFB, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/12/20 ms
R3#ping 2001:1111:2222:2:250:56FF:FEBE:D054
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:2:250:56FF:FEBE:D054, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/17 ms

And also to R7:


R3#ping 2001:1111:2222:101::7
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:101::7, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/8/18 ms

Excellent. Everything works.

Lets summarize what we have done.

1) We created a local IPv6 pool on the PE router.
2) We created a DHCPv6 server utilizing this local pool as a prefix-delegation pool.
3) We enabled the DHCPv6 server on the customer facing interfaces.
4) We enabled the DHCPv6 PD on the CE routers (R4 and R7) and used a local label as an identifier.
5) We assigned IPv6 addresses from the delegated prefixes on the inside interfaces toward R5 and R6 for Customer A, and on R7's inside interface for Customer B.
6) We used stateless autoconfiguration internal to the customers to further propagate the IPv6 prefixes.
7) We created static routing on the CE routers toward the SP.
8) We redistributed statics into BGP on the PE router.
9) We verified that IPv6 prefixes were being delegated through DHCPv6.
10) And finally we verified that everything was working in the data plane.

I hope this has covered a pretty niche IPv6 topic and that it has been useful to you.

Take care!

VRF based path selection

In this post I will be showing you how it's possible to use different paths between your PE routers on a per-VRF basis.

This is very useful if you have customers whose traffic you want to “steer” away from your normal traffic flow between PE routers.
For example, this could be due to certain SLAs.

I will be using the following topology to demonstrate how this can be done:

Topology

A short walkthrough of the topology is in order.

In the service provider core we have 4 routers: R3, XRv-1, XRv-2 and R4. R3 and R4 are IOS-XE based routers, and XRv-1 and XRv-2 are, as the name implies, IOS-XR routers. There is no significance attached to the fact that I'm running two XR routers. It's simply how I could build the required topology.

The service provider is running OSPF as the IGP, with R3 and R4 being the PE routers for an MPLS L3 VPN service. On top of that, LDP is being used to build the required LSPs. The IGP has been modified to prefer the northbound path (R3 -> XRv-1 -> R4) by increasing the cost of the southbound links (R3 -> XRv-2 -> R4) to 100.

So by default, traffic between R3 and R4 will flow northbound.

We can easily verify this:

R3#traceroute 4.4.4.4
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.3.10.10 [MPLS: Label 16005 Exp 0] 16 msec 1 msec 1 msec
  2 10.4.10.4 1 msec *  5 msec

And the reverse path is the same:

R4#traceroute 3.3.3.3
Type escape sequence to abort.
Tracing the route to 3.3.3.3
VRF info: (vrf in name/id, vrf out name/id)
  1 10.4.10.10 [MPLS: Label 16000 Exp 0] 3 msec 2 msec 0 msec
  2 10.3.10.3 1 msec *  5 msec

Besides the traffic flowing the desired way, we can see that we are using label switching between the loopbacks. Exactly what we want in this type of setup.

On the customer side, we have two customers, Customer A and Customer B. Each of them has two sites, one behind R3 and one behind R4. Pretty simple. They are all running EIGRP between the CEs and the PEs.

Beyond this, we have MPLS Traffic Engineering running in the service provider core as well. Specifically, we are running a tunnel from R3's loopback200 (33.33.33.33/32) toward R4's loopback200 (44.44.44.44/32), and one in the reverse direction. This has been accomplished by configuring an explicit path on both R3 and R4.

Let's verify the tunnel configuration on both:

On R3:

R3#sh ip expl
PATH NEW-R3-TO-R4 (strict source route, path complete, generation 8)
    1: next-address 10.3.20.20
    2: next-address 10.4.20.4
R3#sh run int tunnel10
Building configuration...

Current configuration : 180 bytes
!
interface Tunnel10
 ip unnumbered Loopback200
 tunnel mode mpls traffic-eng
 tunnel destination 10.4.20.4
 tunnel mpls traffic-eng path-option 10 explicit name NEW-R3-TO-R4
end

And on R4:

R4#sh ip expl
PATH NEW-R4-TO-R3 (strict source route, path complete, generation 4)
    1: next-address 10.4.20.20
    2: next-address 10.3.20.3
R4#sh run int tun10
Building configuration...

Current configuration : 180 bytes
!
interface Tunnel10
 ip unnumbered Loopback200
 tunnel mode mpls traffic-eng
 tunnel destination 10.3.20.3
 tunnel mpls traffic-eng path-option 10 explicit name NEW-R4-TO-R3
end

On top of that, we have configured a static route on both R3 and R4 to steer traffic for each other's loopback200 down the tunnel:

R3#sh run | incl ip route
ip route 44.44.44.44 255.255.255.255 Tunnel10

R4#sh run | incl ip route
ip route 33.33.33.33 255.255.255.255 Tunnel10

Resulting in the following RIBs:

R3#sh ip route 44.44.44.44
Routing entry for 44.44.44.44/32
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Tunnel10
      Route metric is 0, traffic share count is 1
	  
R4#sh ip route 33.33.33.33
Routing entry for 33.33.33.33/32
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Tunnel10
      Route metric is 0, traffic share count is 1

And to verify that we are actually using the southbound path (R3 -> XRv-2 -> R4), let's traceroute between the loopback200 interfaces:

on R3:

R3#traceroute 44.44.44.44 so loopback200
Type escape sequence to abort.
Tracing the route to 44.44.44.44
VRF info: (vrf in name/id, vrf out name/id)
  1 10.3.20.20 [MPLS: Label 16007 Exp 0] 4 msec 2 msec 1 msec
  2 10.4.20.4 1 msec *  3 msec

and on R4:

R4#traceroute 33.33.33.33 so loopback200
Type escape sequence to abort.
Tracing the route to 33.33.33.33
VRF info: (vrf in name/id, vrf out name/id)
  1 10.4.20.20 [MPLS: Label 16008 Exp 0] 4 msec 1 msec 1 msec
  2 10.3.20.3 1 msec *  3 msec

This verifies that we have our two unidirectional tunnels and that communication between the loopback200 interfaces flows through the southbound path using our TE tunnels.

So let's take a look at the very simple BGP PE configuration on both R3 and R4:

R3:

router bgp 100
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 4.4.4.4 remote-as 100
 neighbor 4.4.4.4 update-source Loopback100
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf A
  redistribute eigrp 100
 exit-address-family
 !
 address-family ipv4 vrf B
  redistribute eigrp 100
 exit-address-family

and R4:

router bgp 100
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source Loopback100
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf A
  redistribute eigrp 100
 exit-address-family
 !
 address-family ipv4 vrf B
  redistribute eigrp 100
 exit-address-family

From this output, we can see that we are using the loopback100 interfaces for the BGP peering. As routing updates come in from one PE, the next-hop will be set to the remote PE's loopback100 interface. This then causes the transport label to be the one leading to that loopback100 interface.

A traceroute from R1's loopback0 interface to R5's loopback0 interface will show us the path that traffic between each site in VRF A (Customer A) will take:

R1:

R1#traceroute 5.5.5.5 so loo0
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.3.3 1 msec 1 msec 0 msec
  2 10.3.10.10 [MPLS: Labels 16005/408 Exp 0] 6 msec 1 msec 10 msec
  3 10.4.5.4 [MPLS: Label 408 Exp 0] 15 msec 22 msec 17 msec
  4 10.4.5.5 18 msec *  4 msec

and let's compare that to what R3 will use as the transport label to reach R4's loopback100 interface:

 
R3#sh mpls for
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
300        Pop Label  10.10.10.10/32   0             Gi1.310    10.3.10.10
301        Pop Label  10.4.10.0/24     0             Gi1.310    10.3.10.10
302        Pop Label  20.20.20.20/32   0             Gi1.320    10.3.20.20
303        16004      10.4.20.0/24     0             Gi1.310    10.3.10.10
304   [T]  Pop Label  44.44.44.44/32   0             Tu10       point2point
305        16005      4.4.4.4/32       0             Gi1.310    10.3.10.10
310        No Label   1.1.1.1/32[V]    2552          Gi1.13     10.1.3.1
311        No Label   10.1.3.0/24[V]   0             aggregate/A
312        No Label   2.2.2.2/32[V]    2552          Gi1.23     10.2.3.2
313        No Label   10.2.3.0/24[V]   0             aggregate/B

We can see that this matches up, the label being 16005 (going to XRv-1) through the northbound path.
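
Conceptually, the BGP next-hop is what selects the transport entry, which can be sketched as a simple lookup (an illustration only, not IOS; the Tunnel10 label 16007 is taken from the earlier loopback200 traceroute):

```python
# Toy sketch: the BGP next-hop carried in a VRF's routes selects which
# transport entry R3 uses (interfaces/labels from the outputs above).
transport = {
    "4.4.4.4":     ("Gi1.310", 16005),  # loopback100: northbound IGP/LDP path
    "44.44.44.44": ("Tu10",    16007),  # loopback200: southbound TE tunnel
}

# Routes with next-hop 4.4.4.4 (the loopback100 peering address) ride the
# northbound LSP; a next-hop of 44.44.44.44 would ride the TE tunnel.
assert transport["4.4.4.4"][0] == "Gi1.310"
assert transport["44.44.44.44"][0] == "Tu10"
```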

This raises the question: how do we steer our traffic through the southbound path using loopback200 instead, when the peering is between the loopback100s?

Well, thankfully IOS has it covered. Under the VRF configuration for Customer B (VRF B), we have the option of setting the next-hop of updates sent to the remote PE:

On R3:

vrf definition B
 rd 100:2
 !
 address-family ipv4
  route-target export 100:2
  route-target import 100:2
  bgp next-hop Loopback200
 exit-address-family

and the same on R4:

vrf definition B
 rd 100:2
 !
 address-family ipv4
  route-target export 100:2
  route-target import 100:2
  bgp next-hop Loopback200
 exit-address-family
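
One operational note: if the VPNv4 session was already established when the `bgp next-hop Loopback200` command was added, the previously advertised routes may still carry the old next-hop. A soft outbound reset re-sends the updates without tearing the session down (a hedged sketch; `*` hits all peers, so in production you would scope it to the relevant neighbor):

R3#clear ip bgp * soft out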

This causes the BGP updates to contain the “correct” next-hop:

R3:

R3#sh bgp vpnv4 uni vrf B | beg Route Dis
Route Distinguisher: 100:2 (default for vrf B)
 *>  2.2.2.2/32       10.2.3.2            130816         32768 ?
 *>i 6.6.6.6/32       44.44.44.44         130816    100      0 ?
 *>  10.2.3.0/24      0.0.0.0                  0         32768 ?
 *>i 10.4.6.0/24      44.44.44.44              0    100      0 ?

44.44.44.44 being the Loopback200 address of R4. And on R4:

R4#sh bgp vpnv4 uni vrf B | beg Route Dis
Route Distinguisher: 100:2 (default for vrf B)
 *>i 2.2.2.2/32       33.33.33.33         130816    100      0 ?
 *>  6.6.6.6/32       10.4.6.6            130816         32768 ?
 *>i 10.2.3.0/24      33.33.33.33              0    100      0 ?
 *>  10.4.6.0/24      0.0.0.0                  0         32768 ?

Let's verify that this actually works:

R2#traceroute 6.6.6.6 so loo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 10.2.3.3 1 msec 1 msec 0 msec
  2 10.3.20.20 [MPLS: Labels 16007/409 Exp 0] 4 msec 1 msec 10 msec
  3 10.4.6.4 [MPLS: Label 409 Exp 0] 15 msec 16 msec 17 msec
  4 10.4.6.6 19 msec *  4 msec
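
A side note on the label stack in hop 2: 16007 is the transport label, and 409 is the VPN label that R4 allocated for the VRF B prefixes. The VPN label can be confirmed on R4 (hedged; the exact command form may vary by IOS release):

R4#show bgp vpnv4 unicast vrf B labels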

Excellent! We can see that we are indeed using the southbound path. To confirm that we are using the tunnel, note the transport label of 16007 and compare it to the tunnel's outgoing label on R3:

R3:

R3#sh mpls traffic-eng tun tunnel 10

Name: R3_t10                              (Tunnel10) Destination: 10.4.20.4
  Status:
    Admin: up         Oper: up     Path: valid       Signalling: connected
    path option 10, type explicit NEW-R3-TO-R4 (Basis for Setup, path weight 200)

  Config Parameters:
    Bandwidth: 0        kbps (Global)  Priority: 7  7   Affinity: 0x0/0xFFFF
    Metric Type: TE (default)
    AutoRoute: disabled LockDown: disabled Loadshare: 0 [0] bw-based
    auto-bw: disabled
  Active Path Option Parameters:
    State: explicit path option 10 is active
    BandwidthOverride: disabled  LockDown: disabled  Verbatim: disabled


  InLabel  :  -
  OutLabel : GigabitEthernet1.320, 16007
  Next Hop : 10.3.20.20

I have removed a lot of non-relevant output, but pay attention to the OutLabel, which is indeed 16007.
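
The reason the tunnel's label ends up as the transport label is that R3 resolves the BGP next-hop 44.44.44.44 via Tunnel10 (the [T] entry in the forwarding table earlier). A hedged way to confirm the recursion:

R3#show ip route 44.44.44.44

The /32 should list Tunnel10 as the outgoing interface.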

So that was a quick walkthrough of how easy it is to accomplish the stated goal once you know about the nifty `bgp next-hop` VRF command.

I hope it's been useful to you.

Take Care!