Category Archives: CCIE

VxLAN on the CSR1Kv

By now, VxLAN is becoming the standard way of tunneling in the datacenter.
In this post, I will show how to use VxLAN on the CSR1Kv to extend your datacenter L2 reach between sites as well.

First off, what is VxLAN?
It stands for Virtual Extensible LAN. Basically, it gives you a way of decoupling your VLANs into a new numbering scheme.

You basically map your VLAN into a VNI (Virtual Network Identifier), which in essence makes your VLAN numbering scheme locally significant.

Also, since a VNI is a 24-bit identifier, you have a lot more flexibility than the regular 4096 definable VLANs (802.1Q tags only have 12 bits for the VLAN ID).
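The difference in scale is easy to quantify; the constants below are just the widths of the respective header fields:

```python
DOT1Q_VID_BITS = 12   # width of the 802.1Q VLAN ID field
VXLAN_VNI_BITS = 24   # width of the VxLAN Network Identifier field

num_vlans = 2 ** DOT1Q_VID_BITS   # 4096
num_vnis = 2 ** VXLAN_VNI_BITS    # 16777216

print(num_vlans, num_vnis)
```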

Each endpoint that does the encapsulation/decapsulation is called a VTEP (VxLAN Tunnel EndPoint). In our example this would be CSR3 and CSR5.

After the VxLAN header is added, the packet is further encapsulated into a UDP packet and forwarded across the network. This is a great solution as it doesn't impose any technical restrictions on the core of the network. Only the VTEPs need to understand VxLAN (and preferably have hardware acceleration for it as well).
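As a sketch of the wire format, the 8-byte VxLAN header defined in RFC 7348 carries little more than a flags field and the VNI. A minimal builder/parser might look like this (illustrative only, not anything running on the CSR):

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VxLAN destination port

def build_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VxLAN header from RFC 7348.

    Word 1: flags in the top byte (0x08 = "VNI present"), rest reserved.
    Word 2: the 24-bit VNI in the upper three bytes, low byte reserved.
    """
    return struct.pack("!II", 0x08 << 24, (vni & 0xFFFFFF) << 8)

def parse_vni(header: bytes) -> int:
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8
```

The full packet on the wire is then outer IP / UDP (destination port 4789) / this header / the original Ethernet frame.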

Since we won't be using BGP EVPN, we will rely solely on multicast in the network to discover which VTEPs handle the traffic in question. The only supported mode is BiDir PIM, which is an optimization of the control plane (not the data plane), since it only keeps (*,G) entries in its multicast routing tables.

Let's take a look at the topology I will be using for the example:

 

I have used a regular IOS-based device in Site 1 and Site 2 to represent our L2 devices. These could be servers or end-clients for that matter. What I want to accomplish is to run EIGRP between R1 and R2 over the "fabric" using VxLAN as the tunneling mechanism.

CSR3 is the VTEP for Site 1 and CSR5 is the VTEP for Site 2.

In the "fabric" we have CSR4, whose loopback0 (4.4.4.4/32) is the BiDir RP, and it is announcing this using BSR so that CSR3 and CSR5 learn the RP information (along with the BiDir capability). We are using OSPF as the IGP in the "fabric" to establish routing between the loopback interfaces, which will be the VTEP sources for CSR3 and CSR5 respectively.

Let's verify that routing between the loopbacks is working and our RIB is correct:

on CSR3:

CSR3#sh ip route | beg Gate
Gateway of last resort is not set

      3.0.0.0/32 is subnetted, 1 subnets
C        3.3.3.3 is directly connected, Loopback0
      4.0.0.0/32 is subnetted, 1 subnets
O        4.4.4.4 [110/2] via 10.3.4.4, 00:38:27, GigabitEthernet2
      5.0.0.0/32 is subnetted, 1 subnets
O        5.5.5.5 [110/3] via 10.3.4.4, 00:38:27, GigabitEthernet2
      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C        10.3.4.0/24 is directly connected, GigabitEthernet2
L        10.3.4.3/32 is directly connected, GigabitEthernet2
O        10.4.5.0/24 [110/2] via 10.3.4.4, 00:38:27, GigabitEthernet2

CSR3#ping 5.5.5.5 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 5.5.5.5, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/22 ms

This means we have full reachability through the “fabric” from VTEP to VTEP.

Let's make sure our multicast routing is working properly, and let's take a look at CSR4 first, since it's the RP for the network:

CSR4#sh run | incl ip pim|interface
interface Loopback0
 ip pim sparse-mode
interface GigabitEthernet1
 ip pim sparse-mode
interface GigabitEthernet2
 ip pim sparse-mode
interface GigabitEthernet3
interface GigabitEthernet4
ip pim bidir-enable
ip pim bsr-candidate Loopback0 0
ip pim rp-candidate Loopback0 bidir

We can see from this output that we are running PIM on all the relevant interfaces as well as making sure that bidir is enabled. We have also verified that we are indeed running BSR to announce Loopback0 as the RP.

Lets verify the multicast routing table:

CSR4#sh ip mroute | beg Outgoing   
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*,224.0.0.0/4), 00:45:05/-, RP 4.4.4.4, flags: B
  Bidir-Upstream: Loopback0, RPF nbr: 4.4.4.4
  Incoming interface list:
    GigabitEthernet2, Accepting/Sparse
    GigabitEthernet1, Accepting/Sparse
    Loopback0, Accepting/Sparse

(*, 239.1.1.1), 00:44:03/00:02:46, RP 4.4.4.4, flags: B
  Bidir-Upstream: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1, Forward/Sparse, 00:44:03/00:02:38
    GigabitEthernet2, Forward/Sparse, 00:44:03/00:02:46

(*, 224.0.1.40), 00:45:05/00:01:56, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Loopback0, Forward/Sparse, 00:45:04/00:01:56

We can see that we do have some (*,G) entries installed (more on the (*, 239.1.1.1) later).
Excellent.

Now let's take a sample from CSR3's multicast configuration:

CSR3#sh ip pim rp mapping 
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 4.4.4.4 (?), v2, bidir
    Info source: 4.4.4.4 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:45:39, expires: 00:02:23

We see that we have learned the RP, that it is operating in BiDir mode, and that it was learned through BSR.

So far so good. Now let's turn our attention to the VxLAN part of the configuration.

The VTEP functionality is implemented by a new interface type, called an NVE. This is where you define which source address to use, along with the multicast group used for flooding.

This is the configuration for CSR3:

CSR3#sh run int nve1
Building configuration...

Current configuration : 137 bytes
!
interface nve1
 no ip address
 source-interface Loopback0
 member vni 1000100 mcast-group 239.1.1.1
 no mop enabled
 no mop sysid
end

What's important here is that we will source our VTEP from Loopback0 (3.3.3.3/32) and use multicast group 239.1.1.1 for VNI 1000100. The VNI number can be whatever you choose; I have just chosen a large number that encodes which VLAN this VNI is used for (VLAN 100).
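The numbering convention used here (my reading of the author's scheme, not anything the platform enforces) is simply:

```python
def vlan_to_vni(vlan: int) -> int:
    # Encode the VLAN in the low digits of a large VNI: VLAN 100 -> 1000100.
    return 1_000_000 + vlan

def vni_to_vlan(vni: int) -> int:
    return vni - 1_000_000
```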

On the opposite side, we have a similar configuration for the NVE:

CSR5#sh run int nve1
Building configuration...

Current configuration : 137 bytes
!
interface nve1
 no ip address
 source-interface Loopback0
 member vni 1000100 mcast-group 239.1.1.1
 no mop enabled
 no mop sysid
end

It's very important that the multicast group matches on both sides, as this is the group they will use to flood BUM (Broadcast, Unknown unicast and Multicast) traffic, for example ARP.

The next configuration piece is the EFP (Ethernet Flow Point) we need to create on the interface towards the site routers (R1 and R2), where we accept traffic tagged with VLAN 100:

CSR3#sh run int g1
Building configuration...

Current configuration : 195 bytes
!
interface GigabitEthernet1
 no ip address
 negotiation auto
 no mop enabled
 no mop sysid
 service instance 100 ethernet
  encapsulation dot1q 100
  rewrite ingress tag pop 1 symmetric
 !
end

This configuration states that the encapsulation is dot1q VLAN 100, and that the tag should be stripped inbound before further processing and added again on egress.
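The "pop 1 symmetric" rewrite can be sketched in a few lines: on ingress the single 802.1Q tag (bytes 12-15 of the frame) is stripped, and on egress the inverse push happens. This is a toy model that ignores QinQ and the priority bits:

```python
import struct

DOT1Q_TPID = 0x8100

def pop_dot1q(frame: bytes) -> bytes:
    """Ingress rewrite: strip one 802.1Q tag (bytes 12-15) if present."""
    if struct.unpack("!H", frame[12:14])[0] != DOT1Q_TPID:
        return frame
    return frame[:12] + frame[16:]

def push_dot1q(frame: bytes, vlan: int) -> bytes:
    """Egress rewrite: re-insert the tag (the 'symmetric' part)."""
    tag = struct.pack("!HH", DOT1Q_TPID, vlan & 0x0FFF)
    return frame[:12] + tag + frame[12:]
```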

Now for the piece that ties it all together, namely the bridge-domain:

bridge-domain 100 
 member vni 1000100
 member GigabitEthernet1 service-instance 100

Here we have a bridge-domain configuration with 2 members: the local interface G1 via its service instance 100, and our VNI/VTEP. This is basically the glue that ties the bridge domain together end to end.

The same configuration is present on CSR5 as well.

Let's verify the control plane on CSR3:

CSR3#sh bridge-domain 100
Bridge-domain 100 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet1 service instance 100
    vni 1000100
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   AABB.CC00.1000 forward dynamic   298  GigabitEthernet1.EFP100
   0   AABB.CC00.2000 forward dynamic   300  nve1.VNI1000100, VxLAN 
                                             src: 3.3.3.3 dst: 5.5.5.5

This command will show the MAC addresses learned in this particular bridge domain. On our EFP on G1 we have dynamically learned the MAC address of R1’s interface and through the NVE1 interface using VNI 1000100 we have learned the MAC address of R2. Pay attention to the fact that we know which VTEP endpoints to send the traffic to now. This means that further communication between these two end-hosts (R1 and R2) is done solely using unicast between 3.3.3.3 and 5.5.5.5 using VxLAN as the tunneling mechanism.
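The flood-and-learn behaviour that this bridge-domain output reflects can be modelled in a few lines. This is a toy sketch, not the platform's implementation; the real table is the bridge-domain MAC table shown above:

```python
class FloodAndLearnVtep:
    """Toy model of a VTEP's flood-and-learn forwarding decision."""

    def __init__(self, mcast_group: str):
        self.mcast_group = mcast_group
        self.mac_table = {}  # inner MAC -> remote VTEP IP

    def learn(self, src_mac: str, remote_vtep: str) -> None:
        # Learn the inner source MAC against the outer (VxLAN) source IP.
        self.mac_table[src_mac] = remote_vtep

    def destination(self, dst_mac: str) -> str:
        # Known unicast is tunneled point-to-point;
        # unknown destinations are flooded to the multicast group (BUM).
        return self.mac_table.get(dst_mac, self.mcast_group)
```

First the ARP for R2 is flooded to 239.1.1.1; once R2's reply is seen, its MAC is learned against 5.5.5.5 and everything after that is unicast.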

CSR3#show nve interface nve 1 detail 
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:3.3.3.3 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
      3273     268627       3278     269026

This command shows the status of our NVE interface. From this we can see that it's in an Up/Up state, that the VxLAN destination port is the standard one (4789), and that we have packets going back and forth.

Now that everything checks out in the control plane, let's see if the data plane is working by issuing an ICMP ping from R1 to R2 (they are on the same subnet, 192.168.100.0/24):

R1#ping 192.168.100.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.100.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/11/26 ms
R1#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.100.1           -   aabb.cc00.1000  ARPA   Ethernet0/0.100
Internet  192.168.100.2           8   aabb.cc00.2000  ARPA   Ethernet0/0.100

This looks excellent! And in fact the EIGRP peering I had set up between them works as well:

R1#sh ip eigrp neighbors 
EIGRP-IPv4 Neighbors for AS(100)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
0   192.168.100.2           Et0/0.100                12 04:14:30    4   100  0  3

R1#sh ip route eigrp | beg Gateway
Gateway of last resort is not set

      100.0.0.0/32 is subnetted, 2 subnets
D        100.100.100.2 
           [90/409600] via 192.168.100.2, 04:14:46, Ethernet0/0.100

This address is the loopback of R2.

Finally, I want to show how the ICMP ping looks in the data plane by doing a capture on CSR4's G2 interface:

Here we can see a ping I issued from R1's loopback interface towards R2's loopback interface.
I have expanded the view so you can see the encapsulation, with the VxLAN header carried inside the UDP packet.
The outer IP header has the VTEP endpoints (3.3.3.3 and 5.5.5.5) as the source and destination.

The VNI is the one we selected and is used to differentiate traffic on the VTEP.
Finally we have our L2 packet in its entirety.

That's all I wanted to show for now. Next time I will extend this a bit and involve BGP as the control plane.
Thanks for reading!

ISIS Authentication types (packet captures)

In this post I would like to highlight a couple of "features" of ISIS.
More specifically the authentication mechanism used and how it looks in the data plane.

I will do this by configuring a couple of routers with the two authentication types available. I will then look at packet captures taken from the link between them and illustrate how each is used by the ISIS process.

The two types of authentication are link-level authentication of the Hello messages used to establish an adjacency, and authentication of the LSPs (Link State PDUs) themselves.

First off, here is the extremely simple topology; it's all that's required for this purpose:

Simple, right? Two routers with one link between them on Gig1. They are both running ISIS in level-2-only mode, which means they will only try to establish an L2 adjacency with their neighbors. Each router has a loopback interface, which is also advertised into ISIS.

First, let's look at the relevant configuration of CSR-02 for the link-level authentication:

key chain MY-CHAIN
 key 1
  key-string WIPPIE
!
interface GigabitEthernet1
 ip address 10.1.2.2 255.255.255.0
 ip router isis 1
 negotiation auto
 no mop enabled
 no mop sysid
 isis authentication mode md5
 isis authentication key-chain MY-CHAIN

Without the same configuration on CSR-01, this is what we see in the data path (captured on CSR-02’s G1 interface):

And we also see that we don't have a full adjacency on CSR-01:

CSR-01#sh isis nei

Tag 1:
System Id       Type Interface     IP Address      State Holdtime Circuit Id
CSR-02          L2   Gi1           10.1.2.2        INIT  26       CSR-02.01

Let's apply the same authentication configuration on CSR-01 and see the result:

key chain MY-CHAIN
 key 1
  key-string WIPPIE
!
interface GigabitEthernet1
 ip address 10.1.2.1 255.255.255.0
 ip router isis 1
 negotiation auto
 no mop enabled
 no mop sysid
 isis authentication mode md5
 isis authentication key-chain MY-CHAIN

We now have a full adjacency:

CSR-01#sh isis neighbors 

Tag 1:
System Id       Type Interface     IP Address      State Holdtime Circuit Id
CSR-02          L2   Gi1           10.1.2.2        UP    8        CSR-02.01     

And we have routes from CSR-02:

CSR-01#sh ip route isis | beg Gate
Gateway of last resort is not set

      2.0.0.0/32 is subnetted, 1 subnets
i L2     2.2.2.2 [115/20] via 10.1.2.2, 00:01:07, GigabitEthernet1

This is what we now see from CSR-02's perspective:

Missing link-level authentication is fairly easy to spot, because you simply won't have a stable adjacency formed.

The second type is LSP authentication. Let's look at the configuration of CSR-02 for this type:

CSR-02#sh run | sec router isis
 ip router isis 1
 ip router isis 1
router isis 1
 net 49.0000.0000.0002.00
 is-type level-2-only
 authentication mode text
 authentication key-chain MY-CHAIN

In this example I have selected plain-text authentication, which I certainly don't recommend in production, but it's great for our example.

Again, this is what it looks like in the data packet (from CSR-01 to CSR-02) without authentication enabled on CSR-01:

As you can see, we have the LSP that contains CSR-01's prefixes, but no authentication is present in the packet.

Let's enable it on CSR-01 and see the result:

CSR-01#sh run | sec router isis
 ip router isis 1
 ip router isis 1
router isis 1
 net 49.0000.0000.0001.00
 is-type level-2-only
 authentication mode text
 authentication key-chain MY-CHAIN

The result in the data packet:

Here we clearly have the authentication TLV (type 10), with authentication type 1 (cleartext), and we can see the password (WIPPIE) because we selected cleartext.
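What the capture shows is the ISIS Authentication Information TLV (TLV type 10), whose first value byte is the authentication type: 1 for cleartext, 54 for HMAC-MD5. A quick sketch of how the cleartext variant is laid out on the wire, using the password from the key chain above (a wire-format illustration, not router code):

```python
import struct

TLV_AUTHENTICATION = 10  # ISIS Authentication Information TLV
AUTH_CLEARTEXT = 1       # authentication type: plain-text password
AUTH_HMAC_MD5 = 54       # authentication type: HMAC-MD5 digest

def isis_auth_tlv(auth_type: int, data: bytes) -> bytes:
    # TLV layout: type (1 byte), length (1 byte), then the
    # authentication type byte followed by the password/digest.
    value = bytes([auth_type]) + data
    return struct.pack("!BB", TLV_AUTHENTICATION, len(value)) + value

tlv = isis_auth_tlv(AUTH_CLEARTEXT, b"WIPPIE")
```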

The result is a validated ISIS database on both routers.

That's all, folks. I hope it helps to understand the difference between the two types of authentication in ISIS.

Take care!


Progress update – 10/07-2017

Hello folks,

I'm currently going through the INE DC videos and learning a lot about fabrics and how they work, along with a fair bit of UCS information on top of that!

I'm spending an average of 2.5 hours on weekdays studying, and a bit more on weekends when time permits.

I still have no firm commitment to the CCIE DC track, but at some point I need to commit to it and really get behind it. One of these days 😉

I mentioned it to the wife-to-be a couple of days ago, and while she didn't applaud the idea, at least she wasn't firmly against it, which is always something I guess! It's very important for me to have my family behind me in these endeavours!

I'm still a bit concerned about the lack of rack rentals for DCv2 from INE, which is something I need to have in place before I order a bootcamp or more training materials from them. As people know by now, I really do my best learning in front of the "system", trying out what works and what doesn't.

Now to spin up a few N9Ks in the lab and play around with NX-OS unicast and multicast routing!

Take care.

A look at Auto-Tunnel Mesh Groups

In this post I would like to give a demonstration of using the Auto-Tunnel Mesh group feature.

As you may know, manual MPLS-TE tunnels are first and foremost unidirectional, meaning that if you build them between two PE nodes, you have to configure a tunnel in each direction, each with the local PE node as the headend.

Now imagine if your network had 10 PE routers and you wanted a full mesh between them; this can become pretty burdensome and error-prone.
Thankfully there's a method to avoid this manual configuration and instead rely on your IGP to signal each node's willingness to become part of a TE "mesh". That's what the Auto-Tunnel Mesh Group feature is all about!
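The scaling pain is easy to put a number on: a full mesh of unidirectional tunnels grows quadratically with the number of PE nodes, as a quick back-of-the-envelope sketch shows:

```python
def mesh_tunnel_count(n_pe: int) -> int:
    # One unidirectional TE tunnel from every PE to every other PE.
    return n_pe * (n_pe - 1)

print(mesh_tunnel_count(3))   # this lab's 3 PEs -> 6 tunnels
print(mesh_tunnel_count(10))  # 10 PEs -> 90 tunnels to maintain by hand
```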

Topology

In my small SP setup, I only have 3 PE devices, namely PE-1, PE-2 and PE-3. I also only have one P node, called P-1.
However small this setup is, it's enough to demonstrate the power of the Auto-Tunnel mesh functionality.

Beyond that, I have set up a small MPLS L3 VPN service for customer CUST-A, which has a presence on all 3 PE nodes. The VPNv4 address-family uses a route reflector (RR), which for this purpose is P-1.

We are running OSPF as the IGP of choice. This means that our mesh membership will be signaled using Opaque LSAs, which I will show you later on.

The goal of the lab is to use the Auto-Tunnel mesh functionality to create a full mesh of tunnels between my PE nodes and use this exclusively for label switching and to do so with a general template that would scale to many more PE devices than just the 3 in this lab.

The very first thing you want to do is to enable MPLS-TE both globally and on your interfaces. We can verify this on PE-2:

PE-2:

mpls traffic-eng tunnels
!
interface GigabitEthernet2
ip address 10.2.100.2 255.255.255.0
negotiation auto
mpls traffic-eng tunnels
!

The second thing you want to do is to enable the mesh-feature globally using the following command as configured on PE-2 as well:

PE-2:

mpls traffic-eng auto-tunnel mesh

Starting off with MPLS-TE, we need to make sure our IGP is actually signaling this to begin with. I have configured MPLS-TE on the area 0 which is the only area in use in our topology:

PE-2:

router ospf 1
network 0.0.0.0 255.255.255.255 area 0
mpls traffic-eng router-id Loopback0
mpls traffic-eng area 0
mpls traffic-eng mesh-group 100 Loopback0 area 0

Don't get hung up on the last configuration line; I will explain it shortly. However, notice the "mpls traffic-eng area 0" and "mpls traffic-eng router-id Loopback0" commands. After those two lines are configured, you should be able to retrieve information on the MPLS-TE topology as seen from your IGP:

PE-2:

PE-2#sh mpls traffic-eng topology brief
My_System_id: 2.2.2.2 (ospf 1 area 0)

Signalling error holddown: 10 sec Global Link Generation 22

IGP Id: 1.1.1.1, MPLS TE Id:1.1.1.1 Router Node (ospf 1 area 0)
Area mg-id's:
: mg-id 100 1.1.1.1 :
link[0]: Broadcast, DR: 10.1.100.100, nbr_node_id:8, gen:14
frag_id: 2, Intf Address: 10.1.100.1
TE metric: 1, IGP metric: 1, attribute flags: 0x0
SRLGs: None

IGP Id: 2.2.2.2, MPLS TE Id:2.2.2.2 Router Node (ospf 1 area 0)
link[0]: Broadcast, DR: 10.2.100.100, nbr_node_id:9, gen:19
frag_id: 2, Intf Address: 10.2.100.2
TE metric: 1, IGP metric: 1, attribute flags: 0x0
SRLGs: None

IGP Id: 3.3.3.3, MPLS TE Id:3.3.3.3 Router Node (ospf 1 area 0)
Area mg-id's:
: mg-id 100 3.3.3.3 :
link[0]: Broadcast, DR: 10.3.100.100, nbr_node_id:11, gen:22
frag_id: 2, Intf Address: 10.3.100.3
TE metric: 1, IGP metric: 1, attribute flags: 0x0
SRLGs: None

IGP Id: 10.1.2.2, MPLS TE Id:22.22.22.22 Router Node (ospf 1 area 0)
link[0]: Broadcast, DR: 10.1.100.100, nbr_node_id:8, gen:17
frag_id: 3, Intf Address: 10.1.100.100
TE metric: 10, IGP metric: 10, attribute flags: 0x0
SRLGs: None

link[1]: Broadcast, DR: 10.2.100.100, nbr_node_id:9, gen:17
frag_id: 4, Intf Address: 10.2.100.100
TE metric: 10, IGP metric: 10, attribute flags: 0x0
SRLGs: None

link[2]: Broadcast, DR: 10.3.100.100, nbr_node_id:11, gen:17
frag_id: 5, Intf Address: 10.3.100.100
TE metric: 10, IGP metric: 10, attribute flags: 0x0
SRLGs: None

IGP Id: 10.1.100.100, Network Node (ospf 1 area 0)
link[0]: Broadcast, Nbr IGP Id: 10.1.2.2, nbr_node_id:5, gen:13

link[1]: Broadcast, Nbr IGP Id: 1.1.1.1, nbr_node_id:6, gen:13

IGP Id: 10.2.100.100, Network Node (ospf 1 area 0)
link[0]: Broadcast, Nbr IGP Id: 10.1.2.2, nbr_node_id:5, gen:18

link[1]: Broadcast, Nbr IGP Id: 2.2.2.2, nbr_node_id:1, gen:18

IGP Id: 10.3.100.100, Network Node (ospf 1 area 0)
link[0]: Broadcast, Nbr IGP Id: 10.1.2.2, nbr_node_id:5, gen:21

link[1]: Broadcast, Nbr IGP Id: 3.3.3.3, nbr_node_id:7, gen:21

The important thing to notice here is that we are indeed seeing the other routers in the network, all the PE devices as well as the P device.

Now to the last line of configuration under the router ospf process:

PE-2:

"mpls traffic-eng mesh-group 100 Loopback0 area 0"

What this states is that we would like to use the Auto-Tunnel Mesh Group feature, with this PE node being a member of group 100, using Loopback0 for communication on the tunnel, and running within area 0.

This by itself only handles the signaling, but we also want to deploy a template in order to create the individual tunnel interfaces. This is done in the following manner:

PE-2:

interface Auto-Template100
ip unnumbered Loopback0
tunnel mode mpls traffic-eng
tunnel destination mesh-group 100
tunnel mpls traffic-eng autoroute announce
tunnel mpls traffic-eng path-option 10 dynamic

Using the Auto-Template100 interface we specify, as we would in manual TE, our loopback address, the tunnel mode and the path option. Note that here we are simply following the IGP, which sort of defeats the purpose of many MPLS-TE deployments; but with our topology there is no path diversity, so it wouldn't matter anyway.

Also, the autoroute announce command is used to force traffic into the tunnels.

The important thing is the “tunnel destination mesh-group 100” which ties this configuration snippet into the OSPF one.

After everything is setup, you should see some dynamic tunnels being created on each PE node:

PE-2:

PE-2#sh ip int b | incl up
GigabitEthernet1 100.100.101.100 YES manual up up
GigabitEthernet2 10.2.100.2 YES manual up up
Auto-Template100 2.2.2.2 YES TFTP up up
Loopback0 2.2.2.2 YES manual up up
Tunnel64336 2.2.2.2 YES TFTP up up
Tunnel64337 2.2.2.2 YES TFTP up up

Lets verify the current RIB configuration after this step:

PE-2:

PE-2#sh ip route | beg Gateway
Gateway of last resort is not set

1.0.0.0/32 is subnetted, 1 subnets
O 1.1.1.1 [110/12] via 1.1.1.1, 00:29:13, Tunnel64336
2.0.0.0/32 is subnetted, 1 subnets
C 2.2.2.2 is directly connected, Loopback0
3.0.0.0/32 is subnetted, 1 subnets
O 3.3.3.3 [110/12] via 3.3.3.3, 00:28:48, Tunnel64337
10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
O 10.1.100.0/24 [110/11] via 10.2.100.100, 00:29:13, GigabitEthernet2
C 10.2.100.0/24 is directly connected, GigabitEthernet2
L 10.2.100.2/32 is directly connected, GigabitEthernet2
O 10.3.100.0/24 [110/11] via 10.2.100.100, 00:29:13, GigabitEthernet2
22.0.0.0/32 is subnetted, 1 subnets
O 22.22.22.22 [110/2] via 10.2.100.100, 00:29:13, GigabitEthernet2

Very good. We can see that in order to reach 1.1.1.1/32 which is PE-1’s loopback, we are indeed routing through one of the dynamic tunnels.
The same goes for 3.3.3.3/32 towards PE-3’s loopback.
PE-2:

PE-2#traceroute 1.1.1.1 so loo0
Type escape sequence to abort.
Tracing the route to 1.1.1.1
VRF info: (vrf in name/id, vrf out name/id)
1 10.2.100.100 [MPLS: Label 17 Exp 0] 16 msec 22 msec 22 msec
2 10.1.100.1 25 msec * 19 msec

We can see that traffic towards that loopback is indeed being label-switched. And just to make it obvious, let me make sure we are not using LDP 🙂

PE-2:

PE-2#sh mpls ldp neighbor
PE-2#

On P-1, it being the midpoint of our LSPs, we would expect 6 unidirectional tunnels in total:

P-1:

P-1#sh mpls for
Local Outgoing Prefix Bytes Label Outgoing Next Hop
Label Label or Tunnel Id Switched interface
16 Pop Label 3.3.3.3 64336 [6853] \
472 Et2/0 10.1.100.1
17 Pop Label 2.2.2.2 64336 [2231] \
2880 Et2/0 10.1.100.1
18 Pop Label 1.1.1.1 64336 [4312] \
2924 Et2/1 10.2.100.2
19 Pop Label 1.1.1.1 64337 [4962] \
472 Et2/2 10.3.100.3
20 Pop Label 2.2.2.2 64337 [6013] \
562 Et2/2 10.3.100.3
21 Pop Label 3.3.3.3 64337 [4815] \
0 Et2/1 10.2.100.2

Exactly what we expected.
The following is the output of the command "show ip ospf database opaque-area" on PE-2. I have cut it down to the relevant opaque-LSA parts (we are using two types: one for general MPLS-TE and one for the Mesh-Group feature):

LS age: 529
Options: (No TOS-capability, DC)
LS Type: Opaque Area Link
Link State ID: 4.0.0.0
Opaque Type: 4
Opaque ID: 0
Advertising Router: 1.1.1.1
LS Seq Number: 80000002
Checksum: 0x5364
Length: 32

Capability Type: Mesh-group
Length: 8
Value:

0000 0064 0101 0101

LS age: 734
Options: (No TOS-capability, DC)
LS Type: Opaque Area Link
Link State ID: 4.0.0.0
Opaque Type: 4
Opaque ID: 0
Advertising Router: 2.2.2.2
LS Seq Number: 80000002
Checksum: 0x6748
Length: 32

Capability Type: Mesh-group
Length: 8
Value:

0000 0064 0202 0202

LS age: 701
Options: (No TOS-capability, DC)
LS Type: Opaque Area Link
Link State ID: 4.0.0.0
Opaque Type: 4
Opaque ID: 0
Advertising Router: 3.3.3.3
LS Seq Number: 80000002
Checksum: 0x7B2C
Length: 32

Capability Type: Mesh-group
Length: 8
Value:

0000 0064 0303 0303

I have highlighted the interesting parts, which are the Advertising Router and the value of the TLV. The values start with 0000 0064, which is in fact the membership of group "100" (0x64 hex) being signaled across the IGP area.
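Those 8-byte TLV values decode directly: per my reading of the capture (and the mesh-membership TLV format in RFC 4972), the first four bytes carry the mesh-group number and the next four the advertising router's tail-end (loopback) address. A small parser as a sketch:

```python
import ipaddress
import struct

def parse_mesh_group_value(value: bytes):
    # First 4 bytes: mesh-group number; next 4: tail-end (loopback) address.
    group, tail_end = struct.unpack("!II", value[:8])
    return group, str(ipaddress.ip_address(tail_end))

print(parse_mesh_group_value(bytes.fromhex("0000006401010101")))
```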
Okay, all good I hear you say, but let's do an end-to-end test from the CE devices in customer CUST-A's domain:

R1:

R1#sh ip route | beg Gateway
Gateway of last resort is not set

10.0.0.0/32 is subnetted, 3 subnets
C 10.1.1.1 is directly connected, Loopback0
B 10.2.2.2 [20/0] via 100.100.100.100, 00:37:46
B 10.3.3.3 [20/0] via 100.100.100.100, 00:37:36
100.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 100.100.100.0/24 is directly connected, FastEthernet0/0
L 100.100.100.1/32 is directly connected, FastEthernet0/0

So we are learning the routes on the customer side (through standard IPv4 BGP).

R1:

R1#ping 10.2.2.2 so loo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/72/176 ms
R1#ping 10.3.3.3 so loo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.3.3.3, timeout is 2 seconds:
Packet sent with a source address of 10.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 20/32/48 ms

We have reachability! What about traceroute?

R1:

R1#traceroute 10.2.2.2 so loo0

Type escape sequence to abort.
Tracing the route to 10.2.2.2

1 100.100.100.100 28 msec 20 msec 12 msec
2 10.1.100.100 [MPLS: Labels 18/17 Exp 0] 44 msec 136 msec 60 msec
3 100.100.101.100 [MPLS: Label 17 Exp 0] 28 msec 32 msec 12 msec
4 100.100.101.3 28 msec 32 msec 24 msec
R1#traceroute 10.3.3.3 so loo0

Type escape sequence to abort.
Tracing the route to 10.3.3.3

1 100.100.100.100 48 msec 16 msec 8 msec
2 10.1.100.100 [MPLS: Labels 19/17 Exp 0] 48 msec 12 msec 52 msec
3 100.100.102.100 [MPLS: Label 17 Exp 0] 16 msec 28 msec 36 msec
4 100.100.102.4 68 msec 56 msec 48 msec

Just what we would expect from our L3 MPLS VPN service. A transport label (this time through MPLS-TE) and a VPN label as signaled through MP-BGP.

To round it off, I have attached the following from a packet capture on P-1's interface toward PE-1, taken while re-issuing the ICMP echo from R1's loopback toward R2's loopback address:

wireshark-output

With that, I hope it's been informative for you. Thanks for reading!

References:

http://www.cisco.com/c/en/us/td/docs/ios/12_0s/feature/guide/gsmeshgr.html

Configurations:

configurations

Practical DMVPN Example

In this post, I will put together a variety of different technologies involved in a real-life DMVPN deployment.

This includes things such as the correct tunnel configuration, routing configuration using BGP as the protocol of choice, NAT toward an upstream provider, front-door VRFs in order to implement a default route on both the hub and the spokes, and last but not least a newer feature, namely Per-Tunnel QoS using NHRP.

So I hope you will find the information relevant to your DMVPN deployments.

First off, let's take a look at the topology I will be using for this example:
Topology

As can be seen, we have a hub router which is connected to two different ISPs: one connection to a general-purpose internet provider (the internet cloud in this topology), which is used as transport for our DMVPN setup, and one to a router in the TeleCom network (AS 59701), providing a single route for demonstration purposes (8.8.8.8/32). We have also been assigned the 70.0.0.0/24 network from TeleCom to use for internet access.

Then we have two spoke sites, with a single router in each site (Spoke-01 and Spoke-02 respectively).
Each one has a loopback interface which is being announced.

The first "trick" here is to use the so-called front-door VRF feature. This basically allows you to have your transport interface located in a separate VRF, which in turn allows us to have two default (0.0.0.0/0) routes: one used for the transport network and one for the "global" VRF, which is used by the clients behind each router.
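Conceptually, the front-door VRF gives the router two independent routing tables, each free to carry its own default. A toy lookup model of that idea (the VRF name mirrors the one configured below; the global default via the tunnel is purely illustrative, not taken from this lab's config):

```python
# One RIB per VRF; each can carry its own 0.0.0.0/0 without conflicting.
ribs = {
    "Inet_VRF": {"0.0.0.0/0": "130.0.0.1 (ISP transport)"},
    "global": {"0.0.0.0/0": "Tunnel100 (overlay, for LAN clients)"},
}

def default_next_hop(vrf: str) -> str:
    # Lookup is scoped to the VRF, so the two defaults never collide.
    return ribs[vrf]["0.0.0.0/0"]

print(default_next_hop("Inet_VRF"))
print(default_next_hop("global"))
```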

I have created a VRF called "Inet_VRF" on the 3 routers in our network (the Hub, Spoke-01 and Spoke-02). Let's take a look at the configuration for this VRF along with its routing information (RIB):

HUB#sh run | beg vrf defi
vrf definition Inet_VRF
!
address-family ipv4
exit-address-family

HUB#sh ip route vrf Inet_VRF | beg Ga
Gateway of last resort is 130.0.0.1 to network 0.0.0.0

S* 0.0.0.0/0 [1/0] via 130.0.0.1
130.0.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 130.0.0.0/30 is directly connected, GigabitEthernet1
L 130.0.0.2/32 is directly connected, GigabitEthernet1

Very simple indeed. We are just using the IPv4 address-family for this VRF and we have a static default route pointing toward the Internet Cloud.

The spokes are very similar:

Spoke-01:

Spoke-01#sh ip route vrf Inet_VRF | beg Gat
Gateway of last resort is 140.0.0.1 to network 0.0.0.0

S* 0.0.0.0/0 [1/0] via 140.0.0.1
140.0.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 140.0.0.0/30 is directly connected, GigabitEthernet1
L 140.0.0.2/32 is directly connected, GigabitEthernet1

Spoke-02:

Spoke-02#sh ip route vrf Inet_VRF | beg Gat
Gateway of last resort is 150.0.0.1 to network 0.0.0.0

S* 0.0.0.0/0 [1/0] via 150.0.0.1
150.0.0.0/16 is variably subnetted, 2 subnets, 2 masks
C 150.0.0.0/30 is directly connected, GigabitEthernet1
L 150.0.0.2/32 is directly connected, GigabitEthernet1

With this in place, we should have full reachability to the internet interface address of each router in the Inet_VRF:

HUB#ping vrf Inet_VRF 140.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 140.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 6/22/90 ms

HUB#ping vrf Inet_VRF 150.0.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 150.0.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/16/65 ms

With this crucial piece of configuration for the transport network, we can now start building our DMVPN network, starting with the Hub configuration:

HUB#sh run int tun100
Building configuration...

Current configuration : 452 bytes
!
interface Tunnel100
ip address 172.16.0.100 255.255.255.0
no ip redirects
ip mtu 1400
ip nat inside
ip nhrp network-id 100
ip nhrp redirect
ip tcp adjust-mss 1360
load-interval 30
nhrp map group 10MB-Group service-policy output 10MB-Parent
nhrp map group 30MB-Group service-policy output 30MB-Parent
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE shared
end



There is a fair bit of configuration here. However, pay attention to the "tunnel vrf Inet_VRF" command, as this is the piece that ties the tunnel to its transport address. So basically we use GigabitEthernet1 as the tunnel source, and this interface is located in the Inet_VRF.

Also notice that we are running crypto on top of our tunnel to protect it from prying eyes. The relevant configuration is here:

crypto keyring MY-KEYRING vrf Inet_VRF
pre-shared-key address 0.0.0.0 0.0.0.0 key SUPER-SECRET
!
!
!
!
!
crypto isakmp policy 1
encr aes 256
hash sha256
authentication pre-share
group 2
!
!
crypto ipsec transform-set TRANSFORM-SET esp-aes 256 esp-sha256-hmac
mode transport
!
crypto ipsec profile DMVPN-PROFILE
set transform-set TRANSFORM-SET



Pretty straightforward with a static pre-shared key in place for all nodes.

With the crypto in place, you should have a SA for it installed:

HUB#sh crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst src state conn-id status
130.0.0.2 150.0.0.2 QM_IDLE 1002 ACTIVE
130.0.0.2 140.0.0.2 QM_IDLE 1001 ACTIVE



One SA for each spoke is in place and in the correct state (QM_IDLE = Good).

So now, let's verify the entire DMVPN solution with a few "show" commands:

HUB#sh dmvpn
Legend: Attrb --> S - Static, D - Dynamic, I - Incomplete
N - NATed, L - Local, X - No Socket
T1 - Route Installed, T2 - Nexthop-override
C - CTS Capable
# Ent --> Number of NHRP entries with same NBMA peer
NHS Status: E --> Expecting Replies, R --> Responding, W --> Waiting
UpDn Time --> Up or Down Time for a Tunnel
==========================================================================

Interface: Tunnel100, IPv4 NHRP Details
Type:Hub, NHRP Peers:2,

# Ent Peer NBMA Addr Peer Tunnel Add State UpDn Tm Attrb
----- --------------- --------------- ----- -------- -----
1 140.0.0.2 172.16.0.1 UP 05:24:09 D
1 150.0.0.2 172.16.0.2 UP 05:24:03 D


We have 2 spokes associated with our Tunnel100. One where the public (NBMA) address is 140.0.0.2 and the other where it is 150.0.0.2. Inside the tunnel, the spokes have IPv4 addresses of 172.16.0.1 and .2 respectively. Also, we can see that these are being learned dynamically (the D in the fifth column).

All is well and good so far. But we still need to run a routing protocol across the tunnel interface in order to exchange routes. BGP and EIGRP are good candidates for this, and in this example I have used BGP. And since we are running Phase 3 DMVPN, we can actually get away with just receiving a default route on the spokes! At this point, remember that our previous default route pointing toward the internet was in the Inet_VRF table, so these two won't collide.
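The Phase 3 behavior hinges on two NHRP commands that show up in the tunnel configurations throughout this post; as a minimal sketch (not a complete tunnel configuration):

```
! Phase 3 DMVPN in a nutshell:
interface Tunnel100
 ip nhrp redirect   ! hub: sends NHRP traffic indications for spoke-to-spoke flows
 ip nhrp shortcut   ! spokes: install shortcut routes learned from those redirects
```

This is what lets the spokes survive on just a default route: traffic initially hairpins through the hub, which triggers a redirect, and the spoke then resolves and installs a direct shortcut.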

Take a look at the BGP configuration on the hub router:

HUB#sh run | beg router bgp
router bgp 100
bgp log-neighbor-changes
bgp listen range 172.16.0.0/24 peer-group MYPG
network 70.0.0.0 mask 255.255.255.0
neighbor MYPG peer-group
neighbor MYPG remote-as 100
neighbor MYPG next-hop-self
neighbor MYPG default-originate
neighbor MYPG route-map ONLY-DEFAULT-MAP out
neighbor 133.1.2.2 remote-as 59701
neighbor 133.1.2.2 route-map TO-UPSTREAM-SP out



And the referenced route-maps:

ip prefix-list ONLY-DEFAULT-PFX seq 5 permit 0.0.0.0/0
!
ip prefix-list OUR-SCOPE-PFX seq 5 permit 70.0.0.0/24
!
route-map TO-UPSTREAM-SP permit 5
match ip address prefix-list OUR-SCOPE-PFX
!
route-map TO-UPSTREAM-SP deny 10
!
route-map ONLY-DEFAULT-MAP permit 10
match ip address prefix-list ONLY-DEFAULT-PFX



We are using the BGP listen feature, which allows BGP peers to be set up dynamically. We allow everything in the 172.16.0.0/24 network to set up a BGP session, and we are using the peer-group MYPG for controlling the settings. Notice that we are sending out only a default route to the spokes.

Also pay attention to the fact that we are sending the 70.0.0.0/24 upstream to the TeleCom ISP. Since we are going to use this network for NAT’ing purposes only, we have a static route to Null0 installed as well:

HUB#sh run | incl ip route
ip route 70.0.0.0 255.255.255.0 Null0



For the last part of our BGP configuration, let's take a look at the Spoke configuration, which is very simple and straightforward:

Spoke-01#sh run | beg router bgp
router bgp 100
bgp log-neighbor-changes
redistribute connected route-map ONLY-LOOPBACK0
neighbor 172.16.0.100 remote-as 100



And the associated route-map:

route-map ONLY-LOOPBACK0 permit 10
match interface Loopback0

So basically that's a cookie-cutter configuration that's being reused on Spoke-02 as well.

So what does the routing end up looking like on the Hub side of things?

HUB# sh ip bgp | beg Networ
Network Next Hop Metric LocPrf Weight Path
0.0.0.0 0.0.0.0 0 i
*>i 1.1.1.1/32 172.16.0.1 0 100 0 ?
*>i 2.2.2.2/32 172.16.0.2 0 100 0 ?
*> 8.8.8.8/32 133.1.2.2 0 0 59701 i
*> 70.0.0.0/24 0.0.0.0 0 32768 i

We have a default route, which we injected using the default-originate command on the peer-group. Then we receive the loopback addresses from each of the two spokes. Next we have the network statement the hub inserted for 70.0.0.0/24, which is being sent to the upstream TeleCom. Finally we have 8.8.8.8/32 from TeleCom 🙂

Let's look at the spokes:

Spoke-01#sh ip bgp | beg Network
Network Next Hop Metric LocPrf Weight Path
*>i 0.0.0.0 172.16.0.100 0 100 0 i
*> 1.1.1.1/32 0.0.0.0 0 32768 ?

Very straightforward and simple. A default from the hub and the loopback we ourselves injected.

Right now, we have the correct routing information in place and we are ready to let NHRP do its magic. Let's take a look at what happens when Spoke-01 sends a ping to Spoke-02's loopback address:

Spoke-01#ping 2.2.2.2 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/17/32 ms

Everything works, and if the math is right, we should see an NHRP shortcut being created for the spoke-to-spoke tunnel:

Spoke-01#sh ip nhrp shortcut
2.2.2.2/32 via 172.16.0.2
Tunnel100 created 00:00:06, expire 01:59:53
Type: dynamic, Flags: router rib
NBMA address: 150.0.0.2
Group: 30MB-Group
172.16.0.2/32 via 172.16.0.2
Tunnel100 created 00:00:06, expire 01:59:53
Type: dynamic, Flags: router nhop rib
NBMA address: 150.0.0.2
Group: 30MB-Group

and on Spoke-02:

Spoke-02#sh ip nhrp shortcut
1.1.1.1/32 via 172.16.0.1
Tunnel100 created 00:01:29, expire 01:58:31
Type: dynamic, Flags: router rib
NBMA address: 140.0.0.2
Group: 10MB-Group
172.16.0.1/32 via 172.16.0.1
Tunnel100 created 00:01:29, expire 01:58:31
Type: dynamic, Flags: router nhop rib
NBMA address: 140.0.0.2
Group: 10MB-Group

And the RIB on both routers should reflect this as well:

Gateway of last resort is 172.16.0.100 to network 0.0.0.0



B* 0.0.0.0/0 [200/0] via 172.16.0.100, 06:09:37
1.0.0.0/32 is subnetted, 1 subnets
C 1.1.1.1 is directly connected, Loopback0
2.0.0.0/32 is subnetted, 1 subnets
H 2.2.2.2 [250/255] via 172.16.0.2, 00:02:08, Tunnel100
172.16.0.0/16 is variably subnetted, 3 subnets, 2 masks
C 172.16.0.0/24 is directly connected, Tunnel100
L 172.16.0.1/32 is directly connected, Tunnel100
H 172.16.0.2/32 is directly connected, 00:02:08, Tunnel100

The "H" routes are installed by NHRP.

And just to verify that our crypto is working, here's a capture from Wireshark on the internet "Cloud" when pinging from Spoke-01 to Spoke-02:

wireshark

Now let's turn our attention to the Quality of Service aspect of our solution.

We have 3 facts to deal with.

1) The Hub router has a line-rate Gigabit Ethernet circuit to the Internet.
2) The Spoke-01 site has a Gigabit Ethernet circuit, but it's sub-rated to a 10Mbit access rate.
3) The Spoke-02 site has a Gigabit Ethernet circuit, but it's sub-rated to a 30Mbit access rate.

We somehow want to signal to the Hub site to “respect” these access-rates. This is where the “Per-Tunnel QoS” feature comes into play.

If you recall, the Hub Tunnel100 configuration looks like this:

interface Tunnel100
ip address 172.16.0.100 255.255.255.0
no ip redirects
ip mtu 1400
ip nat inside
ip nhrp network-id 100
ip nhrp redirect
ip tcp adjust-mss 1360
load-interval 30
nhrp map group 10MB-Group service-policy output 10MB-Parent
nhrp map group 30MB-Group service-policy output 30MB-Parent
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE shared



We have these 2 "nhrp map" statements. What these do, in effect, is provide a framework: each spoke registers with one of these group names, and the hub then applies the corresponding QoS policy to the traffic it sends toward that individual spoke.

So these are the policy-maps we reference:

HUB#sh policy-map
Policy Map 30MB-Child
Class ICMP
priority 5 (kbps)
Class TCP
bandwidth 50 (%)

Policy Map 10MB-Parent
Class class-default
Average Rate Traffic Shaping
cir 10000000 (bps)
service-policy 10MB-Child

Policy Map 10MB-Child
Class ICMP
priority 10 (%)
Class TCP
bandwidth 80 (%)


Policy Map 30MB-Parent
Class class-default
Average Rate Traffic Shaping
cir 30000000 (bps)
service-policy 30MB-Child



We have a hierarchical policy for both the 10Mbit and 30Mbit groups, each with its own child policy.
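The ICMP and TCP class-maps referenced by the child policies aren't shown above. Judging from the "show policy-map multipoint" output later on ("Match: protocol icmp" and "Match: access-group 110"), they presumably look something like this; the exact contents of access-list 110 are an assumption on my part:

```
class-map match-all ICMP
 match protocol icmp
class-map match-all TCP
 match access-group 110
!
! Assumed definition of ACL 110 - it just needs to classify the TCP traffic:
access-list 110 permit tcp any any
```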

On the Spoke side of things, all we have to do is to tell the Hub which group to use:

interface Tunnel100
bandwidth 10000
ip address 172.16.0.1 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map 172.16.0.100 130.0.0.2
ip nhrp map multicast 130.0.0.2
ip nhrp network-id 100
ip nhrp nhs 172.16.0.100
ip nhrp shortcut
ip tcp adjust-mss 1360
load-interval 30
nhrp group 10MB-Group
qos pre-classify
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE



Here on Spoke-01, we request that the 10MB-Group QoS policy be used.

And on Spoke-02:

interface Tunnel100
bandwidth 30000
ip address 172.16.0.2 255.255.255.0
no ip redirects
ip mtu 1400
ip nhrp map 172.16.0.100 130.0.0.2
ip nhrp map multicast 130.0.0.2
ip nhrp network-id 100
ip nhrp nhs 172.16.0.100
ip nhrp shortcut
ip tcp adjust-mss 1360
load-interval 30
nhrp group 30MB-Group
qos pre-classify
tunnel source GigabitEthernet1
tunnel mode gre multipoint
tunnel vrf Inet_VRF
tunnel protection ipsec profile DMVPN-PROFILE



We request the 30MB-Group.

So let's verify that the Hub understands this and applies it accordingly:

HUB#sh nhrp group-map
Interface: Tunnel100
NHRP group: 10MB-Group
QoS policy: 10MB-Parent
Transport endpoints using the qos policy:
140.0.0.2



NHRP group: 30MB-Group
QoS policy: 30MB-Parent
Transport endpoints using the qos policy:
150.0.0.2



Excellent. And to see that it's actually applied correctly:

HUB#sh policy-map multipoint tunnel 100

 

Interface Tunnel100 <--> 140.0.0.2

Service-policy output: 10MB-Parent

Class-map: class-default (match-any)
903 packets, 66746 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 900/124744
shape (average) cir 10000000, bc 40000, be 40000
target shape rate 10000000

Service-policy : 10MB-Child

queue stats for all priority classes:
Queueing
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 10/1860

Class-map: ICMP (match-all)
10 packets, 1240 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: protocol icmp
Priority: 10% (1000 kbps), burst bytes 25000, b/w exceed drops: 0
Class-map: TCP (match-all)
890 packets, 65494 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: access-group 110
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 890/122884
bandwidth 80% (8000 kbps)

Class-map: class-default (match-any)
3 packets, 12 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any

queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 0/0

Interface Tunnel100 <--> 150.0.0.2

Service-policy output: 30MB-Parent

Class-map: class-default (match-any)
901 packets, 66817 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 901/124898
shape (average) cir 30000000, bc 120000, be 120000
target shape rate 30000000

Service-policy : 30MB-Child

queue stats for all priority classes:
Queueing
queue limit 512 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 10/1860

Class-map: ICMP (match-all)
10 packets, 1240 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: protocol icmp
Priority: 5 kbps, burst bytes 1500, b/w exceed drops: 0
Class-map: TCP (match-all)
891 packets, 65577 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: access-group 110
Queueing
queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 891/123038
bandwidth 50% (15000 kbps)

Class-map: class-default (match-any)
0 packets, 0 bytes
30 second offered rate 0000 bps, drop rate 0000 bps
Match: any

queue limit 64 packets
(queue depth/total drops/no-buffer drops) 0/0/0
(pkts output/bytes output) 0/0


The last piece of the QoS puzzle is to make sure you have a service-policy applied on the transport interfaces on the spokes as well:

Spoke-01#sh run int g1
Building configuration...



Current configuration : 210 bytes
!
interface GigabitEthernet1
description -= Towards Internet Router =-
bandwidth 30000
vrf forwarding Inet_VRF
ip address 140.0.0.2 255.255.255.252
negotiation auto
service-policy output 10MB-Parent
end



and on Spoke-02:

Spoke-02#sh run int g1
Building configuration...



Current configuration : 193 bytes
!
interface GigabitEthernet1
description -= Towards Internet Router =-
vrf forwarding Inet_VRF
ip address 150.0.0.2 255.255.255.252
negotiation auto
service-policy output 30MB-Parent
end



The last thing I want to mention is the NAT on the hub, which uses the 70.0.0.0/24 network toward the outside world. Pretty straightforward NAT (inside on the Tunnel100 interface and outside on the egress interface toward TeleCom, G2):

HUB#sh run int g2
Building configuration...



Current configuration : 106 bytes
!
interface GigabitEthernet2
ip address 133.1.2.1 255.255.255.252
ip nat outside
negotiation auto
end



Also the NAT configuration itself:

ip nat pool NAT-POOL 70.0.0.1 70.0.0.253 netmask 255.255.255.0
ip nat inside source list 10 pool NAT-POOL overload
!
HUB#sh access-list 10
Standard IP access list 10
10 permit 1.1.1.1
20 permit 2.2.2.2



We are only NAT’ing the two loopbacks from the spokes on our network.

Let's do a final verification on the spokes to 8.8.8.8/32:



Spoke-01#ping 8.8.8.8 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/23/56 ms

and Spoke-02:

Spoke-02#ping 8.8.8.8 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 8.8.8.8, timeout is 2 seconds:
Packet sent with a source address of 2.2.2.2
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/14/27 ms



Let's verify the NAT state table on the hub:

HUB#sh ip nat trans
Pro Inside global Inside local Outside local Outside global
icmp 70.0.0.1:1 1.1.1.1:5 8.8.8.8:5 8.8.8.8:1
icmp 70.0.0.1:2 2.2.2.2:0 8.8.8.8:0 8.8.8.8:2
Total number of translations: 2



All good!

I hope you have had a chance to look at some of the fairly simple configuration snippets that are involved in these techniques and how they fit together in the overall scheme of things.

If you have any questions, please let me know!

Have fun with the lab!

(Configurations will be added shortly!)

GETVPN Example

A couple of weeks ago I had the good fortune of attending Jeremy Filliben’s CCDE Bootcamp.
It was a great experience, which I will elaborate on in another post. But one of the technology areas I had a bit of difficulty with was GETVPN.

So in this post I am going to set up a scenario in which a customer has 3 sites: 2 "normal" sites and a Datacenter site. The customer wants to encrypt traffic from Site 1 to Site 2.

Currently the customer has a regular L3VPN service from a provider (which is beyond the scope of this post). There is full connectivity between the 3 sites through this service.

The topology is as follows:

Topology

GETVPN consists of a few components, namely the Key Server (KS) and Group Members (GM’s), which is where it derives its name: Group Encrypted Transport. A single SA (Security Association) is used for the encryption. The Key Server distributes the information to the Group Members through a secure transport, where the Group Members then use this information (basically an ACL) to encrypt/decrypt the data packets.

The routing for the topology is fairly simple (see the routing diagram). Each client as well as the KeyServer uses a default route to reach the rest of the topology. Each CE router runs eBGP with the provider, where it redistributes the connected interfaces into BGP for full reachability between the sites.
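Only the CE BGP tables are shown in this post, not the CE routing configuration, so here is a hedged sketch of what each CE presumably runs. The neighbor address and the provider AS are inferred from CE1's BGP table below; CE1's own local AS number is purely my assumption:

```
! Hypothetical CE1 configuration - a sketch, not taken from the lab:
router bgp 65001
 redistribute connected
 neighbor 10.10.1.2 remote-as 100
```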

Routing-Topology

At this point, let's verify that we have full connectivity through the L3VPN SP.

On CE-1:

CE1#sh ip bgp
BGP table version is 7, local router ID is 192.168.12.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, 
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter, 
              x best-external, a additional-path, c RIB-compressed, 
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  10.10.1.0/24     0.0.0.0                  0         32768 ?
 *>  10.10.2.0/24     10.10.1.2                              0 100 100 ?
 *>  10.10.3.0/24     10.10.1.2                              0 100 100 ?
 *>  192.168.12.0     0.0.0.0                  0         32768 ?
 *>  192.168.23.0     10.10.1.2                              0 100 100 ?
 *>  192.168.34.0     10.10.1.2                              0 100 100 ?

We are learning the routes to the other sites.

And connectivity from Client-1:

Client-1#ping 192.168.34.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.34.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/10/25 ms

The interesting part takes place on the KeyServer along with CE1 and CE3.

Let's take a look at the configuration on the KeyServer.

First off, we have a regular extended ACL that defines what traffic we want to encrypt. This ACL is the one that gets “downloaded” to CE1 and CE3:

ip access-list extended CRYPTO_ACL
 permit ip 192.168.12.0 0.0.0.255 192.168.34.0 0.0.0.255
 permit ip 192.168.34.0 0.0.0.255 192.168.12.0 0.0.0.255
 

 

Register-ACL-Download

Next up we have an ISAKMP policy which is used during the initial communication with the KeyServer. This policy is present on all the Group Members (GM's) and the KeyServer:

crypto isakmp policy 10
 encr aes 256
 hash sha256
 authentication pre-share
 group 2
crypto isakmp key SUPERSECRET address 0.0.0.0        

In this example we use a simple pre-shared key with the any-address form. This could (and probably should) be certificate-based instead. However, that complicates matters, so I skipped it.

Next is the IPsec transform set which will be used. Notice that we use tunnel mode.

crypto ipsec transform-set GET-VPN-TRANSFORM-SET esp-aes esp-sha256-hmac 
 mode tunnel

This transform set is referenced in an IPsec profile configuration:

crypto ipsec profile GETVPN-PROFILE
 set transform-set GET-VPN-TRANSFORM-SET 

This is necessary for the next piece of configuration, which is the entire GDOI aspect:

crypto gdoi group GDOI-GROUP
 identity number 100
 server local
  rekey authentication mypubkey rsa GETVPN-KEY
  rekey transport unicast
  sa ipsec 1
   profile GETVPN-PROFILE
   match address ipv4 CRYPTO_ACL
   replay counter window-size 64
   no tag
  address ipv4 192.168.23.1

Here we are creating a GDOI configuration with a unique identifier for this group (100). We are telling the router that it's the server. Next is the public key we created, this time with a name ("crypto key generate rsa label "). This is used for rekeying purposes. Also notice that we are using unicast for distributing the key material. This could just as well have been multicast, but again, that requires your infrastructure to be multicast capable and ready.
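For completeness, the RSA keypair referenced by "rekey authentication mypubkey rsa GETVPN-KEY" would have been generated on the KeyServer beforehand along these lines (the modulus size is my assumption):

```
! Run once on the KeyServer before configuring the GDOI group:
crypto key generate rsa label GETVPN-KEY modulus 2048
```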

We then reference our previous IPsec profile and specify our crypto "ACL". Lastly we specify which "update source" should be used for this server (which the GM's will use to communicate to/from).

If we then match this to what is configured on CE1 and CE3:

crypto isakmp policy 10
 encr aes 256
 hash sha256
 authentication pre-share
 group 2
crypto isakmp key SUPERSECRET address 0.0.0.0        
crypto gdoi group GDOI-GROUP
 identity number 100
 server address ipv4 192.168.23.1
crypto map MYMAP 10 gdoi 
 set group GDOI-GROUP

And on the interface towards the SP we apply the crypto map:

CE1#sh run int g1.10
Building configuration...

Current configuration : 115 bytes
!
interface GigabitEthernet1.10
 encapsulation dot1Q 10
 ip address 10.10.1.1 255.255.255.0
 crypto map MYMAP
end

 

Crypto Map Topology

We can see that we have the ISAKMP configuration, which I mentioned is being used for a secure communication channel. Next we simply have the location of our KeyServer and the group identifier, and that's pretty much all. Everything else is learned from the Key Server.

After everything has been configured, you can see the log showing the registration process:

*May 15 10:37:53.245: %CRYPTO-5-GM_REGSTER: Start registration to KS 192.168.23.1 for group GDOI-GROUP using address 10.10.3.1 fvrf default ivrf default
*May 15 10:38:23.356: %GDOI-5-SA_TEK_UPDATED: SA TEK was updated
*May 15 10:38:23.395: %GDOI-5-SA_KEK_UPDATED: SA KEK was updated 0x5DB57E80F97A9A1DC16B9DBBCF7CB169
*May 15 10:38:23.395: %GDOI-5-GM_REGS_COMPL: Registration to KS 192.168.23.1 complete for group GDOI-GROUP using address 10.10.3.1 fvrf default ivrf default
*May 15 10:38:23.668: %GDOI-5-GM_INSTALL_POLICIES_SUCCESS: SUCCESS: Installation of Reg/Rekey policies from KS 192.168.23.1 for group GDOI-GROUP & gm identity 10.10.3.1 fvrf default ivrf default

Another form of verification is the "show crypto gdoi" command structure, which gives you a lot of information on the process:

CE1#sh crypto gdoi 
GROUP INFORMATION

    Group Name               : GDOI-GROUP
    Group Identity           : 100
    Group Type               : GDOI (ISAKMP)
    Crypto Path              : ipv4
    Key Management Path      : ipv4
    Rekeys received          : 0
    IPSec SA Direction       : Both

     Group Server list       : 192.168.23.1
                               
Group Member Information For Group GDOI-GROUP:
    IPSec SA Direction       : Both
    ACL Received From KS     : gdoi_group_GDOI-GROUP_temp_acl

    Group member             : 10.10.1.1       vrf: None
       Local addr/port       : 10.10.1.1/848
       Remote addr/port      : 192.168.23.1/848
       fvrf/ivrf             : None/None
       Version               : 1.0.16
       Registration status   : Registered
       Registered with       : 192.168.23.1
       Re-registers in       : 1580 sec
       Succeeded registration: 1
       Attempted registration: 3
       Last rekey from       : 0.0.0.0
       Last rekey seq num    : 0
       Unicast rekey received: 0
       Rekey ACKs sent       : 0
       Rekey Received        : never
       DP Error Monitoring   : OFF
       IPSEC init reg executed    : 0
       IPSEC init reg postponed   : 0
       Active TEK Number     : 1
       SA Track (OID/status) : disabled

       allowable rekey cipher: any
       allowable rekey hash  : any
       allowable transformtag: any ESP

    Rekeys cumulative
       Total received        : 0
       After latest register : 0
       Rekey Acks sents      : 0

 ACL Downloaded From KS 192.168.23.1:
   access-list   permit ip 192.168.12.0 0.0.0.255 192.168.34.0 0.0.0.255
   access-list   permit ip 192.168.34.0 0.0.0.255 192.168.12.0 0.0.0.255

KEK POLICY:
    Rekey Transport Type     : Unicast
    Lifetime (secs)          : 84613
    Encrypt Algorithm        : 3DES
    Key Size                 : 192     
    Sig Hash Algorithm       : HMAC_AUTH_SHA
    Sig Key Length (bits)    : 2352    

TEK POLICY for the current KS-Policy ACEs Downloaded:
  GigabitEthernet1.10:
    IPsec SA:
        spi: 0xA3D6592E(2748733742)
        KGS: Disabled
        transform: esp-aes esp-sha256-hmac 
        sa timing:remaining key lifetime (sec): (1815)
        Anti-Replay(Counter Based) : 64
        tag method : disabled
        alg key size: 16 (bytes)
        sig key size: 32 (bytes)
        encaps: ENCAPS_TUNNEL

Among the most interesting parts are the KEK policy and the ACL that's in place.

If we then verify from Client-1, we can see a couple of timeouts while the encryption is being set up, and from there we have connectivity:

Client-1#ping 192.168.34.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.34.1, timeout is 2 seconds:
..!!!
Success rate is 60 percent (3/5), round-trip min/avg/max = 2/2/2 ms

So for something that's in theory very complex, this is very efficient from both a configuration and a control-plane point of view. I know creating this lab certainly helped me understand the steps involved in setting up GETVPN, so I hope it's been relevant for you as well!

MPLS VPN’s over mGRE

This blog post outlines what "MPLS VPNs over mGRE" is all about, as well as providing an example of such a configuration.

So what is "MPLS VPNs over mGRE"? – Well, basically it's taking regular MPLS VPNs and running them over an IP-only core network. Since VPNs over MPLS is one of the primary drivers for implementing an MPLS network in the first place, having the same functionality over an IP-only core might be very compelling for those not willing or able to run MPLS label switching in the core.

Instead of using labels to switch the traffic from one PE to another, mGRE (Multipoint GRE) is used as the encapsulation technology instead.

Be advised that one label is still being used, however. This is the VPN label that's used to identify which VRF interface to switch the traffic to when it's received by a PE. This label is, just as in regular MPLS VPNs, assigned by the PE through MP-BGP.
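To make the encapsulation concrete, here is a sketch of the packet stack between two PEs (note the absence of a transport label, unlike classic MPLS VPNs):

```
! Outer IP header   : PE loopback -> PE loopback
!  GRE header       : protocol field indicating an MPLS payload
!   VPN label       : assigned via MP-BGP, selects the egress VRF
!    Customer packet: the original IP packet from the VRF
```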

So how is this actually performed? – Well, let's take a look at an example.

The topology I will be using is as follows:

Topology for MPLS VPNs over mGRE

** Note: I ran into an issue with VIRL, causing the EIGRP adjacency between CSR-3 and R3 to fail. So I will not be using these in the examples to come. I noted this behavior on the VIRL community forums in case you are interested.

In this topology we have a core network, consisting of CSR-1 to CSR-5. They are all running OSPF in area 0. No MPLS is configured, so its pure IP routing end-to-end.

Let's take a look at CSR-5's RIB:

CSR-5#sh ip route | beg Gateway
Gateway of last resort is not set

      1.0.0.0/32 is subnetted, 1 subnets
O        1.1.1.1 [110/2] via 192.168.15.1, 00:39:00, GigabitEthernet2
      2.0.0.0/32 is subnetted, 1 subnets
O        2.2.2.2 [110/2] via 192.168.25.2, 00:38:50, GigabitEthernet3
      3.0.0.0/32 is subnetted, 1 subnets
O        3.3.3.3 [110/2] via 192.168.35.3, 00:38:50, GigabitEthernet4
      4.0.0.0/32 is subnetted, 1 subnets
O        4.4.4.4 [110/2] via 192.168.45.4, 00:39:10, GigabitEthernet5
      5.0.0.0/32 is subnetted, 1 subnets
C        5.5.5.5 is directly connected, Loopback0
      192.168.15.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.15.0/24 is directly connected, GigabitEthernet2
L        192.168.15.5/32 is directly connected, GigabitEthernet2
      192.168.25.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.25.0/24 is directly connected, GigabitEthernet3
L        192.168.25.5/32 is directly connected, GigabitEthernet3
      192.168.35.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.35.0/24 is directly connected, GigabitEthernet4
L        192.168.35.5/32 is directly connected, GigabitEthernet4
      192.168.45.0/24 is variably subnetted, 2 subnets, 2 masks
C        192.168.45.0/24 is directly connected, GigabitEthernet5
L        192.168.45.5/32 is directly connected, GigabitEthernet5

And to verify that we are not running any MPLS switching:

CSR-5#sh mpls for
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface

So we have our connected interfaces along with the loopbacks of all the routers in the core network.

Let's take a look at CSR-1's configuration, the VRF configuration in particular:

vrf definition CUST-A
 rd 100:100
 !
 address-family ipv4
  route-target export 100:100
  route-target import 100:100
 exit-address-family
!
!
interface GigabitEthernet2
 vrf forwarding CUST-A
 ip address 10.0.1.1 255.255.255.0
 negotiation auto
!
router eigrp 1
!
address-family ipv4 vrf CUST-A autonomous-system 100
  redistribute bgp 1 metric 1 1 1 1 1
  network 0.0.0.0
 exit-address-family

We have our VRF CUST-A configured, with a RD of 100:100 along with 100:100 as both import and export Route-Targets. Just as we would configure for a regular MPLS L3 VPN.

We use our GigabitEthernet2 interface as our attachment circuit to CUST-A. In addition we have EIGRP 100 running as the VRF-aware IGP towards R1. And finally we are redistributing BGP into the VRF's EIGRP process.

Let's make sure we are receiving routes from R1 into the VRF RIB:

CSR-1#sh ip route vrf CUST-A eigrp | beg Gateway
Gateway of last resort is not set

      100.0.0.0/32 is subnetted, 3 subnets
D        100.100.100.1 [90/130816] via 10.0.1.100, 00:45:37, GigabitEthernet2

Looks good, we are receiving the loopback prefix from R1. This is as we would expect.

A similar configuration exists on CSR-2, CSR-3 and CSR-4. Nothing different from a regular MPLS L3 VPN service.

Now for the core configuration utilizing MP-BGP.
We are using CSR-5 as a VPNv4 route-reflector in order to avoid having a full mesh of iBGP sessions.

So the configuration on R5 looks like this:

CSR-5#sh run | sec router bgp
router bgp 1
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 1.1.1.1 remote-as 1
 neighbor 1.1.1.1 update-source Loopback0
 neighbor 2.2.2.2 remote-as 1
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 3.3.3.3 remote-as 1
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 4.4.4.4 remote-as 1
 neighbor 4.4.4.4 update-source Loopback0
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 1.1.1.1 activate
  neighbor 1.1.1.1 send-community extended
  neighbor 1.1.1.1 route-reflector-client
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 send-community extended
  neighbor 2.2.2.2 route-reflector-client
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-community extended
  neighbor 3.3.3.3 route-reflector-client
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
  neighbor 4.4.4.4 route-reflector-client
 exit-address-family

Pretty straightforward really.

Then on CSR-1:

 CSR-1#sh run | sec router bgp
 router bgp 1
  bgp log-neighbor-changes
  no bgp default ipv4-unicast
  neighbor 5.5.5.5 remote-as 1
  neighbor 5.5.5.5 update-source Loopback0
  !
  address-family ipv4
  exit-address-family
  !
  address-family vpnv4
   neighbor 5.5.5.5 activate
   neighbor 5.5.5.5 send-community extended
   neighbor 5.5.5.5 route-map MODIFY-INBOUND in
  exit-address-family
  !
  address-family ipv4 vrf CUST-A
   redistribute eigrp 100
  exit-address-family

Here we have a single neighbor configured (CSR-5 being the RR) using our loopback address. We are also redistributing routes from the VRF into BGP for VPNv4 announcements to the other PEs. What's really important (and differs from regular MPLS L3 VPNs) is the route-map we apply inbound (MODIFY-INBOUND). Let's take a closer look at that:

CSR-1#sh route-map
route-map MODIFY-INBOUND, permit, sequence 10
  Match clauses:
  Set clauses:
    ip next-hop encapsulate l3vpn L3VPN-PROFILE
  Policy routing matches: 0 packets, 0 bytes

So all this does is set the next-hop according to an l3vpn profile called L3VPN-PROFILE. This is really the heart of the technology. Let's look at the profile in more detail:

CSR-1#sh run | beg L3VPN
l3vpn encapsulation ip L3VPN-PROFILE
 !

Well, that wasn't very informative. It simply defines a standard profile (which means mGRE) with our desired name.
You can get more detail by using the show commands:

CSR-1#sh l3vpn encapsulation ip 

 Profile: L3VPN-PROFILE
  transport ipv4 source Auto: Loopback0
  protocol gre
  payload mpls
   mtu default
  Tunnel Tunnel0 Created [OK]
  Tunnel Linestate [OK]
  Tunnel Transport Source (Auto) Loopback0 [OK]

So this tells us that, by default, Loopback0 was chosen as the source of the tunnel and that Tunnel0 was created automatically. Let's take a look at Tunnel0 in more detail:

 CSR-1#sh interface Tunnel0
 Tunnel0 is up, line protocol is up
   Hardware is Tunnel
   Interface is unnumbered. Using address of Loopback0 (1.1.1.1)
   MTU 9976 bytes, BW 10000 Kbit/sec, DLY 50000 usec,
      reliability 255/255, txload 1/255, rxload 1/255
   Encapsulation TUNNEL, loopback not set
   Keepalive not set
   Tunnel linestate evaluation up
   Tunnel source 1.1.1.1 (Loopback0)
    Tunnel Subblocks:
       src-track:
          Tunnel0 source tracking subblock associated with Loopback0
           Set of tunnels with source Loopback0, 1 member (includes iterators), on interface <OK>
   Tunnel protocol/transport multi-GRE/IP
     Key disabled, sequencing disabled
     Checksumming of packets disabled
   Tunnel TTL 255, Fast tunneling enabled
   Tunnel transport MTU 1476 bytes
   Tunnel transmit bandwidth 8000 (kbps)
   Tunnel receive bandwidth 8000 (kbps)
   Last input never, output never, output hang never
   Last clearing of "show interface" counters 00:54:16
   Input queue: 0/375/0/0 (size/max/drops/flushes); Total output drops: 3
   Queueing strategy: fifo
   Output queue: 0/0 (size/max)
   5 minute input rate 0 bits/sec, 0 packets/sec
   5 minute output rate 0 bits/sec, 0 packets/sec
      0 packets input, 0 bytes, 0 no buffer
      Received 0 broadcasts (0 IP multicasts)
      0 runts, 0 giants, 0 throttles
      0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
      0 packets output, 0 bytes, 0 underruns
      0 output errors, 0 collisions, 0 interface resets
      0 unknown protocol drops
      0 output buffer failures, 0 output buffers swapped out

What's important here is that the tunnel protocol/transport is multi-GRE/IP, which is the whole point of it all.

So to recap: when we receive prefixes reflected by our RR (this is beside the point; it could just as well be a full mesh), we set our IP next-hop to the other PE's loopback address and tell the router to do the mGRE encapsulation when traffic is routed to these prefixes.

Let's take a look at our BGP table on CSR-1:

CSR-1#sh bgp vpnv4 uni vrf CUST-A | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:100 (default for vrf CUST-A)
 *>  10.0.1.0/24      0.0.0.0                  0         32768 ?
 *>i 10.0.2.0/24      2.2.2.2                  0    100      0 ?
 *>i 10.0.3.0/24      3.3.3.3                  0    100      0 ?
 *>i 10.0.4.0/24      4.4.4.4                  0    100      0 ?
 *>  100.100.100.1/32 10.0.1.100          130816         32768 ?
 *>i 100.100.100.2/32 2.2.2.2             130816    100      0 ?
 *>i 100.100.100.4/32 4.4.4.4             130816    100      0 ?

(Note: Remember CSR-3 is broken because of VIRL.)

Let's take a look at what information is present for 100.100.100.2/32:

 CSR-1#sh bgp vpnv4 uni vrf CUST-A 100.100.100.2/32
 BGP routing table entry for 100:100:100.100.100.2/32, version 19
 Paths: (1 available, best #1, table CUST-A)
   Not advertised to any peer
   Refresh Epoch 1
   Local
     2.2.2.2 (metric 3) (via default) (via Tunnel0) from 5.5.5.5 (5.5.5.5)
       Origin incomplete, metric 130816, localpref 100, valid, internal, best
       Extended Community: RT:100:100 Cost:pre-bestpath:128:130816
         0x8800:32768:0 0x8801:100:128256 0x8802:65281:2560 0x8803:65281:1500
         0x8806:0:1684300802
       Originator: 2.2.2.2, Cluster list: 5.5.5.5
       mpls labels in/out nolabel/17
       rx pathid: 0, tx pathid: 0x0

Important to note here is that we are being told to use label 17 as the VPN label for this prefix when sending traffic to 2.2.2.2 (CSR-2).
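As a refresher (my own addition, not part of the lab), the MPLS label stack entry is a 32-bit field: a 20-bit label, 3 EXP bits, a bottom-of-stack bit and an 8-bit TTL (RFC 3032). Packing label 17 looks like this:

```python
def mpls_shim(label: int, exp: int = 0, bos: int = 1, ttl: int = 255) -> bytes:
    """Pack an MPLS label stack entry (RFC 3032):
    20-bit label | 3-bit EXP | 1-bit bottom-of-stack | 8-bit TTL."""
    return ((label << 12) | (exp << 9) | (bos << 8) | ttl).to_bytes(4, "big")

# The VPN label 17 with a TTL of 254, as it would appear on the wire
print(mpls_shim(17, ttl=254).hex())  # 000111fe
```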

And finally, let's take a look at what CEF thinks about it all:

CSR-1#sh ip cef vrf CUST-A 100.100.100.2 detail
100.100.100.2/32, epoch 0, flags [rib defined all labels]
  nexthop 2.2.2.2 Tunnel0 label 17

So CEF will assign label 17 to the packet and then use Tunnel0 to reach CSR-2. Just as we would expect.

As a final verification, I've done an Embedded Packet Capture on CSR-5 while pinging from R1's loopback to R2's loopback, and this is what you can see here:

 6  142   29.028990   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800000 0000FF2F B5490101 01010202   ......./.I......
  0020:  02020000 88470001 11FE4500 00640000   .....G....E..d..
  0030:  0000FE01 2BCD6464 64016464 64020800   ....+.ddd.ddd...

   7  142   29.106989   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800001 0000FF2F B5480101 01010202   ......./.H......
  0020:  02020000 88470001 11FE4500 00640001   .....G....E..d..
  0030:  0000FE01 2BCC6464 64016464 64020800   ....+.ddd.ddd...

   8  142   29.184988   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800002 0000FF2F B5470101 01010202   ......./.G......
  0020:  02020000 88470001 11FE4500 00640002   .....G....E..d..
  0030:  0000FE01 2BCB6464 64016464 64020800   ....+.ddd.ddd...

   9  142   29.241037   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800003 0000FF2F B5460101 01010202   ......./.F......
  0020:  02020000 88470001 11FE4500 00640003   .....G....E..d..
  0030:  0000FE01 2BCA6464 64016464 64020800   ....+.ddd.ddd...

  10  142   29.287024   1.1.1.1          ->  2.2.2.2          GRE
  0000:  FA163E39 EC39FA16 3ECEC705 08004500   ..>9.9..>.....E.
  0010:  00800004 0000FF2F B5450101 01010202   ......./.E......
  0020:  02020000 88470001 11FE4500 00640004   .....G....E..d..
  0030:  0000FE01 2BC96464 64016464 64020800   ....+.ddd.ddd...

As you can see, the encapsulation is GRE, just as expected.
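To double-check what is in those frames, here is a small Python sketch (my own addition, not something from the lab) that decodes the first captured frame byte for byte. It confirms the stack we expect: Ethernet, outer IP (protocol 47), GRE carrying MPLS (0x8847), the VPN label 17, and the inner ICMP echo between the loopbacks:

```python
import ipaddress
import struct

# First frame from the embedded packet capture on CSR-5 (hex copied verbatim).
frame = bytes.fromhex(
    "FA163E39EC39FA163ECEC70508004500"
    "008000000000FF2FB549010101010202"
    "020200008847000111FE450000640000"
    "0000FE012BCD64646401646464020800"
)

ethertype = struct.unpack("!H", frame[12:14])[0]   # 0x0800: IPv4
ip_proto = frame[23]                               # 47: GRE
outer_src = str(ipaddress.ip_address(frame[26:30]))
outer_dst = str(ipaddress.ip_address(frame[30:34]))
gre_proto = struct.unpack("!H", frame[36:38])[0]   # 0x8847: MPLS unicast
shim = struct.unpack("!I", frame[38:42])[0]        # MPLS label stack entry
label, bos = shim >> 12, (shim >> 8) & 1           # VPN label 17, bottom-of-stack
inner_src = str(ipaddress.ip_address(frame[54:58]))
inner_dst = str(ipaddress.ip_address(frame[58:62]))
icmp_type = frame[62]                              # 8: echo request

print(f"{outer_src} -> {outer_dst}, GRE 0x{gre_proto:04x}, "
      f"label {label} (bos={bos}), inner {inner_src} -> {inner_dst}")
```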

So that's all there is to this technology. Very useful if you have an IP-only core network.

I hope it's been useful, and I will soon attach all the configurations for the routers in case you want to take a closer look.

Thanks for reading!

Update: Link to configs here

Unified/Seamless MPLS

In this post I would like to highlight a relatively new (to me) application of MPLS called Unified MPLS.
The goal of Unified MPLS is to separate your network into individual IGP segments in order to keep your core network as simple as possible, while still maintaining an end-to-end LSP for regular MPLS applications such as L3 VPNs.

What we are doing is simply putting route-reflectors into the forwarding path and changing the next-hops along the way, essentially stitching together the final LSP.
Along with that, we are using BGP to signal a label value to maintain the LSP from one end of the network to the other without the use of LDP between the IGP domains.

Take a look at the topology that we will be using to demonstrate this feature:

Unified-MPLS-Topology

In this topology we have a simplified layout of a service provider. We have a core network consisting of R3, R4 and R5, along with distribution networks to the left and right of the core. R2 and R3 are in the left distribution network, and R5 and R6 are in the right-hand one.

We have an MPLS L3VPN customer connected, with R1 in one site and R7 in another.

As is visible in the topology, we are running 3 separate IGPs to make a point about this feature: EIGRP AS 1, OSPF 100 and EIGRP AS 2. However, we are only running one autonomous system as seen from BGP, so it's a pure iBGP network.

Now, in order to make the L3VPN work, we need an end-to-end LSP going from R2 all the way to R6.
What's key here is that in order to have end-to-end reachability, we have contained IGP areas, each of which is running LDP for labels. Between the areas, however, all we are doing is leaking a couple of loopback addresses from the core into the distribution sections. These are used exclusively for the iBGP sessions.

On top of that, we need R3 and R5 to be route-reflectors, to be in the data path, and to allocate labels. This is done through the “send-label” command, along with modifying the next-hop (the “next-hop-self all” command).

This is illustrated in the following:

Unified-MPLS-iBGP-Topology

Enough theory, let's take a look at the configuration necessary to pull this off. Let's start out with R2's IGP and LDP configuration:

R2#sh run | sec router eigrp
router eigrp 1
 network 2.0.0.0
 network 10.0.0.0
 passive-interface default
 no passive-interface GigabitEthernet3

R2#sh run int g3
interface GigabitEthernet3
 ip address 10.2.3.2 255.255.255.0
 negotiation auto
 mpls ip
end

Pretty vanilla configuration of IGP + LDP.

The same for R3:

R3#sh run | sec router eigrp 1
router eigrp 1
 network 10.0.0.0
 redistribute ospf 100 metric 1 1 1 1 1 route-map REDIST-LOOPBACK-MAP
 passive-interface default
 no passive-interface GigabitEthernet2

R3#sh run int g2
interface GigabitEthernet2
 ip address 10.2.3.3 255.255.255.0
 negotiation auto
 mpls ip
end

R3#sh route-map REDIST-LOOPBACK-MAP
route-map REDIST-LOOPBACK-MAP, permit, sequence 10
  Match clauses:
    ip address prefix-lists: REDIST-LOOPBACK-PREFIX-LIST
  Set clauses:
  Policy routing matches: 0 packets, 0 bytes

R3#sh ip prefix-list
ip prefix-list REDIST-LOOPBACK-PREFIX-LIST: 1 entries
   seq 5 permit 3.3.3.3/32

Apart from the redistribution part, it's simply establishing an EIGRP adjacency with R2. On top of that, we are redistributing R3's loopback0 interface, which is in the core area, into EIGRP. Again, this step is necessary for the iBGP session establishment.

An almost identical setup is present in the other distribution site, consisting of R5 and R6. Again we redistribute R5’s loopback0 address into the IGP (EIGRP AS 2), so we can have iBGP connectivity, which is our next step.

So let's take a look at the BGP configuration on R2 all the way to R6. I'm leaving out the VPNv4 configuration for now, in order to make it clearer what we are trying to accomplish first:

R2:
---
router bgp 1000
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 1000
 neighbor 3.3.3.3 update-source Loopback0
 !
 address-family ipv4
  network 2.2.2.2 mask 255.255.255.255
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-label

R3:
---
router bgp 1000
 bgp router-id 3.3.3.3
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 1000
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 2.2.2.2 route-reflector-client
 neighbor 2.2.2.2 next-hop-self all
 neighbor 2.2.2.2 send-label
 neighbor 5.5.5.5 remote-as 1000
 neighbor 5.5.5.5 update-source Loopback0
 neighbor 5.5.5.5 route-reflector-client
 neighbor 5.5.5.5 next-hop-self all
 neighbor 5.5.5.5 send-label

R5:
---
router bgp 1000
 bgp router-id 5.5.5.5
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 1000
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 3.3.3.3 route-reflector-client
 neighbor 3.3.3.3 next-hop-self all
 neighbor 3.3.3.3 send-label
 neighbor 6.6.6.6 remote-as 1000
 neighbor 6.6.6.6 update-source Loopback0
 neighbor 6.6.6.6 route-reflector-client
 neighbor 6.6.6.6 next-hop-self all
 neighbor 6.6.6.6 send-label

R6:
---
router bgp 1000
 bgp router-id 6.6.6.6
 bgp log-neighbor-changes
 neighbor 5.5.5.5 remote-as 1000
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family ipv4
  network 6.6.6.6 mask 255.255.255.255
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-label

As visible from the configuration, we have 2 IPv4 route-reflectors (R3 and R5), both of which put themselves into the data path by using the next-hop-self command. On top of that, we are allocating labels for all prefixes via BGP as well. Let's verify this across the routers:

R2#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       0.0.0.0         imp-null/nolabel
   6.6.6.6/32       3.3.3.3         nolabel/305

R3#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       2.2.2.2         300/imp-null
   6.6.6.6/32       5.5.5.5         305/500

R5#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       3.3.3.3         505/300
   6.6.6.6/32       6.6.6.6         500/imp-null

R6#sh bgp ipv4 uni la
   Network          Next Hop      In label/Out label
   2.2.2.2/32       5.5.5.5         nolabel/505
   6.6.6.6/32       0.0.0.0         imp-null/nolabel

Since we are only injecting 2 prefixes (the loopbacks of R2 and R6) into BGP, that's all we have allocated labels for.

Doing a traceroute from R2 to R6 (between loopbacks) will reveal whether we truly have an LSP between them:

R2#traceroute 6.6.6.6 so loo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 10.2.3.3 [MPLS: Label 305 Exp 0] 26 msec 15 msec 18 msec
  2 10.3.4.4 [MPLS: Labels 401/500 Exp 0] 10 msec 24 msec 34 msec
  3 10.4.5.5 [MPLS: Label 500 Exp 0] 7 msec 23 msec 24 msec
  4 10.5.6.6 20 msec *  16 msec

This looks exactly like we wanted it to. (Note that the 401 label comes from a pure P router in the core.)
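To convince yourself of how these labels line up, here is a small Python sketch of the label operations hop by hop, reconstructed from the “sh bgp ipv4 uni la” outputs and the traceroute (the 401 LDP transport label on R4 only shows up in the traceroute). This is my own simplified model, not anything running on the routers:

```python
# Per-hop label operations for a packet from R2 toward 6.6.6.6/32, as
# reconstructed from the BGP label tables and the traceroute above.
# Stacks are listed top label first, matching the traceroute display.
ops = {
    "R2": lambda s: [305],        # ingress: push BGP label 305 toward 3.3.3.3
                                  # (R3 is directly connected -> imp-null transport)
    "R3": lambda s: [401, 500],   # swap BGP label 305 -> 500, push LDP transport
                                  # label 401 toward 5.5.5.5
    "R4": lambda s: s[1:],        # pure P router: PHP pops the transport label
    "R5": lambda s: s[1:],        # BGP label 500 -> imp-null toward R6
}

stack, seen = [], []
for router in ["R2", "R3", "R4", "R5"]:
    stack = ops[router](stack)
    seen.append(list(stack))

print(seen)  # [[305], [401, 500], [500], []], matching the traceroute hops
```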
This also means we can set up our VPNv4 configuration on R2 and R6:

R2#sh run | sec router bgp
router bgp 1000
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 neighbor 3.3.3.3 remote-as 1000
 neighbor 3.3.3.3 update-source Loopback0
 neighbor 6.6.6.6 remote-as 1000
 neighbor 6.6.6.6 update-source Loopback0
 !
 address-family ipv4
  network 2.2.2.2 mask 255.255.255.255
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-label
  no neighbor 6.6.6.6 activate
 exit-address-family
 !
 address-family vpnv4
  neighbor 6.6.6.6 activate
  neighbor 6.6.6.6 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf CUSTOMER-A
  redistribute connected
  redistribute static
 exit-address-family
R2#

R6#sh run | sec router bgp
router bgp 1000
 bgp router-id 6.6.6.6
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 1000
 neighbor 2.2.2.2 update-source Loopback0
 neighbor 5.5.5.5 remote-as 1000
 neighbor 5.5.5.5 update-source Loopback0
 !
 address-family ipv4
  network 6.6.6.6 mask 255.255.255.255
  no neighbor 2.2.2.2 activate
  neighbor 5.5.5.5 activate
  neighbor 5.5.5.5 send-label
 exit-address-family
 !
 address-family vpnv4
  neighbor 2.2.2.2 activate
  neighbor 2.2.2.2 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf CUSTOMER-A
  redistribute connected
  redistribute static
 exit-address-family

Let's verify that the iBGP VPNv4 peering is up and running:

R2#sh bgp vpnv4 uni all sum
..
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
6.6.6.6         4         1000      16      16       11    0    0 00:09:31        2

R6#sh bgp vpnv4 uni all sum
..
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4         1000      17      17       11    0    0 00:10:26        2

We do have the prefixes, and we should also have reachability from R1 to R7 (by way of their individual static default routes):

R1#ping 7.7.7.7 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 7.7.7.7, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 17/27/54 ms

Looks good; let's check the label path:

R1#traceroute 7.7.7.7 so loo0
Type escape sequence to abort.
Tracing the route to 7.7.7.7
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.2.2 19 msec 13 msec 12 msec
  2 10.2.3.3 [MPLS: Labels 305/600 Exp 0] 18 msec 19 msec 15 msec
  3 10.3.4.4 [MPLS: Labels 401/500/600 Exp 0] 12 msec 32 msec 34 msec
  4 10.4.5.5 [MPLS: Labels 500/600 Exp 0] 20 msec 27 msec 27 msec
  5 10.6.7.6 [MPLS: Label 600 Exp 0] 23 msec 15 msec 13 msec
  6 10.6.7.7 25 msec *  16 msec

What we are seeing here is basically the same path, but with the “VPN” label (600) added at the bottom of the stack.

So what have we really accomplished here? – Well, let's take a look at the RIB on R2 and look for the IGP (EIGRP AS 1) routes:

R2#sh ip route eigrp
..
      3.0.0.0/32 is subnetted, 1 subnets
D EX     3.3.3.3 [170/2560000512] via 10.2.3.3, 00:16:02, GigabitEthernet3
      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
D        10.3.4.0/24 [90/3072] via 10.2.3.3, 00:16:02, GigabitEthernet3

A very small table indeed. And if we include what's being learned by BGP:

R2#sh ip route bgp
..
      6.0.0.0/32 is subnetted, 1 subnets
B        6.6.6.6 [200/0] via 3.3.3.3, 00:17:02

R2#sh ip route 6.6.6.6
Routing entry for 6.6.6.6/32
  Known via "bgp 1000", distance 200, metric 0, type internal
  Last update from 3.3.3.3 00:17:43 ago
  Routing Descriptor Blocks:
  * 3.3.3.3, from 3.3.3.3, 00:17:43 ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
      MPLS label: 305

Only one prefix is needed to communicate with the remote distribution site's PE router (which we need the label for).

This means you can scale your distribution sites to very large sizes, keep your core as efficient as possible, and eliminate the use of areas and the like in your IGPs.

I hope this quick walkthrough of unified/seamless MPLS has been useful.

EIGRP OTP example

In this post I'd like to provide an example of a fairly new development in EIGRP called EIGRP Over The Top (OTP).

In all its simplicity, it establishes an EIGRP multihop adjacency using LISP as the encapsulation method for transport through the WAN network.

One of the applications of this would be to avoid relying on the SP in an MPLS L3 VPN. You could simply use the L3 VPN for transport between the interfaces directly connected to the Service Provider and run your own adjacency directly between your CPE routers (without the use of a GRE tunnel, which would be another way to do it).

The topology used for this example consists of 4 routers. All 4 of the routers are using OSPF to provide connectivity (you could take this example and do an L3 VPN using MPLS as an exercise). I'm simply taking the lazy path and doing it this way 🙂

EIGRP-OTP-Topology

R1 and R4 are running EIGRP in a named process “test”. This process is in Autonomous system 100 and the Loopback 0 interfaces are advertised into the V4 address-family.

Lets verify that we have connectivity between R1’s g1.102 interface and R4’s g1.304 interface:

R1#ping 172.3.4.4 so g1.102
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.3.4.4, timeout is 2 seconds:
Packet sent with a source address of 172.1.2.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/5/19 ms

All looks good.

Now let's take a look at the configuration that ties R1 and R4 together with an EIGRP adjacency:

on R1:

R1#sh run | sec router eigrp
router eigrp test
 !
 address-family ipv4 unicast autonomous-system 100
  !
  topology base
  exit-af-topology
  neighbor 172.3.4.4 GigabitEthernet1.102 remote 10 lisp-encap 1
  network 1.0.0.0
  network 172.1.0.0
 exit-address-family

What's important here are the neighbor statement and the network 172.1.0.0 statement.

With the neighbor statement, we specify that we have a remote neighbor (172.3.4.4) which can be reached through g1.102, that the maximum number of hops to reach this neighbor is 10, and that we should use LISP encapsulation with an ID of 1.

With the network 172.1.0.0 statement, we add the outgoing interface's network to the EIGRP process. I've found that without this, the adjacency won't come up. It's not enough to specify the interface in the neighbor command.

Let's verify which interfaces we are running EIGRP on at R1:

R1#sh ip eigrp interfaces
EIGRP-IPv4 VR(test) Address-Family Interfaces for AS(100)
                              Xmit Queue   PeerQ        Mean   Pacing Time   Multicast    Pending
Interface              Peers  Un/Reliable  Un/Reliable  SRTT   Un/Reliable   Flow Timer   Routes
Lo0                      0        0/0       0/0           0       0/0            0           0
Gi1.102                  1        0/0       0/0           1       0/0           50           0

On the reverse path, on R4:

R4#sh run | sec router eigrp
router eigrp test
 !
 address-family ipv4 unicast autonomous-system 100
  !
  topology base
  exit-af-topology
  neighbor 172.1.2.1 GigabitEthernet1.304 remote 10 lisp-encap 1
  network 4.0.0.0
  network 172.3.0.0
 exit-address-family

Same deal, just in the opposite direction.

That's about it. Let's see if we have the desired adjacency up and running:

R1#sh ip ei nei
EIGRP-IPv4 VR(test) Address-Family Neighbors for AS(100)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
0   172.3.4.4               Gi1.102                  12 01:14:16    1   100  0  3

Excellent! And the routing tables:

R1#sh ip route eigrp | beg Gateway
Gateway of last resort is not set

      4.0.0.0/32 is subnetted, 1 subnets
D        4.4.4.4 [90/93994331] via 172.3.4.4, 01:14:50, LISP1

Pay attention to the fact that LISP1 is used as the outgoing interface.

And finally the data plane verification:

R1#ping 4.4.4.4 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 4.4.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/10/25 ms

Great! – and that's about all there is to a simple EIGRP OTP scenario. (Look into EIGRP “Route Reflectors” if you want more information on hub-and-spoke topologies.)

Take care!

Trying out IPv6 Prefix Delegation

In this post I will show how and why to use a feature called IPv6 Prefix Delegation (PD).

IPv6 prefix delegation is a feature that provides the capability to delegate or hand out IPv6 prefixes to other routers without the need to hardcode these prefixes into the routers.

Why would you want to do this? – Well, for one, there is the administration overhead associated with manual configuration. If the end-customer only cares about the number of prefixes he or she receives, then they might as well be handed out automatically from a preconfigured pool, just like DHCP works for end-user systems today.

On top of that, by configuring a redistribution into BGP just once, you will automatically have reachability from the rest of your SP network to the prefixes that have been handed out.

So how do you go about configuring this? – Well, let's take a look at the topology we'll be using to demonstrate IPv6 Prefix Delegation.

PD-Post-Topology

First off, we have the SP core network, which consists of R1, R2 and R3. They are running in AS 64512, with R1 being a BGP route-reflector for the IPv6 unicast address-family. As the IGP, we are running OSPFv3 to provide reachability within the core. No IPv4 is configured on any device.

The SP has been allocated a /32 IPv6 prefix, 2001:1111::/32, from which it will “carve” out IPv6 prefixes for both its internal network and customer networks.

We are using /125 for the links between the core routers, just to make it simple when looking at the routing tables and the topology.

R2 is really where all the magic takes place. R2 is a PE for two customers, Customer A and Customer B. Customer A is reached through Gigabit2 and Customer B through Gigabit3. The customers' respective CE routers are R4 and R7.

There is a link-net between R2 and R4 as well as one between R2 and R7. These are 2001:1111:101::/64 and 2001:1111:102::/64, respectively.

So Lab-ISP has decided to use a /48 network from which to hand out prefixes to its customers. This /48 is 2001:1111:2222::/48. Lab-ISP has also decided to hand out /56 prefixes, which gives the customers 8 bits (from /56 to /64) to use for subnetting. This is a typical deployment.

Also, since we are using a /48 as the block to “carve” out from, we have 8 bits (from /48 to /56) of assignable subnets, which of course equals 256 /56 prefixes we can hand out.

All of this can be a bit confusing, so let's look at it from a different perspective.

We start out with 2001:1111:2222::/48. Let's look at what the first /56 looks like:

The first /56, 2001:1111:2222:0000::/56, contains the /64 subnets
2001:1111:2222:0000::
through
2001:1111:2222:00FF::

That last byte (remember, this is all in hex) is what gives the customer 256 subnets to play around with.

The next /56 is:
2001:1111:2222:0100::/56

2001:1111:2222:0100::
through
2001:1111:2222:01FF::

We can do this 256 times in all, as mentioned earlier.

So in summary, with two customers, each receiving a /56 prefix, we would expect to see the bindings show up on R2 as:

2001:1111:2222::/56
2001:1111:2222:100::/56
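If you want to double-check the carving arithmetic above (this sketch is my own addition), Python's ipaddress module can enumerate the delegations for us:

```python
import ipaddress

# The ISP block we carve /56 delegations out of
block = ipaddress.ip_network("2001:1111:2222::/48")
delegations = list(block.subnets(new_prefix=56))

print(len(delegations))    # 256 assignable /56 prefixes
print(delegations[0])      # 2001:1111:2222::/56   (the first binding we expect)
print(delegations[1])      # 2001:1111:2222:100::/56 (the second)

# Each /56 in turn gives a customer 256 /64 subnets
customer_subnets = list(delegations[0].subnets(new_prefix=64))
print(len(customer_subnets))   # 256
print(customer_subnets[-1])    # 2001:1111:2222:ff::/64
```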

So with all this theory in place, let's take a look at the configuration that makes it all work.

First off we start out with creating a local IPv6 pool on R2:

ipv6 local pool IPv6-Local-Pool 2001:1111:2222::/48 56

This is in accordance with the requirements we stated earlier.

Next up, we tie this local pool into a global IPv6 pool used specifically for Prefix Delegation:

ipv6 dhcp pool PD-DHCP-POOL
 prefix-delegation pool IPv6-Local-Pool

Finally we attach the IPv6 DHCP pool to the interfaces of Customer A and Customer B:

R2#sh run int g2
Building configuration...

Current configuration : 132 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 ipv6 address 2001:1111:101::2/64
 ipv6 dhcp server PD-DHCP-POOL
end

R2#sh run int g3
Building configuration...

Current configuration : 132 bytes
!
interface GigabitEthernet3
 no ip address
 negotiation auto
 ipv6 address 2001:1111:102::2/64
 ipv6 dhcp server PD-DHCP-POOL
end

That's pretty much all that's required from the SP's point of view in order to hand out the prefixes.

Now, let's take a look at what's required on the CE routers.

Starting off with R4’s interface to the SP:

R4#sh run int g2
Building configuration...

Current configuration : 156 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 ipv6 address 2001:1111:101::3/64
 ipv6 address autoconfig
 ipv6 dhcp client pd LOCAL-CE
end

Note that “LOCAL-CE” is a local label we will use in the next step. It can be anything you desire.

Only when the “inside” interfaces request an IPv6 address will a request be sent to the SP to hand something out. This is done on R4's g1.405 and g1.406 interfaces:

R4#sh run int g1.405
Building configuration...

Current configuration : 126 bytes
!
interface GigabitEthernet1.405
 encapsulation dot1Q 405
 ipv6 address LOCAL-CE ::1:0:0:0:1/64
 ipv6 address autoconfig
end

R4#sh run int g1.406
Building configuration...

Current configuration : 126 bytes
!
interface GigabitEthernet1.406
 encapsulation dot1Q 406
 ipv6 address LOCAL-CE ::2:0:0:0:1/64
 ipv6 address autoconfig
end

Here we reference the previously defined local label “LOCAL-CE”. Most interesting is the fact that we are now subnetting the /56 prefix we have received, by specifying “::1:0:0:0:1/64” and “::2:0:0:0:1/64” respectively.

What this does is append the configured suffix to the prefix being handed out. To repeat, for Customer A this is 2001:1111:2222::/56, which results in a final address of 2001:1111:2222:1:0:0:0:1/64 for interface g1.405 and 2001:1111:2222:2:0:0:0:1/64 for g1.406.
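You can reproduce this composition yourself; here is a small Python sketch (my own, assuming the expected /56 delegations of 2001:1111:2222::/56 for Customer A and 2001:1111:2222:100::/56 for Customer B) that ORs the delegated prefix bits with the configured suffix bits:

```python
import ipaddress

def compose(delegated_prefix: str, suffix: str) -> str:
    """Mimic 'ipv6 address <pd-label> <suffix>/64': OR the bits of the
    delegated prefix with the bits of the configured suffix."""
    net = ipaddress.ip_network(delegated_prefix)
    sfx = int(ipaddress.ip_address(suffix))
    return str(ipaddress.ip_address(int(net.network_address) | sfx))

# Customer A (delegated 2001:1111:2222::/56)
print(compose("2001:1111:2222::/56", "::1:0:0:0:1"))      # 2001:1111:2222:1::1
print(compose("2001:1111:2222::/56", "::2:0:0:0:1"))      # 2001:1111:2222:2::1
# Customer B (delegated 2001:1111:2222:100::/56)
print(compose("2001:1111:2222:100::/56", "::1:0:0:0:7"))  # 2001:1111:2222:101::7
```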

Lets turn our attention to Customer B on R7.

The same thing has been configured, just using a different “label” for the assigned pool, to show that it's arbitrary:

R7#sh run int g3
Building configuration...

Current configuration : 155 bytes
!
interface GigabitEthernet3
 no ip address
 negotiation auto
 ipv6 address 2001:1111:102::7/64
 ipv6 address autoconfig
 ipv6 dhcp client pd CE-POOL
end

And the inside interface g1.100:

R7#sh run int g1.100
Building configuration...

Current configuration : 100 bytes
!
interface GigabitEthernet1.100
 encapsulation dot1Q 100
 ipv6 address CE-POOL ::1:0:0:0:7/64
end

Again, we are subnetting the received /56 into a /64 and applying it on the inside interface.

Going back to the SP's point of view, let's verify that we are handing out some prefixes:

R2#sh ipv6 local pool
Pool                  Prefix                                       Free  In use
IPv6-Local-Pool       2001:1111:2222::/48                            254      2

We can see that our local pool has handed out 2 prefixes, and if we dig further down into the bindings:

R2#sh ipv6 dhcp binding
Client: FE80::250:56FF:FEBE:93CC
  DUID: 00030001001EF6767600
  Username : unassigned
  VRF : default
  Interface : GigabitEthernet3
  IA PD: IA ID 0x00080001, T1 302400, T2 483840
    Prefix: 2001:1111:2222:100::/56
            preferred lifetime 604800, valid lifetime 2592000
            expires at Oct 16 2014 03:11 PM (2416581 seconds)
Client: FE80::250:56FF:FEBE:4754
  DUID: 00030001001EE5DF8700
  Username : unassigned
  VRF : default
  Interface : GigabitEthernet2
  IA PD: IA ID 0x00070001, T1 302400, T2 483840
    Prefix: 2001:1111:2222::/56
            preferred lifetime 604800, valid lifetime 2592000
            expires at Oct 16 2014 03:11 PM (2416575 seconds)

We can see that we do indeed have some bindings in place. What's more interesting, though, is the fact that static routes have been created:

R2#sh ipv6 route static | beg a - Ap
       a - Application
S   2001:1111:2222::/56 [1/0]
     via FE80::250:56FF:FEBE:4754, GigabitEthernet2
S   2001:1111:2222:100::/56 [1/0]
     via FE80::250:56FF:FEBE:93CC, GigabitEthernet3

So we have two static routes pointing to the CE routers. This makes it extremely simple to propagate them further into the SP core:

R2#sh run | sec router bgp
router bgp 64512
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 2001:1111::12:1 remote-as 64512
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv6
  redistribute static
  neighbor 2001:1111::12:1 activate
 exit-address-family

Of course, some sort of filtering should be used instead of just redistributing every static route on the PE, but you get the point. So let's check it out on R3, for example:

R3#sh bgp ipv6 uni | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
 *>i 2001:1111:2222::/56
                       2001:1111::12:2          0    100      0 ?
 *>i 2001:1111:2222:100::/56
                       2001:1111::12:2          0    100      0 ?

We do indeed have the two routes installed.

So how could the customer set up their routers to learn these prefixes automatically and use them actively?
Well, one solution would be stateless autoconfiguration, which I have opted to use here, along with setting the default route the same way. On R5:

R5#sh run int g1.405
Building configuration...

Current configuration : 96 bytes
!
interface GigabitEthernet1.405
 encapsulation dot1Q 405
 ipv6 address autoconfig default
end

R5#sh ipv6 route | beg a - Ap
       a - Application
ND  ::/0 [2/0]
     via FE80::250:56FF:FEBE:49F3, GigabitEthernet1.405
NDp 2001:1111:2222:1::/64 [2/0]
     via GigabitEthernet1.405, directly connected
L   2001:1111:2222:1:250:56FF:FEBE:3DFB/128 [0/0]
     via GigabitEthernet1.405, receive
L   FF00::/8 [0/0]
     via Null0, receive

and R6:

R6#sh run int g1.406
Building configuration...

Current configuration : 96 bytes
!
interface GigabitEthernet1.406
 encapsulation dot1Q 406
 ipv6 address autoconfig default
end

R6#sh ipv6 route | beg a - App
       a - Application
ND  ::/0 [2/0]
     via FE80::250:56FF:FEBE:49F3, GigabitEthernet1.406
NDp 2001:1111:2222:2::/64 [2/0]
     via GigabitEthernet1.406, directly connected
L   2001:1111:2222:2:250:56FF:FEBE:D054/128 [0/0]
     via GigabitEthernet1.406, receive
L   FF00::/8 [0/0]
     via Null0, receive

So now we have the SP core in place and the internal customer networks in place. All that's really required now is some sort of routing on the CE routers toward the SP. I have chosen the simplest solution, a static default route:

R4#sh run | incl ipv6 route
ipv6 route ::/0 2001:1111:101::2

and on R7:

R7#sh run | incl ipv6 route
ipv6 route ::/0 2001:1111:102::2

Finally, it's time to test all of this in the data plane.

Let's ping from R3 to R5 and R6:

R3#ping 2001:1111:2222:1:250:56FF:FEBE:3DFB
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:1:250:56FF:FEBE:3DFB, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/12/20 ms
R3#ping 2001:1111:2222:2:250:56FF:FEBE:D054
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:2:250:56FF:FEBE:D054, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/17 ms

And also to R7:

R3#ping 2001:1111:2222:101::7
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:101::7, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/8/18 ms

Excellent. Everything works.

Let's summarize what we have done.

1) We created a local IPv6 pool on the PE router.
2) We created a DHCPv6 server utilizing this local pool for prefix delegation.
3) We enabled the DHCPv6 server on the customer-facing interfaces.
4) We enabled DHCPv6 PD on the CE routers (R4 and R7) and used a local label as an identifier.
5) We assigned IPv6 addresses from the delegated prefix on the local interfaces toward R5 and R6 for Customer A and on R7 for Customer B.
6) We used stateless autoconfiguration internally at the customer sites to further propagate the IPv6 prefixes.
7) We configured static routing on the CE routers toward the SP.
8) We redistributed the statics into BGP on the PE router.
9) We verified that IPv6 prefixes were being delegated through DHCPv6.
10) And finally we verified that everything was working in the data plane.
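For reference, the PE- and CE-side pieces from steps 1 through 5 might look roughly like this. This is only a sketch: the pool names, interface numbers, and the /48-carved-into-/56s parameters are my own illustrative assumptions, not the exact configuration used in this lab:

```
! PE side: local pool handing out /56s from a /48, tied to a DHCPv6 pool
ipv6 local pool CUSTOMER-POOL 2001:1111:2222::/48 56
ipv6 dhcp pool PD-POOL
 prefix-delegation pool CUSTOMER-POOL
!
interface GigabitEthernet2
 ipv6 dhcp server PD-POOL

! CE side (e.g. R4): request a prefix, store it under a local label,
! then derive a /64 on the LAN interface from that label
interface GigabitEthernet0/1
 ipv6 dhcp client pd CUST-PD
!
interface GigabitEthernet0/2
 ipv6 address CUST-PD ::1:0:0:0:1/64
```

With a delegated 2001:1111:2222::/56 stored under the CUST-PD label, the last line would yield 2001:1111:2222:1::1/64 on the LAN, matching the :1::/64 subnet we saw R5 autoconfigure into.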

I hope this has shed some light on a fairly niche IPv6 topic and that it has been useful to you.

Take care!