
Trying out IPv6 Prefix Delegation

In this post I will show how and why to use a feature called IPv6 Prefix Delegation (PD).

IPv6 prefix delegation is a feature that provides the capability to delegate or hand out IPv6 prefixes to other routers without the need to hardcode these prefixes into the routers.

Why would you want to do this? – Well, for one, there is the administrative overhead associated with manual configuration. If the end customer only cares about the number of prefixes he or she receives, then they might as well be handed out automatically from a preconfigured pool, just like DHCP works today on end-user systems.

On top of that, by configuring redistribution into BGP just once, you automatically have reachability from the rest of your SP network to the prefixes that have been handed out.

So how do you go about configuring this? – Well, let’s take a look at the topology we’ll be using to demonstrate IPv6 Prefix Delegation.

PD-Post-Topology

First off, we have the SP core network which consists of R1, R2 and R3. They are running in AS 64512 with R1 being a BGP route-reflector for the IPv6 unicast address-family. As an IGP we are running OSPFv3 to provide reachability within the core. No IPv4 is configured on any device.

The SP has been allocated a /32 IPv6 prefix which is 2001:1111::/32, from which it will “carve” out IPv6 prefixes to both its internal network as well as customer networks.

We are using /125 for the links between the core routers, just to make it simple when looking at the routing tables and the topology.

R2 is really where all the magic is taking place. R2 is a PE for two customers, Customer A and Customer B. Customer A is being reached through Gigabit2 and Customer B through Gigabit3. The customer’s respective CE routers are R4 and R7.

There is a link-net between R2 and R4 as well as R2 and R7. These are respectively 2001:1111:101::/64 and 2001:1111:102::/64.

So Lab-ISP has decided to use a /48 network from which to hand out prefixes to its customers. This /48 is 2001:1111:2222::/48. Lab-ISP also decided to hand out /56 prefixes, which gives each customer 8 bits (from 56 to 64) to use for subnetting. This is a typical deployment.

Also, since we are using a /48 as the block to “carve” out from, this gives us 8 bits (from 48 to 56) of assignable subnets, which of course equals 256 /56 prefixes we can hand out.

All of this can be a bit confusing, so let’s look at it from a different perspective.

We start out with 2001:1111:2222::/48. We then want to look at what the first /56 looks like:

The 2001:1111:2222:0000::/56 is
2001:1111:2222:0000::
until
2001:1111:2222:00FF::

That last byte (remember this is all in hex) is what gives the customer 256 subnets to play around with.

The next /56 is:
2001:1111:2222:0100::/56

2001:1111:2222:0100::
until
2001:1111:2222:01FF::

We can do this 256 times in all, as mentioned earlier.

So in summary, with two customers, each receiving a /56 prefix, we would expect to see the bindings show up on R2 as:

2001:1111:2222::/56
2001:1111:2222:100::/56
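The prefix arithmetic above can be double-checked with a few lines of Python. This is only a sketch using the standard `ipaddress` module; the /48 block and the /56 delegation size are taken from the post, and the variable names are made up for illustration.

```python
import ipaddress

# The /48 block Lab-ISP carves customer prefixes from (from the post).
pool = ipaddress.ip_network("2001:1111:2222::/48")

# Split the /48 into every possible /56 delegation.
delegated = list(pool.subnets(new_prefix=56))

print(len(delegated))   # how many /56 prefixes fit in the /48
print(delegated[0])     # first delegation
print(delegated[1])     # second delegation

# Each customer can in turn split its /56 into /64 subnets.
print(len(list(delegated[0].subnets(new_prefix=64))))
```

The first two entries should line up with the two bindings we expect to see on R2, and both counts should come out as 256.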

So with all this theory in place, let’s take a look at the configuration that makes all this work.

First off we start out with creating a local IPv6 pool on R2:

ipv6 local pool IPv6-Local-Pool 2001:1111:2222::/48 56

This is in accordance with the requirements we stated earlier.

Next up, we tie this local pool into a DHCPv6 pool used specifically for Prefix Delegation:

ipv6 dhcp pool PD-DHCP-POOL
 prefix-delegation pool IPv6-Local-Pool

Finally we attach the IPv6 DHCP pool to the interfaces of Customer A and Customer B:

R2#sh run int g2
Building configuration...

Current configuration : 132 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 ipv6 address 2001:1111:101::2/64
 ipv6 dhcp server PD-DHCP-POOL
end

R2#sh run int g3
Building configuration...

Current configuration : 132 bytes
!
interface GigabitEthernet3
 no ip address
 negotiation auto
 ipv6 address 2001:1111:102::2/64
 ipv6 dhcp server PD-DHCP-POOL
end

That’s pretty much all that’s required from the SP point of view in order to hand out the prefixes.

Now, let’s take a look at what’s required on the CE routers.

Starting off with R4’s interface to the SP:

R4#sh run int g2
Building configuration...

Current configuration : 156 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 ipv6 address 2001:1111:101::3/64
 ipv6 address autoconfig
 ipv6 dhcp client pd LOCAL-CE
end

Note that the “LOCAL-CE” is a local label we will use for the next step. It can be anything you desire.

Only when the “inside” interfaces request an IPv6 address will a request be sent to the SP for a prefix to be handed out. This is done on R4’s g1.405 and g1.406 interfaces:

R4#sh run int g1.405
Building configuration...

Current configuration : 126 bytes
!
interface GigabitEthernet1.405
 encapsulation dot1Q 405
 ipv6 address LOCAL-CE ::1:0:0:0:1/64
 ipv6 address autoconfig
end

R4#sh run int g1.406
Building configuration...

Current configuration : 126 bytes
!
interface GigabitEthernet1.406
 encapsulation dot1Q 406
 ipv6 address LOCAL-CE ::2:0:0:0:1/64
 ipv6 address autoconfig
end

Here we reference the previous local label “LOCAL-CE”. Most interesting is the fact that we are now subnetting the /56 prefix we have received by doing the “::1:0:0:0:1/64” and “::2:0:0:0:1/64” respectively.

What this does is combine the configured address with the prefix that has been delegated. To repeat, for Customer A this prefix is 2001:1111:2222::/56, which results in a final address of 2001:1111:2222:1:0:0:0:1/64 for interface g1.405 and 2001:1111:2222:2:0:0:0:1/64 for g1.406.
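To make the combination step concrete, here is a small Python sketch. It models the merge as a bitwise OR of the delegated prefix and the configured suffix; that is an assumption about how to represent the result, not a description of IOS internals, and `apply_suffix` is a hypothetical helper name.

```python
import ipaddress

def apply_suffix(delegated: str, suffix: str) -> ipaddress.IPv6Interface:
    """Combine a delegated prefix with an interface suffix such as '::1:0:0:0:1/64'."""
    prefix = ipaddress.ip_network(delegated)
    suffix_addr, suffix_len = suffix.split("/")
    # OR the prefix bits with the suffix bits (modelling assumption).
    combined = int(prefix.network_address) | int(ipaddress.IPv6Address(suffix_addr))
    return ipaddress.ip_interface(f"{ipaddress.IPv6Address(combined)}/{suffix_len}")

# Customer A: the delegated /56 and the suffixes configured on R4.
print(apply_suffix("2001:1111:2222::/56", "::1:0:0:0:1/64"))  # g1.405
print(apply_suffix("2001:1111:2222::/56", "::2:0:0:0:1/64"))  # g1.406
```

Python prints the addresses in compressed form, so 2001:1111:2222:1:0:0:0:1/64 shows up as 2001:1111:2222:1::1/64.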

Let’s turn our attention to Customer B on R7.

The same thing has been configured, just using a different “label” for the assigned pool to show that it’s arbitrary:

R7#sh run int g3
Building configuration...

Current configuration : 155 bytes
!
interface GigabitEthernet3
 no ip address
 negotiation auto
 ipv6 address 2001:1111:102::7/64
 ipv6 address autoconfig
 ipv6 dhcp client pd CE-POOL
end

And the inside interface g1.100:

R7#sh run int g1.100
Building configuration...

Current configuration : 100 bytes
!
interface GigabitEthernet1.100
 encapsulation dot1Q 100
 ipv6 address CE-POOL ::1:0:0:0:7/64
end

Again, we are subnetting the received /56 into a /64 and applying it on the inside interface.

Going back to the SP point of view, let’s verify that we are handing out some prefixes:

R2#sh ipv6 local pool
Pool                  Prefix                                       Free  In use
IPv6-Local-Pool       2001:1111:2222::/48                            254      2

We can see that our local pool has handed out 2 prefixes and if we dig further down into the bindings:

R2#sh ipv6 dhcp binding
Client: FE80::250:56FF:FEBE:93CC
  DUID: 00030001001EF6767600
  Username : unassigned
  VRF : default
  Interface : GigabitEthernet3
  IA PD: IA ID 0x00080001, T1 302400, T2 483840
    Prefix: 2001:1111:2222:100::/56
            preferred lifetime 604800, valid lifetime 2592000
            expires at Oct 16 2014 03:11 PM (2416581 seconds)
Client: FE80::250:56FF:FEBE:4754
  DUID: 00030001001EE5DF8700
  Username : unassigned
  VRF : default
  Interface : GigabitEthernet2
  IA PD: IA ID 0x00070001, T1 302400, T2 483840
    Prefix: 2001:1111:2222::/56
            preferred lifetime 604800, valid lifetime 2592000
            expires at Oct 16 2014 03:11 PM (2416575 seconds)

We see that we do indeed have some bindings in place. What’s of more interest, though, is the fact that static routes have been created:

R2#sh ipv6 route static | beg a - Ap
       a - Application
S   2001:1111:2222::/56 [1/0]
     via FE80::250:56FF:FEBE:4754, GigabitEthernet2
S   2001:1111:2222:100::/56 [1/0]
     via FE80::250:56FF:FEBE:93CC, GigabitEthernet3

So we have two static routes that point to the CE routers. This makes it extremely simple to propagate them further into the SP core:

R2#sh run | sec router bgp
router bgp 64512
 bgp router-id 2.2.2.2
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 2001:1111::12:1 remote-as 64512
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv6
  redistribute static
  neighbor 2001:1111::12:1 activate
 exit-address-family

Of course, some sort of filtering should be used instead of just redistributing every static route on the PE, but you get the point. So let’s check it out on R3, for example:

R3#sh bgp ipv6 uni | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
 *>i 2001:1111:2222::/56
                       2001:1111::12:2          0    100      0 ?
 *>i 2001:1111:2222:100::/56
                       2001:1111::12:2          0    100      0 ?

We do indeed have the two routes installed.
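As for the filtering mentioned earlier, the idea would be to redistribute only statics that are /56 prefixes inside the delegation block. The Python sketch below only illustrates that matching logic; it is not router configuration, and the third route in the list is a made-up example of an unrelated static that should be rejected.

```python
import ipaddress

# The block that prefix delegation hands out from (from the post).
PD_BLOCK = ipaddress.ip_network("2001:1111:2222::/48")

def is_delegated(route: str) -> bool:
    """True if the route is a /56 that falls inside the delegation block."""
    net = ipaddress.ip_network(route)
    return net.prefixlen == 56 and net.subnet_of(PD_BLOCK)

statics = [
    "2001:1111:2222::/56",      # Customer A (from the post)
    "2001:1111:2222:100::/56",  # Customer B (from the post)
    "2001:db8:dead::/48",       # hypothetical unrelated static route
]

allowed = [route for route in statics if is_delegated(route)]
print(allowed)
```

On the router itself the equivalent would typically be a prefix-list referenced from a route-map on the redistribute statement.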

So how could the customer set up their routers to learn these prefixes automatically and use them actively?
Well, one solution would be stateless autoconfiguration, which I have opted to use here, along with setting the default route the same way, on R5:

R5#sh run int g1.405
Building configuration...

Current configuration : 96 bytes
!
interface GigabitEthernet1.405
 encapsulation dot1Q 405
 ipv6 address autoconfig default
end

R5#sh ipv6 route | beg a - Ap
       a - Application
ND  ::/0 [2/0]
     via FE80::250:56FF:FEBE:49F3, GigabitEthernet1.405
NDp 2001:1111:2222:1::/64 [2/0]
     via GigabitEthernet1.405, directly connected
L   2001:1111:2222:1:250:56FF:FEBE:3DFB/128 [0/0]
     via GigabitEthernet1.405, receive
L   FF00::/8 [0/0]
     via Null0, receive

and R6:

R6#sh run int g1.406
Building configuration...

Current configuration : 96 bytes
!
interface GigabitEthernet1.406
 encapsulation dot1Q 406
 ipv6 address autoconfig default
end

R6#sh ipv6 route | beg a - App
       a - Application
ND  ::/0 [2/0]
     via FE80::250:56FF:FEBE:49F3, GigabitEthernet1.406
NDp 2001:1111:2222:2::/64 [2/0]
     via GigabitEthernet1.406, directly connected
L   2001:1111:2222:2:250:56FF:FEBE:D054/128 [0/0]
     via GigabitEthernet1.406, receive
L   FF00::/8 [0/0]
     via Null0, receive
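The host part of the autoconfigured addresses on R5 and R6 looks like a modified EUI-64 interface ID derived from the interface MAC address. The sketch below shows that derivation in Python. The MAC addresses are not shown anywhere in the post; they are inferred here by working backwards from the addresses in the routing tables, so treat them as assumptions.

```python
def eui64_interface_id(mac: str) -> str:
    """Build a modified EUI-64 interface ID from a MAC like '00:50:56:BE:3D:FB'."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                      # flip the universal/local bit
    eui = octets[:3] + b"\xff\xfe" + octets[3:]  # insert FFFE in the middle
    groups = [f"{(eui[i] << 8) | eui[i + 1]:X}" for i in range(0, 8, 2)]
    return ":".join(groups)

# MAC addresses below are inferred assumptions, not taken from the post.
print("2001:1111:2222:1:" + eui64_interface_id("00:50:56:BE:3D:FB"))  # R5
print("2001:1111:2222:2:" + eui64_interface_id("00:50:56:BE:D0:54"))  # R6
```

If the inferred MAC addresses are right, the printed addresses match the local (L) routes shown for R5 and R6.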

So now we have the SP core in place, and we have the customer’s internal network in place. All that’s really required now is for some sort of routing to take place on the CE routers toward the SP. I have chosen the simplest solution, a static default route:

R4#sh run | incl ipv6 route
ipv6 route ::/0 2001:1111:101::2

and on R7:

R7#sh run | incl ipv6 route
ipv6 route ::/0 2001:1111:102::2

Finally it’s time to test all this stuff out in the data plane.

Let’s ping from R3 to R5 and R6:

R3#ping 2001:1111:2222:1:250:56FF:FEBE:3DFB
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:1:250:56FF:FEBE:3DFB, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/12/20 ms
R3#ping 2001:1111:2222:2:250:56FF:FEBE:D054
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:2:250:56FF:FEBE:D054, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/17 ms

And also to R7:

R3#ping 2001:1111:2222:101::7
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2001:1111:2222:101::7, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/8/18 ms

Excellent. Everything works.

Let’s summarize what we have done.

1) We created a local IPv6 pool on the PE router.
2) We created a DHCPv6 server pool utilizing this local pool for prefix delegation.
3) We enabled the DHCPv6 server on the customer facing interfaces.
4) We enabled the DHCPv6 PD on the CE routers (R4 and R7) and used a local label as an identifier.
5) We configured IPv6 addresses using PD on the local interfaces toward R5 and R6 for Customer A, and on R7’s inside interface for Customer B.
6) We used stateless autoconfiguration internal to the customers to further propagate the IPv6 prefixes.
7) We created static routing on the CE routers toward the SP.
8) We redistributed statics into BGP on the PE router.
9) We verified that IPv6 prefixes were being delegated through DHCPv6.
10) And finally we verified that everything was working in the data plane.

I hope this has covered a pretty niche topic of IPv6 and it has been useful to you.

Take care!

VRF based path selection

In this post I will be showing you how it’s possible to use different paths between your PE routers on a per-VRF basis.

This is very useful if you have customers you want to “steer” away from your normal traffic flow between PE routers.
For example, this could be due to certain SLAs.

I will be using the following topology to demonstrate how this can be done:

Topology

A short walkthrough of the topology is in order.

In the service provider core we have 4 routers: R3, XRv-1, XRv-2 and R4. R3 and R4 are IOS-XE based routers and XRv-1 and XRv-2 are, as the name implies, IOS-XR routers. There is no significance attached to the fact that I’m running two XR routers. It’s simply how I could build the required topology.

The service provider is running OSPF as the IGP, with R3 and R4 being the PE routers for an MPLS L3 VPN service. On top of that, LDP is being used to build the required LSPs. The IGP has been modified to prefer the northbound path (R3 -> XRv-1 -> R4) by increasing the cost of the southbound links (R3 -> XRv-2 -> R4) to 100.

So by default, traffic between R3 and R4 will flow northbound.

We can easily verify this:

R3#traceroute 4.4.4.4
Type escape sequence to abort.
Tracing the route to 4.4.4.4
VRF info: (vrf in name/id, vrf out name/id)
  1 10.3.10.10 [MPLS: Label 16005 Exp 0] 16 msec 1 msec 1 msec
  2 10.4.10.4 1 msec *  5 msec

And the reverse path is the same:

R4#traceroute 3.3.3.3
Type escape sequence to abort.
Tracing the route to 3.3.3.3
VRF info: (vrf in name/id, vrf out name/id)
  1 10.4.10.10 [MPLS: Label 16000 Exp 0] 3 msec 2 msec 0 msec
  2 10.3.10.3 1 msec *  5 msec

Besides the traffic flowing the desired way, we can see we are using label switching between the loopbacks. Exactly what we want in this type of setup.

On the customer side, we have 2 customers, Customer A and Customer B. Each of them has 2 sites, one behind R3 and one behind R4. Pretty simple. They are all running EIGRP between the CEs and the PEs.

Beyond this we have MPLS Traffic Engineering running in the service core as well. Specifically we are running a tunnel going from R3’s loopback200 (33.33.33.33/32) towards R4’s loopback200 (44.44.44.44/32). This has been accomplished by configuring an explicit path on both R3 and R4.

Let’s verify the tunnel configuration on both:

On R3:

R3#sh ip expl
PATH NEW-R3-TO-R4 (strict source route, path complete, generation 8)
    1: next-address 10.3.20.20
    2: next-address 10.4.20.4
R3#sh run int tunnel10
Building configuration...

Current configuration : 180 bytes
!
interface Tunnel10
 ip unnumbered Loopback200
 tunnel mode mpls traffic-eng
 tunnel destination 10.4.20.4
 tunnel mpls traffic-eng path-option 10 explicit name NEW-R3-TO-R4
end

And on R4:

R4#sh ip expl
PATH NEW-R4-TO-R3 (strict source route, path complete, generation 4)
    1: next-address 10.4.20.20
    2: next-address 10.3.20.3
R4#sh run int tun10
Building configuration...

Current configuration : 180 bytes
!
interface Tunnel10
 ip unnumbered Loopback200
 tunnel mode mpls traffic-eng
 tunnel destination 10.3.20.3
 tunnel mpls traffic-eng path-option 10 explicit name NEW-R4-TO-R3
end

On top of that we have configured a static route on both R3 and R4 to steer traffic for each other’s loopback200 down the tunnel:

R3#sh run | incl ip route
ip route 44.44.44.44 255.255.255.255 Tunnel10

R4#sh run | incl ip route
ip route 33.33.33.33 255.255.255.255 Tunnel10

Resulting in the following RIBs:

R3#sh ip route 44.44.44.44
Routing entry for 44.44.44.44/32
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Tunnel10
      Route metric is 0, traffic share count is 1
	  
R4#sh ip route 33.33.33.33
Routing entry for 33.33.33.33/32
  Known via "static", distance 1, metric 0 (connected)
  Routing Descriptor Blocks:
  * directly connected, via Tunnel10
      Route metric is 0, traffic share count is 1

And to test that we are actually using the southbound path (R3 -> XRv-2 -> R4), let’s traceroute between the loopbacks (loopback200):

on R3:

R3#traceroute 44.44.44.44 so loopback200
Type escape sequence to abort.
Tracing the route to 44.44.44.44
VRF info: (vrf in name/id, vrf out name/id)
  1 10.3.20.20 [MPLS: Label 16007 Exp 0] 4 msec 2 msec 1 msec
  2 10.4.20.4 1 msec *  3 msec

and on R4:

R4#traceroute 33.33.33.33 so loopback200
Type escape sequence to abort.
Tracing the route to 33.33.33.33
VRF info: (vrf in name/id, vrf out name/id)
  1 10.4.20.20 [MPLS: Label 16008 Exp 0] 4 msec 1 msec 1 msec
  2 10.3.20.3 1 msec *  3 msec

This verifies that we have our two unidirectional tunnels and that communication between the loopback200 interfaces flows through the southbound path using our TE tunnels.

So let’s take a look at the very simple BGP PE configuration on both R3 and R4:

R3:

router bgp 100
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 4.4.4.4 remote-as 100
 neighbor 4.4.4.4 update-source Loopback100
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 4.4.4.4 activate
  neighbor 4.4.4.4 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf A
  redistribute eigrp 100
 exit-address-family
 !
 address-family ipv4 vrf B
  redistribute eigrp 100
 exit-address-family

and R4:

router bgp 100
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 3.3.3.3 remote-as 100
 neighbor 3.3.3.3 update-source Loopback100
 !
 address-family ipv4
 exit-address-family
 !
 address-family vpnv4
  neighbor 3.3.3.3 activate
  neighbor 3.3.3.3 send-community extended
 exit-address-family
 !
 address-family ipv4 vrf A
  redistribute eigrp 100
 exit-address-family
 !
 address-family ipv4 vrf B
  redistribute eigrp 100
 exit-address-family

From this output, we can see that we are using the loopback100 interfaces for the BGP peering. As routing updates come in from one PE, the next-hop will be set to the remote PE’s loopback100 interface. This will then cause the transport label to be the one going to this loopback100 interface.

A traceroute from R1’s loopback0 interface to R5’s loopback0 interface will show us the path that traffic between each site in VRF A (Customer A) will take:

R1:

R1#traceroute 5.5.5.5 so loo0
Type escape sequence to abort.
Tracing the route to 5.5.5.5
VRF info: (vrf in name/id, vrf out name/id)
  1 10.1.3.3 1 msec 1 msec 0 msec
  2 10.3.10.10 [MPLS: Labels 16005/408 Exp 0] 6 msec 1 msec 10 msec
  3 10.4.5.4 [MPLS: Label 408 Exp 0] 15 msec 22 msec 17 msec
  4 10.4.5.5 18 msec *  4 msec

and let’s compare that to what R3 will use as the transport label to reach R4’s loopback100 interface:

 
R3#sh mpls for
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
300        Pop Label  10.10.10.10/32   0             Gi1.310    10.3.10.10
301        Pop Label  10.4.10.0/24     0             Gi1.310    10.3.10.10
302        Pop Label  20.20.20.20/32   0             Gi1.320    10.3.20.20
303        16004      10.4.20.0/24     0             Gi1.310    10.3.10.10
304   [T]  Pop Label  44.44.44.44/32   0             Tu10       point2point
305        16005      4.4.4.4/32       0             Gi1.310    10.3.10.10
310        No Label   1.1.1.1/32[V]    2552          Gi1.13     10.1.3.1
311        No Label   10.1.3.0/24[V]   0             aggregate/A
312        No Label   2.2.2.2/32[V]    2552          Gi1.23     10.2.3.2
313        No Label   10.2.3.0/24[V]   0             aggregate/B

We can see that this matches up, being 16005 (going to XRv-1) through the northbound path.

This raises the question: how do we steer our traffic through the southbound path using the loopback200 instead, when the peering is between the loopback100s?

Well, thankfully IOS has it covered. Under the VRF configuration for Customer B (VRF B), we have the option of setting the loopback interface used as the next-hop of updates sent to the remote PE:

On R3:

vrf definition B
 rd 100:2
 !
 address-family ipv4
  route-target export 100:2
  route-target import 100:2
  bgp next-hop Loopback200
 exit-address-family

and the same on R4:

vrf definition B
 rd 100:2
 !
 address-family ipv4
  route-target export 100:2
  route-target import 100:2
  bgp next-hop Loopback200
 exit-address-family

This causes the BGP updates to contain the “correct” next-hop:

R3:

R3#sh bgp vpnv4 uni vrf B | beg Route Dis
Route Distinguisher: 100:2 (default for vrf B)
 *>  2.2.2.2/32       10.2.3.2            130816         32768 ?
 *>i 6.6.6.6/32       44.44.44.44         130816    100      0 ?
 *>  10.2.3.0/24      0.0.0.0                  0         32768 ?
 *>i 10.4.6.0/24      44.44.44.44              0    100      0 ?

44.44.44.44/32 being the loopback200 of R4, and on R4:

R4#sh bgp vpnv4 uni vrf B | beg Route Dis
Route Distinguisher: 100:2 (default for vrf B)
 *>i 2.2.2.2/32       33.33.33.33         130816    100      0 ?
 *>  6.6.6.6/32       10.4.6.6            130816         32768 ?
 *>i 10.2.3.0/24      33.33.33.33              0    100      0 ?
 *>  10.4.6.0/24      0.0.0.0                  0         32768 ?

Let’s check out whether this actually works or not:

R2#traceroute 6.6.6.6 so loo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
  1 10.2.3.3 1 msec 1 msec 0 msec
  2 10.3.20.20 [MPLS: Labels 16007/409 Exp 0] 4 msec 1 msec 10 msec
  3 10.4.6.4 [MPLS: Label 409 Exp 0] 15 msec 16 msec 17 msec
  4 10.4.6.6 19 msec *  4 msec

Excellent! – We can see that we are indeed using the southbound path. To make sure we are using the tunnel, note the transport label of 16007, and compare that to:

R3:

R3#sh mpls traffic-eng tun tunnel 10

Name: R3_t10                              (Tunnel10) Destination: 10.4.20.4
  Status:
    Admin: up         Oper: up     Path: valid       Signalling: connected
    path option 10, type explicit NEW-R3-TO-R4 (Basis for Setup, path weight 200)

  Config Parameters:
    Bandwidth: 0        kbps (Global)  Priority: 7  7   Affinity: 0x0/0xFFFF
    Metric Type: TE (default)
    AutoRoute: disabled LockDown: disabled Loadshare: 0 [0] bw-based
    auto-bw: disabled
  Active Path Option Parameters:
    State: explicit path option 10 is active
    BandwidthOverride: disabled  LockDown: disabled  Verbatim: disabled


  InLabel  :  -
  OutLabel : GigabitEthernet1.320, 16007
  Next Hop : 10.3.20.20

I have deleted a lot of non-relevant output, but pay attention to the OutLabel, which is indeed 16007.
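The whole mechanism boils down to this: the BGP next-hop of a VPN route decides which transport label gets pushed on top of the VPN label. Here is a toy Python model of that lookup as seen from R3. The labels, next-hops and interfaces are copied from the outputs in this post; the dictionaries and the `label_stack` function are purely illustrative and do not reflect how IOS stores this internally.

```python
# Transport label R3 uses to reach each BGP next-hop (from the outputs above).
TRANSPORT = {
    "4.4.4.4": ("Gi1.310", 16005),      # LDP label toward R4's loopback100
    "44.44.44.44": ("Gi1.320", 16007),  # TE Tunnel10 out label toward R4's loopback200
}

# BGP next-hop and VPN label per (VRF, destination), as seen in the traceroutes.
VPN_ROUTES = {
    ("A", "5.5.5.5"): ("4.4.4.4", 408),
    ("B", "6.6.6.6"): ("44.44.44.44", 409),
}

def label_stack(vrf: str, destination: str):
    """Return the outgoing interface and the [transport, VPN] label stack."""
    next_hop, vpn_label = VPN_ROUTES[(vrf, destination)]
    out_interface, transport_label = TRANSPORT[next_hop]
    return out_interface, [transport_label, vpn_label]

print(label_stack("A", "5.5.5.5"))  # northbound via XRv-1
print(label_stack("B", "6.6.6.6"))  # southbound via the TE tunnel
```

Changing only the next-hop for VRF B is enough to move its traffic onto the tunnel, while VRF A keeps following the IGP path.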

So that was a quick walkthrough of how easy it is to accomplish the stated goal once you know about that nifty IOS command.

I hope it’s been useful to you.

Take Care!

Using the OSPF Forwarding Address for traffic-steering

In this fairly short post, I’d like to address a topic that came up on IRC (#cciestudy @ freenode.net). It’s about how a route is selected when it is redistributed into an OSPF NSSA area and then comes into the OSPF backbone area 0.

For my post I will be using the very simple topology below. Nothing else is necessary to illustrate what is going on.

FA-NSSA-Topology

First off, I’d like to clarify a few things about what takes place when redistributing routes into an NSSA area.

What happens is that you have an external network, 4.4.4.4/32 in our example. This is _not_ part of the current area 1. When this network is being redistributed into area 1, its forwarding address will be set to the highest address on an active interface of the redistributing router in the area (R4 in our case). The highest interface address in the area local to the router is Loopback100 with an address of 44.44.44.44/32.

*A reader noted that a loopback address will beat a physical interface even if it has a lower address. This is true and goes for OSPF in general. Thanks!

Let’s verify the configuration on R4 and the result of the redistribution to the OSPF database:

R4#sh run | sec router ospf
router ospf 100
router-id 144.144.144.144
log-adjacency-changes
area 1 nssa
redistribute connected subnets
network 10.2.0.0 0.0.255.255 area 1
network 10.3.0.0 0.0.255.255 area 1
network 44.44.44.44 0.0.0.0 area 1

So we are running Area 1 on three interfaces: the two links connecting to R2 and R3, along with the loopback100 interface.

And the output of the relevant section of the OSPF database is:

R4#sh ip os data nssa

OSPF Router with ID (144.144.144.144) (Process ID 100)

Type-7 AS External Link States (Area 1)

LS age: 408
Options: (No TOS-capability, Type 7/5 translation, DC)
LS Type: AS External Link
Link State ID: 4.4.4.4 (External Network Number )
Advertising Router: 144.144.144.144
LS Seq Number: 80000001
Checksum: 0x4A49
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
MTID: 0
Metric: 20
Forward Address: 44.44.44.44
External Route Tag: 0

What we are verifying here is that the FA is in fact set according to the aforementioned rules, namely to 44.44.44.44.

Let’s take a look at the OSPF configuration of R2 and R3:

R2#sh run | sec router ospf
router ospf 100
router-id 22.22.22.22
log-adjacency-changes
area 1 nssa
network 10.1.2.0 0.0.0.255 area 0
network 10.2.4.0 0.0.0.255 area 1

And R3:

R3#sh run | sec router ospf
router ospf 100
log-adjacency-changes
area 1 nssa
network 10.1.3.0 0.0.0.255 area 0
network 10.3.4.0 0.0.0.255 area 1

Very straightforward so far, with the exception of the fact that I have manually set R2’s router-id to force it to be higher than R3’s. This is to prove the point below.

Now what we should ideally see is that, of the ABRs (R2 and R3), the one with the highest router-id will do the type-7 to type-5 translation and preserve the FA of the type-7. What we would like to see on R1 is a type-5 LSA with a Forwarding Address of 44.44.44.44, with the advertising router being R2 (22.22.22.22). Let’s check it out:

R1#sh ip os data ex

OSPF Router with ID (10.1.3.1) (Process ID 100)

Type-5 AS External Link States

Routing Bit Set on this LSA in topology Base with MTID 0
LS age: 630
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 4.4.4.4 (External Network Number )
Advertising Router: 22.22.22.22
LS Seq Number: 80000001
Checksum: 0x394E
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
MTID: 0
Metric: 20
Forward Address: 44.44.44.44
External Route Tag: 0

R1#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
Known via "ospf 100", distance 110, metric 20, type extern 2, forward metric 3
Last update from 10.1.3.3 on FastEthernet1/1, 00:11:03 ago
Routing Descriptor Blocks:
10.1.3.3, from 22.22.22.22, 00:11:03 ago, via FastEthernet1/1
Route metric is 20, traffic share count is 1
* 10.1.2.2, from 22.22.22.22, 00:11:03 ago, via FastEthernet1/0
Route metric is 20, traffic share count is 1

Very good, we are in fact seeing this LSA with the information we expected. We can also see something you might not expect, namely the fact that we have two paths installed in the RIB for 4.4.4.4/32. Why is that?

Well, what R1 really cares about is “how” it can get to the Forwarding Address of the route and in this case, it can get to 44.44.44.44/32 through 2 paths, R2 and R3.

Let’s check out what happens if we block 44.44.44.44/32 from going from Area 1 to Area 0 through R2.

R2#sh run | incl prefix-list
ip prefix-list BLOCK-R4-LOOPBACK seq 5 deny 44.44.44.44/32
ip prefix-list BLOCK-R4-LOOPBACK seq 10 permit 0.0.0.0/0 le 32

R2#sh run | sec router ospf
router ospf 100
router-id 22.22.22.22
log-adjacency-changes
area 1 nssa
area 1 filter-list prefix BLOCK-R4-LOOPBACK out
network 10.1.2.0 0.0.0.255 area 0
network 10.2.4.0 0.0.0.255 area 1

Let’s see what this does to the RIB of R1:

R1#sh ip route | beg Gateway
Gateway of last resort is not set

4.0.0.0/32 is subnetted, 1 subnets
O E2 4.4.4.4 [110/20] via 10.1.3.3, 00:16:43, FastEthernet1/1
10.0.0.0/8 is variably subnetted, 6 subnets, 2 masks
C 10.1.2.0/24 is directly connected, FastEthernet1/0
L 10.1.2.1/32 is directly connected, FastEthernet1/0
C 10.1.3.0/24 is directly connected, FastEthernet1/1
L 10.1.3.1/32 is directly connected, FastEthernet1/1
O IA 10.2.4.0/24 [110/2] via 10.1.2.2, 00:16:47, FastEthernet1/0
O IA 10.3.4.0/24 [110/2] via 10.1.3.3, 00:23:41, FastEthernet1/1
44.0.0.0/32 is subnetted, 1 subnets
O IA 44.44.44.44 [110/3] via 10.1.3.3, 00:16:48, FastEthernet1/1

and the LSA is still the same as before:

R1#sh ip os data ex

OSPF Router with ID (10.1.3.1) (Process ID 100)

Type-5 AS External Link States

Routing Bit Set on this LSA in topology Base with MTID 0
LS age: 1027
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 4.4.4.4 (External Network Number )
Advertising Router: 22.22.22.22
LS Seq Number: 80000001
Checksum: 0x394E
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
MTID: 0
Metric: 20
Forward Address: 44.44.44.44
External Route Tag: 0

So what this tells us is that if the Forwarding Address is different from 0.0.0.0 (which we’ll cover in a minute) and you don’t have reachability to whatever it’s set to, you cannot install the route in the RIB.

In our case we still have one valid path through R3, so it’s still in the RIB, but not with load-balancing.

So to summarize what we have covered so far:
– Even though only 1 ABR creates the new type-5 (type-7 to type-5 translation), you can have load-balancing occurring.
– If you don’t have a valid path to the Forwarding Address, you cannot install the route in the RIB.

Let’s revert our configuration on R2:

R2#sh run | sec router ospf
router ospf 100
router-id 22.22.22.22
log-adjacency-changes
area 1 nssa
network 10.1.2.0 0.0.0.255 area 0
network 10.2.4.0 0.0.0.255 area 1

Now let’s take a look at FA-Suppression!

What FA-Suppression does is that, instead of preserving the FA according to the previously mentioned rules, it sets the Forwarding Address to 0.0.0.0, indicating that the router originating the Type-5 should be used as the exit point.

We’ve already established that R2 is the router performing the Type-7 to Type-5 translation, so let’s do the following configuration on R2:

R2(config-router)#area 1 nssa translate type7 suppress-fa

What does this do to our OSPF database on R1, specifically the Type-5 LSA?

R1#sh ip os data ext

OSPF Router with ID (10.1.3.1) (Process ID 100)

Type-5 AS External Link States

Routing Bit Set on this LSA in topology Base with MTID 0
LS age: 33
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 4.4.4.4 (External Network Number )
Advertising Router: 22.22.22.22
LS Seq Number: 80000002
Checksum: 0x96A0
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
MTID: 0
Metric: 20
Forward Address: 0.0.0.0
External Route Tag: 0

Indeed, the Forwarding Address has been set to 0.0.0.0, indicating that the Advertising Router (22.22.22.22) should be used as the exit point. This also has the effect of removing our load-balancing:

R1#sh ip route 4.4.4.4
Routing entry for 4.4.4.4/32
Known via "ospf 100", distance 110, metric 20, type extern 2, forward metric 1
Last update from 10.1.2.2 on FastEthernet1/0, 00:03:48 ago
Routing Descriptor Blocks:
* 10.1.2.2, from 22.22.22.22, 00:03:48 ago, via FastEthernet1/0
Route metric is 20, traffic share count is 1

So depending on how you want to “steer” your traffic, you might want to consider whether you allow the Forwarding Address through your topology and if you want to use FA suppression or not.
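The behaviour we have seen on R1 can be condensed into a small Python sketch. It is a simplified model of the next-hop selection only, with the next-hop values copied from R1’s outputs in this post; `external_next_hops` and the dictionaries are hypothetical names, and real OSPF route calculation involves more checks than this.

```python
def external_next_hops(forwarding_address, advertising_router, paths):
    """Pick next hops for an external route.

    paths maps a target (the FA, or the advertising router's ID) to the
    next hops R1 would use to reach it.
    """
    if forwarding_address == "0.0.0.0":
        target = advertising_router   # exit via the router that originated the type-5
    else:
        target = forwarding_address   # exit via whatever path reaches the FA
    return paths.get(target, [])      # empty list: route cannot be installed

# Reachability as seen from R1 in the three scenarios of this post.
baseline = {"44.44.44.44": ["10.1.2.2", "10.1.3.3"], "22.22.22.22": ["10.1.2.2"]}
filtered = {"44.44.44.44": ["10.1.3.3"], "22.22.22.22": ["10.1.2.2"]}

print(external_next_hops("44.44.44.44", "22.22.22.22", baseline))  # two paths
print(external_next_hops("44.44.44.44", "22.22.22.22", filtered))  # only via R3
print(external_next_hops("0.0.0.0", "22.22.22.22", baseline))      # FA suppressed: only via R2
print(external_next_hops("44.44.44.44", "22.22.22.22", {}))        # FA unreachable
```

With the FA preserved, the exit point follows the path to 44.44.44.44; with the FA suppressed, it follows the path to R2.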

I hope it’s been useful to you!

Take care!

Passed the CCIE SP Lab exam.

Well, a short update. I managed to pass the CCIE Service Provider lab exam on March 14th.

I am quite exhausted from the experience, but very happy 🙂

Short update

It’s been a long time since my last update. I apologise for this. It wasn’t my intention, it just sort of happened.

In the meantime I have tried the CCIE SP lab and didn’t pass it, so I am still studying for my next attempt, which is coming up shortly.
Until then I have booked quite a number of rack hours. Hopefully I will learn from some of the mistakes I have identified.

Just last week Cisco announced the IOS XRv image that allows you to run a virtual instance of IOS XR. This is great news for the community at large, as it provides the ability to learn about XR without having to spend a lot of money on rack rentals or even buying platforms that run XR.

Unfortunately, there is a bug in the download system, which Cisco is trying to correct. It blocks the download for people with active partner status, which includes me. So, at the time of writing, we have to wait 3 days until it gets sorted out.

I suggest you take a look at FryGuy’s blog about the release of IOS XRv.
The link can be found here:
http://www.fryguy.net/2014/02/08/cisco-ios-xrv-v-as-in-virtual/

Take care!

ISIS csnp-interval

The CSNP on multiaccess networks

The CSNP (Complete Sequence Number PDU) on multi-access networks is sent out by the DIS (Designated Intermediate System), which acts as the pseudonode representing the multi-access network. It is ISIS’s way of making sure everybody on the multi-access network is up to date. If that’s not the case, the node that is missing some routing information can use PSNPs (Partial Sequence Number PDUs) to request the missing information from the DIS.

The csnp-interval is simply the timer that controls how often the DIS sends out this CSNP. The default on IOS (and XR) is every 10 seconds.

It’s important to know that a separate timer is kept for the level-1 and level-2 DIS.
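The CSNP/PSNP exchange described above can be sketched as a simple comparison of sequence numbers. This is a deliberately simplified model: it only looks at LSP IDs and sequence numbers (real ISIS also considers checksum and remaining lifetime), and the LSDB contents below are hypothetical, made up purely for illustration.

```python
# Simplified model of what a router does with a received CSNP.
# Assumption: only sequence numbers are compared; the LSDB contents are hypothetical.

def compare_csnp(local_db: dict[str, int], csnp: dict[str, int]) -> tuple[list[str], list[str]]:
    """Return (LSPs to request via PSNP, LSPs the local router has newer copies of)."""
    # Missing locally or older locally: ask for them with a PSNP.
    request = sorted(lsp for lsp, seq in csnp.items() if local_db.get(lsp, 0) < seq)
    # Missing from the CSNP or older in the CSNP: the local copy is newer.
    newer_locally = sorted(lsp for lsp, seq in local_db.items() if csnp.get(lsp, 0) < seq)
    return request, newer_locally


local_db = {"0000.0000.0001.00-00": 6, "0000.0000.0002.00-00": 3}
csnp = {
    "0000.0000.0001.00-00": 6,
    "0000.0000.0002.00-00": 4,
    "0000.0000.0003.00-00": 3,
}

request, newer_locally = compare_csnp(local_db, csnp)
print(request)        # ['0000.0000.0002.00-00', '0000.0000.0003.00-00']
print(newer_locally)  # []
```

When both lists are empty, the router is in sync with the DIS and nothing needs to be requested.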

For this example I will be using the topology shown in figure 1:

Topology

Take note of the fact that I have manually set the MAC address on the routers to make it more obvious in the debugs which router is the DIS.

Since everybody has the same priority (default 64), the highest SNPA (SubNetwork Point of Attachment), which translates to the MAC address, will be used as the tiebreaker. Highest one wins. In our case this will be R3.

The output below highlights this fact:

R1#sh isis nei

System Id      Type Interface   IP Address      State Holdtime Circuit Id
R2             L2   Fa1/0       100.100.100.2   UP    28       R3.01
R3             L2   Fa1/0       100.100.100.3   UP    7        R3.01
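The election rule used here (highest priority first, highest MAC address as the tiebreaker) can be written down in a few lines. This is just a sketch of the rule as described in this post. The MAC addresses are assumed values for illustration, since the manually configured ones are not listed above.

```python
# Sketch of the DIS election rule: highest priority wins, highest MAC breaks ties.
# The MAC addresses below are assumed for illustration.

def elect_dis(routers: list[tuple[str, int, str]]) -> str:
    """routers is a list of (name, priority, mac); returns the name of the DIS."""
    def key(router: tuple[str, int, str]) -> tuple[int, int]:
        _, priority, mac = router
        # Compare the MAC numerically by stripping the dots from the xxxx.xxxx.xxxx form.
        return priority, int(mac.replace(".", ""), 16)
    return max(routers, key=key)[0]


lan = [
    ("R1", 64, "0000.0000.0001"),
    ("R2", 64, "0000.0000.0002"),
    ("R3", 64, "0000.0000.0003"),
]
print(elect_dis(lan))  # R3
```

Raising the priority on any other router would override the MAC tiebreaker, since priority is compared first.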

Now to verify the CSNP timer, let’s look at what our debugs are telling us:

R1#
*Jun 24 16:58:10.267: ISIS-Snp: Rec L2 CSNP from 0000.0000.0003 (FastEthernet1/0)
*Jun 24 16:58:10.267: ISIS-SNP: CSNP range 0000.0000.0000.00-00 to FFFF.FFFF.FFFF.FF-FF
*Jun 24 16:58:10.267: ISIS-SNP: Same entry 0000.0000.0001.00-00, seq 6
*Jun 24 16:58:10.267: ISIS-SNP: Same entry 0000.0000.0002.00-00, seq 4
*Jun 24 16:58:10.267: ISIS-SNP: Same entry 0000.0000.0003.00-00, seq 3
*Jun 24 16:58:10.267: ISIS-SNP: Same entry 0000.0000.0003.01-00, seq 2
R1#
*Jun 24 16:58:19.467: ISIS-Snp: Rec L2 CSNP from 0000.0000.0003 (FastEthernet1/0)
*Jun 24 16:58:19.471: ISIS-SNP: CSNP range 0000.0000.0000.00-00 to FFFF.FFFF.FFFF.FF-FF
*Jun 24 16:58:19.471: ISIS-SNP: Same entry 0000.0000.0001.00-00, seq 6
*Jun 24 16:58:19.471: ISIS-SNP: Same entry 0000.0000.0002.00-00, seq 4
*Jun 24 16:58:19.471: ISIS-SNP: Same entry 0000.0000.0003.00-00, seq 3
*Jun 24 16:58:19.475: ISIS-SNP: Same entry 0000.0000.0003.01-00, seq 2
R1#
*Jun 24 16:58:27.639: ISIS-Snp: Rec L2 CSNP from 0000.0000.0003 (FastEthernet1/0)
*Jun 24 16:58:27.643: ISIS-SNP: CSNP range 0000.0000.0000.00-00 to FFFF.FFFF.FFFF.FF-FF
*Jun 24 16:58:27.643: ISIS-SNP: Same entry 0000.0000.0001.00-00, seq 6
*Jun 24 16:58:27.643: ISIS-SNP: Same entry 0000.0000.0002.00-00, seq 4
*Jun 24 16:58:27.643: ISIS-SNP: Same entry 0000.0000.0003.00-00, seq 3
*Jun 24 16:58:27.643: ISIS-SNP: Same entry 0000.0000.0003.01-00, seq 2

Roughly every 10 seconds R1 receives a Level-2 CSNP from R3 (0000.0000.0003). So at least the theory is spot on. Now let’s modify the timer to see if it kicks in:

R3(config)#int f1/0
R3(config-if)#isis csnp-interval 20

So now on R1, we should see the CSNP arrive every 20 seconds instead:

R1#
*Jun 24 16:59:32.883: ISIS-Snp: Rec L2 CSNP from 0000.0000.0003 (FastEthernet1/0)
*Jun 24 16:59:32.883: ISIS-SNP: CSNP range 0000.0000.0000.00-00 to FFFF.FFFF.FFFF.FF-FF
*Jun 24 16:59:32.883: ISIS-SNP: Same entry 0000.0000.0001.00-00, seq 6
*Jun 24 16:59:32.883: ISIS-SNP: Same entry 0000.0000.0002.00-00, seq 4
*Jun 24 16:59:32.887: ISIS-SNP: Same entry 0000.0000.0003.00-00, seq 3
*Jun 24 16:59:32.887: ISIS-SNP: Same entry 0000.0000.0003.01-00, seq 2
R1#
*Jun 24 16:59:49.679: ISIS-Snp: Rec L2 CSNP from 0000.0000.0003 (FastEthernet1/0)
*Jun 24 16:59:49.679: ISIS-SNP: CSNP range 0000.0000.0000.00-00 to FFFF.FFFF.FFFF.FF-FF
*Jun 24 16:59:49.679: ISIS-SNP: Same entry 0000.0000.0001.00-00, seq 6
*Jun 24 16:59:49.679: ISIS-SNP: Same entry 0000.0000.0002.00-00, seq 4
*Jun 24 16:59:49.679: ISIS-SNP: Same entry 0000.0000.0003.00-00, seq 3
*Jun 24 16:59:49.679: ISIS-SNP: Same entry 0000.0000.0003.01-00, seq 2
R1#
*Jun 24 17:00:07.479: ISIS-Snp: Rec L2 CSNP from 0000.0000.0003 (FastEthernet1/0)
*Jun 24 17:00:07.483: ISIS-SNP: CSNP range 0000.0000.0000.00-00 to FFFF.FFFF.FFFF.FF-FF
*Jun 24 17:00:07.483: ISIS-SNP: Same entry 0000.0000.0001.00-00, seq 6
*Jun 24 17:00:07.483: ISIS-SNP: Same entry 0000.0000.0002.00-00, seq 4
*Jun 24 17:00:07.483: ISIS-SNP: Same entry 0000.0000.0003.00-00, seq 3
*Jun 24 17:00:07.487: ISIS-SNP: Same entry 0000.0000.0003.01-00, seq 2

And lo and behold, it’s working!
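If you want to put numbers on it, the gaps between the CSNPs can be computed straight from the debug timestamps above. The helper below is only a small convenience sketch; the timestamps are copied from the two debug captures.

```python
from datetime import datetime

def gaps(timestamps: list[str]) -> list[float]:
    """Return the seconds elapsed between consecutive HH:MM:SS.mmm timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [round((later - earlier).total_seconds(), 3)
            for earlier, later in zip(times, times[1:])]


# "Rec L2 CSNP" timestamps before and after changing the csnp-interval on R3.
before = ["16:58:10.267", "16:58:19.467", "16:58:27.639"]
after = ["16:59:32.883", "16:59:49.679", "17:00:07.479"]

print(gaps(before))  # [9.2, 8.172]
print(gaps(after))   # [16.796, 17.8]
```

The measured gaps sit a little below the configured 10 and 20 seconds. One plausible explanation is jitter applied to the timer, although the debugs alone do not confirm that.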

Conclusion

With this command you have the ability to modify how often a DIS sends out the required CSNP (Complete Sequence Number PDU). Unless you have a specific requirement to change this timer, the default of 10 seconds should scale to very large multi-access networks.

I hope the explanation of this timer has been useful to you.

isis retransmit-interval Vs. isis retransmit-throttle-interval

In this short post I want to try and shed some light on a couple of ISIS timers that had me confused at first. I think I have them down now, but please let me know if I have misunderstood them.

The timers in question are “isis retransmit-interval <seconds>” and “isis retransmit-throttle-interval <mseconds>”.

Both of these commands are only relevant on point-to-point links as no concept of a DIS is present here.

Lets start out with the simple topology of two routers in figure 1.

Figure 1

When R1 has some LSPs to send to R2, it sends them with a default interval of 33 ms between them (the “isis lsp-interval”).

This is shown in figure 2.

Figure 2

R1 expects an acknowledgement within the “isis retransmit-interval”, which is 5 seconds by default. The retransmit-interval specifies the time between retransmissions of the same LSP.

In our example the 5 seconds pass without an acknowledgement, so R1 must retransmit the LSPs to R2. R1 is now in its retransmission window.

Now here’s where the second timer comes into play. Instead of sending the LSPs that need to be retransmitted with a 33 ms delay between each of them, the “retransmit-throttle-interval” goes into effect and increases the time between the LSPs from 33 ms to 100 ms.

This is shown in figure 3.

Figure 3

All of this is done to keep R2 from being overburdened. If an LSP is still not acknowledged within another 5 seconds, it is retransmitted again per the retransmit-interval timer.
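The interaction of the three timers can be sketched as a simple send schedule. This is a toy model under stated assumptions: it uses the values quoted in this post (33 ms, 5 s, 100 ms), treats all three LSPs as unacknowledged, and starts the retransmission window exactly at the 5 second mark. The function name is made up for illustration.

```python
# Toy model of LSP pacing on a point-to-point link, using the values from this post.
# Assumption: none of the LSPs is acknowledged, so all of them are retransmitted.

LSP_INTERVAL_MS = 33             # isis lsp-interval
RETRANSMIT_INTERVAL_MS = 5000    # isis retransmit-interval (5 seconds)
RETRANSMIT_THROTTLE_MS = 100     # isis retransmit-throttle-interval

def send_times(lsp_count: int, first_send_ms: int, gap_ms: int) -> list[int]:
    """Return the send time in ms for each LSP, spaced gap_ms apart."""
    return [first_send_ms + i * gap_ms for i in range(lsp_count)]


initial = send_times(3, 0, LSP_INTERVAL_MS)
retransmissions = send_times(3, RETRANSMIT_INTERVAL_MS, RETRANSMIT_THROTTLE_MS)

print(initial)          # [0, 33, 66]
print(retransmissions)  # [5000, 5100, 5200]
```

The second list shows the point of the throttle: the retransmitted LSPs are spread out more than the original burst was.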

Configuration-wise, it is very simple. On IOS:

Current configuration : 179 bytes
!
interface FastEthernet1/0
ip address 10.1.2.1 255.255.255.0
ip router isis 1
speed auto
duplex auto
isis retransmit-throttle-interval 100
isis retransmit-interval 10
end

And on IOS-XR:

RP/0/0/CPU0:XR1(config-isis-if)#show config
Thu Jun 13 22:49:41.657 UTC
Building configuration...
!! IOS XR Configuration 3.9.1
router isis 1
interface POS0/6/0/0
retransmit-interval 10
retransmit-throttle-interval 100
address-family ipv4 unicast
!
!
!
end

I hope that helped clear up some confusion on these two timers. And again, if I have misunderstood anything, please let me know. Thanks!

References:
Cisco Documentation: http://www.cisco.com/en/US/docs/ios-xml/ios/iproute_isis/command/irs-a1.html#wp1754145253

The complete IS-IS Routing Protocol: Amazon.co.uk

Fixing multicast RPF failure with BGP

In this post I would like to explain how you can fix a multicast RPF failure using BGP.

If you take a look at the topology in figure 1, we have a network running EIGRP as the IGP,
where R1 advertises its loopback 0 (1.1.1.1/32). R4 also has a loopback 0 with the 4.4.4.4/32 address.
EIGRP adjacencies are running between R1 and R2, R1 and R3, R2 and R3 and finally R3 and R4,
in other words on all links in the topology.

Figure 1

Everything is working fine and we can verify that we have reachability through the network using ICMP echo between R1 and R4:

R1#ping 4.4.4.4 so loo0

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 4.4.4.4, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/59/96 ms

To create the multicast network, we enable multicast routing on all the routers and enable
PIM sparse-mode on all links except the R1 to R3 link (remember to include the loopbacks).

Rx(config-if)#ip pim sparse-mode

To simulate traffic, R1 will be the source and R4’s loopback will be the receiver, so on R4, let’s join
239.1.1.1:

R4(config-if)# ip igmp join-group 239.1.1.1

What we end up with is a PIM topology as in figure 2.

Figure 2

Now what we want to do is make R1 our RP using BSR. What will happen is that R3 will receive a BSR
message on its f2/0 interface and determine that it’s not the RPF interface it should be receiving this message on.
That would instead be f1/0 as per the IGP.

Using “debug ip pim bsr” a message will appear on the console:

*Jun 12 11:07:20.063: PIM-BSR(0): bootstrap (1.1.1.1) on non-RPF path FastEthernet2/0 or from non-RPF neighbor 0.0.0.0 discarded

This basically renders our multicast network incomplete, as R3 and R4 won’t have any RP.
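The check that R3 is failing here can be sketched in a couple of lines. This is a simplified model of the RPF check, with a made-up function name and a one-entry routing table. The interface names come from the description above.

```python
# Simplified model of the RPF check R3 performs on the bootstrap message.
# The function name and the tiny routing table are illustrative assumptions.

def rpf_accepts(arrival_interface: str, rib: dict[str, str], source: str) -> bool:
    """Accept the packet only if it arrived on the interface used to reach the source."""
    return rib.get(source) == arrival_interface


# R3's IGP route back to the BSR (1.1.1.1) points out f1/0, the direct link to R1.
rib_r3 = {"1.1.1.1": "FastEthernet1/0"}

# The bootstrap message arrives from R2 on f2/0, because PIM is not enabled on the R1 to R3 link.
print(rpf_accepts("FastEthernet2/0", rib_r3, "1.1.1.1"))  # False
print(rpf_accepts("FastEthernet1/0", rib_r3, "1.1.1.1"))  # True
```

So the fix has to change what R3 considers the RPF interface for 1.1.1.1, without touching the unicast routing.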

So let’s fix it!

On R2 and R3 we enable BGP with the multicast address family:

R2:

router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 10.2.3.3 remote-as 100
!
address-family ipv4
no synchronization
no auto-summary
exit-address-family
!
address-family ipv4 multicast
network 1.1.1.1 mask 255.255.255.255
neighbor 10.2.3.3 activate
no auto-summary
exit-address-family

R3:

router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 10.2.3.2 remote-as 100
!
address-family ipv4
no synchronization
no auto-summary
exit-address-family
!
address-family ipv4 multicast
neighbor 10.2.3.2 activate
distance bgp 20 70 1
no auto-summary
exit-address-family

So what did we do here? Well, for one thing, we created an iBGP peering between R2 and R3.
On this peering we enabled the multiprotocol extension for multicast.

Then on R2, we are advertising the 1.1.1.1/32 network as learned through EIGRP.
A similar configuration is used on R3, except we are not advertising anything and, importantly, we are using the
distance command to set the administrative distance for iBGP routes to 70, which is lower than EIGRP’s 90 (the iBGP default is 200). This last step would not be necessary
if we had an eBGP peering instead, since the AD would then already be lower than that of the IGP-learned route.
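The reason the distance matters can be sketched as picking the candidate with the lowest administrative distance. This is a simplified model of the selection, not the router’s actual algorithm: the function name and tuple layout are assumptions, while the distances (EIGRP internal 90, iBGP 200, eBGP 20) are the usual IOS defaults.

```python
# Simplified model: the RPF route is taken from the candidate with the lowest distance.
# The function name and tuple layout are illustrative assumptions.

def pick_rpf_route(candidates: list[tuple[str, str, int]]) -> tuple[str, str, int]:
    """candidates is a list of (source, interface, distance); the lowest distance wins."""
    return min(candidates, key=lambda candidate: candidate[2])


eigrp = ("eigrp", "FastEthernet1/0", 90)

# Default iBGP distance of 200: the EIGRP route still wins, so RPF keeps pointing at f1/0.
print(pick_rpf_route([eigrp, ("bgp multicast", "FastEthernet2/0", 200)]))
# ('eigrp', 'FastEthernet1/0', 90)

# After "distance bgp 20 70 1" the iBGP route has distance 70 and wins.
print(pick_rpf_route([eigrp, ("bgp multicast", "FastEthernet2/0", 70)]))
# ('bgp multicast', 'FastEthernet2/0', 70)
```

An eBGP-learned route would come in at distance 20 and win without any extra configuration.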

On R3, we should now see the following:

R3#sh bgp ipv4 mu
BGP table version is 2, local router ID is 10.3.4.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, x best-external
Origin codes: i - IGP, e - EGP, ? - incomplete

Network          Next Hop            Metric LocPrf Weight Path
*>i1.1.1.1/32       10.1.2.1            156160    100      0 i

and using “show ip rpf 1.1.1.1”:

R3#sh ip rpf 1.1.1.1
RPF information for ? (1.1.1.1)
RPF interface: FastEthernet2/0
RPF neighbor: ? (10.2.3.2)
RPF route/mask: 1.1.1.1/32
RPF type: multicast (bgp 100)
Doing distance-preferred lookups across tables
RPF topology: ipv4 multicast base, originated from ipv4 unicast base

As can be seen, R3 now uses the BGP-learned route through f2/0 to get “back” to 1.1.1.1/32.
And finally, the RP should be installed on both R3 and R4:

R3#sh ip pim rp map
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
RP 1.1.1.1 (?), v2
Info source: 1.1.1.1 (?), via bootstrap, priority 0, holdtime 150
Uptime: 00:52:22, expires: 00:01:51

R4#sh ip pim rp map
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
RP 1.1.1.1 (?), v2
Info source: 1.1.1.1 (?), via bootstrap, priority 0, holdtime 150
Uptime: 00:52:36, expires: 00:01:36

So that’s how we would fix a multicast RPF failure using BGP.

Another Lab lies ahead, round one.

This morning I booked my first go with the CCIE Service Provider lab exam. The battle is in mid November, so I have some time to study.

That also means that a lot of forthcoming blog posts will be about CCIE SP material.
I will try and blog about all the tricky features coming my way and provide plenty of examples.

At the moment I’m very excited and at the same time very humble to have a go at the SP lab exam. When I saw my # the first time around, I never thought I would be going for a 2nd one.

Let the hard work begin.

MPLS VPN Per VRF Label feature

In this post I would like to explain the usage of the “MPLS VPN Per VRF Label” feature.

By default, each prefix in a VRF is assigned its own VPN label, used to identify the route within the VRF itself.
This label is the only label that is being looked at by the receiving PE router.

In theory, you only need a single label to identify the VRF for the destination prefix, and then you can do an IP lookup to further process the packet.
This is what the “MPLS VPN Per VRF Label” does.

I’m going to be using the simple topology below to illustrate the functionality:

In this topology, R2 receives 2 prefixes from R1 through RIP.

1.0.0.0/32 is subnetted, 1 subnets
R        1.1.1.1 [120/1] via 192.168.12.1, 00:00:24, FastEthernet1/0
11.0.0.0/32 is subnetted, 1 subnets
R        11.11.11.11 [120/1] via 192.168.12.1, 00:00:24, FastEthernet1/0

Since we are redistributing RIP into BGP, we can see these two routes in the BGP table:

Route Distinguisher: 100:100 (default for vrf VPN_A)
*> 1.1.1.1/32       192.168.12.1             1         32768 ?
*>i4.4.4.4/32       3.3.3.3                  1    100      0 ?
*> 11.11.11.11/32   192.168.12.1             1         32768 ?
*> 192.168.12.0     0.0.0.0                  0         32768 ?
*>i192.168.34.0     3.3.3.3                  0    100      0 ?

And we can check which labels R2 allocates for each of the prefixes:

R2#sh bgp vpnv4 unicast all labels
Network          Next Hop      In label/Out label
Route Distinguisher: 100:100 (VPN_A)
1.1.1.1/32       192.168.12.1    20/nolabel
4.4.4.4/32       3.3.3.3         nolabel/16
11.11.11.11/32   192.168.12.1    21/nolabel
192.168.12.0     0.0.0.0         22/nolabel(VPN_A)
192.168.34.0     3.3.3.3         nolabel/17

For prefix 1.1.1.1/32 a VPN label of 20 has been assigned and for 11.11.11.11/32, label 21 is used.

Lets check the settings for the VRF VPN_A:

R2#sh ip vrf detai
VRF VPN_A (VRF Id = 1); default RD 100:100; default VPNID <not set>
Interfaces:
Fa1/0
VRF Table ID = 1
Export VPN route-target communities
RT:100:100
Import VPN route-target communities
RT:100:100
No import route-map
No export route-map
VRF label distribution protocol: not configured
VRF label allocation mode: per-prefix

We can see that the label allocation mode is “per-prefix” which is what we expect and have verified by looking at the labels assigned by BGP.

The final verification of this can be seen on R3:

R3#sh bgp vpnv4 unicast all la
Network          Next Hop      In label/Out label
Route Distinguisher: 100:100 (VPN_A)
1.1.1.1/32       2.2.2.2         nolabel/20
4.4.4.4/32       192.168.34.4    16/nolabel
11.11.11.11/32   2.2.2.2         nolabel/21
192.168.12.0     2.2.2.2         nolabel/22
192.168.34.0     0.0.0.0         17/nolabel(VPN_A)

We can see that the “Out label” for 1.1.1.1/32 is in fact 20 and for 11.11.11.11/32 21. Everything we expect.

Now let’s change the setting on R2 to be “Per VRF”:

R2(config)#mpls label mode all-vrfs protocol all-afs per-vrf

Verification of the VRF setting:

R2#sh ip vrf detai
VRF VPN_A (VRF Id = 1); default RD 100:100; default VPNID <not set>
Interfaces:
Fa1/0
VRF Table ID = 1
Export VPN route-target communities
RT:100:100
Import VPN route-target communities
RT:100:100
No import route-map
No export route-map
VRF label distribution protocol: not configured
VRF label allocation mode: per-vrf (Label 18)

And verify the BGP label allocation:

R2#sh bgp vpnv4 unicast all labels
Network          Next Hop      In label/Out label
Route Distinguisher: 100:100 (VPN_A)
1.1.1.1/32       192.168.12.1    IPv4 VRF Aggr:18/nolabel
4.4.4.4/32       3.3.3.3         nolabel/16
11.11.11.11/32   192.168.12.1    IPv4 VRF Aggr:18/nolabel
192.168.12.0     0.0.0.0         IPv4 VRF Aggr:18/nolabel(VPN_A)
192.168.34.0     3.3.3.3         nolabel/17

What we can see now is that the prefixes local to R2 are being treated as an aggregate, all sharing label 18.
Finally, if I look at the changes on R3, we can see that label 18 is indeed being used for all the routes from R2 (and hence R1):

R3#sh bgp vpnv4 unicast all la
Network          Next Hop      In label/Out label
Route Distinguisher: 100:100 (VPN_A)
1.1.1.1/32       2.2.2.2         nolabel/18
4.4.4.4/32       192.168.34.4    16/nolabel
11.11.11.11/32   2.2.2.2         nolabel/18
192.168.12.0     2.2.2.2         nolabel/18
192.168.34.0     0.0.0.0         17/nolabel(VPN_A)
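The difference between the two modes can be summed up in a small sketch. This is only a model of the allocation behaviour seen in the outputs above, not how IOS implements it: the function is hypothetical, and the starting labels (20 and 18) are simply taken from the outputs.

```python
from itertools import count

# Model of the two VPN label allocation modes, based on the outputs in this post.
# The function is hypothetical; the starting labels are taken from the outputs above.

def allocate_labels(prefixes: list[str], mode: str, first_label: int) -> dict[str, int]:
    """Return a prefix-to-label mapping for 'per-prefix' or 'per-vrf' mode."""
    labels = count(first_label)
    if mode == "per-vrf":
        shared = next(labels)  # one label identifies the whole VRF
        return {prefix: shared for prefix in prefixes}
    return {prefix: next(labels) for prefix in prefixes}  # one label per prefix


vpn_a_local = ["1.1.1.1/32", "11.11.11.11/32", "192.168.12.0"]

print(allocate_labels(vpn_a_local, "per-prefix", 20))
# {'1.1.1.1/32': 20, '11.11.11.11/32': 21, '192.168.12.0': 22}
print(allocate_labels(vpn_a_local, "per-vrf", 18))
# {'1.1.1.1/32': 18, '11.11.11.11/32': 18, '192.168.12.0': 18}
```

In per-prefix mode the label usage grows with the number of routes in the VRF, while per-vrf mode stays at a single label and relies on an IP lookup on the receiving PE.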

That’s all I had for now. Hope you enjoyed it. Take care!