VxLAN on the CSR1Kv


By now, VxLAN has become the standard way of tunneling in the datacenter.

Using VxLAN, I will show how the CSR1Kv can be used to extend your datacenter L2 reach between sites as well.

First off, what is VxLAN?

It stands for Virtual Extensible LAN. Basically, it gives you a way of decoupling your VLANs into a new scheme.

You map each VLAN to a VNI (Virtual Network Identifier), which in essence makes your VLAN numbering scheme locally significant.

Also, since the VNI is a 24-bit identifier (roughly 16.7 million values), you have a lot more flexibility than the regular 4096 definable VLANs (the 802.1Q tag is only 12 bits).

Each endpoint that does the encapsulation/decapsulation is called a VTEP (VxLAN Tunnel EndPoint). In our example, these are CSR3 and CSR5.

The original frame gets a VxLAN header and is then encapsulated in a UDP packet and forwarded across the network. This is a great solution, as it doesn't impose any technical restrictions on the core of the network. Only the VTEPs need to understand VxLAN (and preferably have hardware acceleration for it as well).
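To illustrate, the packet on the wire is layered roughly like this (a sketch of the encapsulation as defined in RFC 7348):

 Outer Ethernet header
 Outer IP header    - source = sending VTEP, destination = remote VTEP or the flood multicast group
 Outer UDP header   - destination port 4789
 VxLAN header       - 8 bytes, carrying the 24 bit VNI
 Original L2 frame  - the untouched Ethernet frame from the end host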

Since we won't be using BGP EVPN, we will rely solely on multicast in the network to establish which VTEPs are involved for the traffic in question. The only supported mode is BiDir PIM, which is an optimization of the control plane (not the data plane), since it only keeps (*,G) entries in its multicast routing tables.

Let's take a look at the topology I will be using for the example:

[Topology diagram: R1 (Site 1) - CSR3 (VTEP) - CSR4 (fabric, RP) - CSR5 (VTEP) - R2 (Site 2)]

I have used a regular IOS-based device in Site 1 and Site 2 to represent our L2 devices. These could be servers or end-clients for that matter. What I want to accomplish is to run EIGRP between R1 and R2 over the “fabric” using VxLAN as the tunneling mechanism.

CSR3 is the VTEP for Site 1 and CSR5 is the VTEP for Site 2.

In the “fabric” we have CSR4, whose Loopback0 (4.4.4.4/32) is the BiDir RP. It announces this using BSR, so that CSR3 and CSR5 learn the RP information (along with the BiDir functionality). We are using OSPF as the IGP in the “fabric” to establish routing between the loopback interfaces, which will be the VTEP sources for CSR3 and CSR5 respectively.
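The underlay configuration itself isn't captured in the outputs below, but on CSR3 it would look something along these lines (the OSPF process ID and exact network statements are my assumptions):

interface Loopback0
 ip address 3.3.3.3 255.255.255.255
!
interface GigabitEthernet2
 ip address 10.3.4.3 255.255.255.0
!
router ospf 1
 network 3.3.3.3 0.0.0.0 area 0
 network 10.3.4.0 0.0.0.255 area 0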

Let's verify that routing between the loopbacks is working and our RIB is correct:

On CSR3:

CSR3#sh ip route | beg Gate
Gateway of last resort is not set
      3.0.0.0/32 is subnetted, 1 subnets
C        3.3.3.3 is directly connected, Loopback0
      4.0.0.0/32 is subnetted, 1 subnets
O        4.4.4.4 [110/2] via 10.3.4.4, 00:38:27, GigabitEthernet2
      5.0.0.0/32 is subnetted, 1 subnets
O        5.5.5.5 [110/3] via 10.3.4.4, 00:38:27, GigabitEthernet2
      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C        10.3.4.0/24 is directly connected, GigabitEthernet2
L        10.3.4.3/32 is directly connected, GigabitEthernet2
O        10.4.5.0/24 [110/2] via 10.3.4.4, 00:38:27, GigabitEthernet2
CSR3#ping 5.5.5.5 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 5.5.5.5, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/22 ms

This means we have full reachability through the “fabric” from VTEP to VTEP.

Let's make sure our multicast routing is working properly and take a look at CSR4 first, since it's the RP for the network:

CSR4#sh run | incl ip pim|interface
interface Loopback0
 ip pim sparse-mode
interface GigabitEthernet1
 ip pim sparse-mode
interface GigabitEthernet2
 ip pim sparse-mode
interface GigabitEthernet3
interface GigabitEthernet4
ip pim bidir-enable
ip pim bsr-candidate Loopback0 0
ip pim rp-candidate Loopback0 bidir

We can see from this output that we are running PIM on all the relevant interfaces as well as making sure that bidir is enabled. We have also verified that we are indeed running BSR to announce Loopback0 as the RP.

Let's verify the multicast routing table:

CSR4#sh ip mroute | beg Outgoing
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode
(*,224.0.0.0/4), 00:45:05/-, RP 4.4.4.4, flags: B
  Bidir-Upstream: Loopback0, RPF nbr: 4.4.4.4
  Incoming interface list:
    GigabitEthernet2, Accepting/Sparse
    GigabitEthernet1, Accepting/Sparse
    Loopback0, Accepting/Sparse
(*, 239.1.1.1), 00:44:03/00:02:46, RP 4.4.4.4, flags: B
  Bidir-Upstream: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1, Forward/Sparse, 00:44:03/00:02:38
    GigabitEthernet2, Forward/Sparse, 00:44:03/00:02:46
(*, 224.0.1.40), 00:45:05/00:01:56, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Loopback0, Forward/Sparse, 00:45:04/00:01:56

We can see that we do have some (*,G) entries installed (more on the (*, 239.1.1.1) later).

Excellent.

Now let's verify the RP mapping on CSR3:

CSR3#sh ip pim rp mapping
PIM Group-to-RP Mappings
Group(s) 224.0.0.0/4
  RP 4.4.4.4 (?), v2, bidir
    Info source: 4.4.4.4 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:45:39, expires: 00:02:23

We see that we have learned the RP, that it is BiDir-capable, and that it was learned through BSR.
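CSR3's own multicast configuration isn't captured in the outputs here, but for this to work it presumably mirrors CSR4's on the loopback and fabric-facing interface, something along these lines:

interface Loopback0
 ip pim sparse-mode
!
interface GigabitEthernet2
 ip pim sparse-mode
!
ip pim bidir-enable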

So far so good. Now let's turn our attention to the VxLAN part of the configuration.

The VTEP functionality is implemented by a new interface type called an NVE. This is where you configure which source address to use, along with the multicast group to use for flooding.

This is the configuration for CSR3:

CSR3#sh run int nve1
Building configuration...
Current configuration : 137 bytes
!
interface nve1
 no ip address
 source-interface Loopback0
 member vni 1000100 mcast-group 239.1.1.1
 no mop enabled
 no mop sysid
end

What's important here is that we source our VTEP from Loopback0 (3.3.3.3/32) and use multicast group 239.1.1.1 for VNI 1000100. The VNI number can be whatever you choose; I have just chosen to use a very large number and encode which VLAN this VNI is used for (VLAN 100).

On the opposite side, we have a similar configuration for the NVE:

CSR5#sh run int nve1
Building configuration...
Current configuration : 137 bytes
!
interface nve1
 no ip address
 source-interface Loopback0
 member vni 1000100 mcast-group 239.1.1.1
 no mop enabled
 no mop sysid
end

It's very important that the multicast group matches on both sides, as this is the group they will use to flood BUM (Broadcast, Unknown unicast and Multicast) traffic, for example ARP.

The next configuration piece is the EFP (Ethernet Flow Point) we need to create on the interface towards the site routers (R1 and R2), where we accept traffic tagged with VLAN 100:

CSR3#sh run int g1
Building configuration...
Current configuration : 195 bytes
!
interface GigabitEthernet1
 no ip address
 negotiation auto
 no mop enabled
 no mop sysid
 service instance 100 ethernet
  encapsulation dot1q 100
  rewrite ingress tag pop 1 symmetric
 !
end

This configuration piece states that the encapsulation is dot1q VLAN 100, and that the tag is stripped inbound before further processing and pushed back on again on egress (that is what the symmetric keyword does).

Now for the piece that ties it all together, namely the bridge-domain:

bridge-domain 100
 member vni 1000100
 member GigabitEthernet1 service-instance 100

Here we have a bridge-domain configuration with two members: the local interface G1 on its service instance 100, and our VNI/VTEP. This is basically the glue that ties the bridge domain together end to end.

The same configuration is present on CSR5 as well.
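As a side note, this is where the VNI numbering scheme pays off. A purely hypothetical sketch of carrying a second VLAN (200) over the same setup would simply repeat the pattern with its own VNI, flood group and bridge domain (none of these numbers are from the lab, they just follow the same scheme):

interface nve1
 member vni 1000200 mcast-group 239.1.1.2
!
interface GigabitEthernet1
 service instance 200 ethernet
  encapsulation dot1q 200
  rewrite ingress tag pop 1 symmetric
!
bridge-domain 200
 member vni 1000200
 member GigabitEthernet1 service-instance 200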

Let's verify the control plane on CSR3:

CSR3#sh bridge-domain 100
Bridge-domain 100 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet1 service instance 100
    vni 1000100
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   AABB.CC00.1000 forward dynamic   298  GigabitEthernet1.EFP100
   0   AABB.CC00.2000 forward dynamic   300  nve1.VNI1000100, VxLAN
                                             src: 3.3.3.3 dst: 5.5.5.5

This command shows the MAC addresses learned in this particular bridge domain. On our EFP on G1 we have dynamically learned the MAC address of R1's interface, and through the nve1 interface using VNI 1000100 we have learned the MAC address of R2. Note that we now also know which VTEP endpoint to send the traffic to. This means that further communication between these two end hosts (R1 and R2) is done solely using unicast between 3.3.3.3 and 5.5.5.5, with VxLAN as the tunneling mechanism.

CSR3#show nve interface nve 1 detail
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:3.3.3.3 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
      3273     268627       3278     269026

This command shows the status of our NVE interface. From this we can see that it's in an Up/Up state. The VxLAN port is the standard destination port (4789), and we have some packets going back and forth.

Now that everything checks out okay in the control plane, let's see if the data plane is working by issuing an ICMP ping from R1 to R2 (they are obviously on the same subnet, 192.168.100.0/24):

R1#ping 192.168.100.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.100.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/11/26 ms
R1#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.100.1           -   aabb.cc00.1000  ARPA   Ethernet0/0.100
Internet  192.168.100.2           8   aabb.cc00.2000  ARPA   Ethernet0/0.100

This looks excellent! And in fact the EIGRP peering I had set up between them works as well:

R1#sh ip eigrp neighbors
EIGRP-IPv4 Neighbors for AS(100)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
0   192.168.100.2           Et0/0.100                12 04:14:30    4   100  0  3
R1#sh ip route eigrp | beg Gateway
Gateway of last resort is not set
      100.0.0.0/32 is subnetted, 2 subnets
D        100.100.100.2
           [90/409600] via 192.168.100.2, 04:14:46, Ethernet0/0.100

This address is the loopback of R2.
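For reference, the site routers' configuration isn't shown in this post, but judging from the outputs, R1 looks roughly like this (the loopback address on R1 is my assumption, mirroring R2's 100.100.100.2):

interface Ethernet0/0.100
 encapsulation dot1Q 100
 ip address 192.168.100.1 255.255.255.0
!
! assumed loopback, mirroring R2's 100.100.100.2
interface Loopback0
 ip address 100.100.100.1 255.255.255.255
!
router eigrp 100
 network 192.168.100.0 0.0.0.255
 network 100.100.100.0 0.0.0.255

R2 would be the same with the .2 addresses.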

Finally, I want to show how the ICMP ping looks in the data plane by doing a capture on CSR4's G2 interface:

[Packet capture: ICMP echo encapsulated in VxLAN/UDP, captured on CSR4's G2 interface]

Here we can see a ping I issued from R1's loopback interface towards R2's loopback interface.

I have expanded the view, so you can see the encapsulation with the VxLAN header running on top of the UDP packet.

The outer IP packet carrying the UDP datagram has the VTEP endpoints (3.3.3.3 and 5.5.5.5) as its source and destination.

The VNI is the one we selected (1000100) and is used for differentiation on the VTEP.

Finally, we have our original L2 frame in its entirety.

That's all I wanted to show for now. Next time I will extend this a bit and involve BGP as the control plane.

Thanks for reading!