VxLAN on the CSR1Kv

By now, VxLAN is becoming the standard way of tunneling in the Datacenter.
In this post, I will show how to use VxLAN on the CSR1Kv to extend your Datacenter L2 reach between sites.

First off, what is VxLAN?
It stands for Virtual Extensible LAN. Basically, it gives you a way of decoupling your VLANs from the transport network by mapping them into a new numbering scheme.

You map each VLAN into a VNI (VxLAN Network Identifier), which in essence makes your VLAN numbering scheme locally significant.

Also, since a VNI is a 24-bit identifier, you get roughly 16 million (2^24) segments to work with, a lot more flexibility than the regular 4096 definable VLANs allowed by the 12-bit 802.1Q tag.

Each endpoint that does the encapsulation/decapsulation is called a VTEP (VxLAN Tunnel EndPoint). In our example this would be CSR3 and CSR5.

The original frame is prepended with a VxLAN header and then encapsulated in a UDP packet, which is forwarded across the network. This is a great solution, as it doesn't impose any technical restrictions on the core of the network. Only the VTEPs need to understand VxLAN (and preferably have hardware acceleration for it as well).
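
To make the layering concrete, this is the stack of headers a frame carries while crossing the fabric (we will see exactly this layering again in the packet capture at the end of the post):

Outer Ethernet header
Outer IP header       (source/destination are the VTEP addresses)
Outer UDP header      (destination port 4789)
VxLAN header          (8 bytes, carrying the 24-bit VNI)
Original L2 frame     (the traffic we are tunneling)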

Since we won't be using BGP EVPN, we will rely solely on multicast in the network to establish which VTEPs need the traffic in question. The only supported mode is PIM BiDir, which is an optimization of the control plane (not the data plane), since it only keeps (*,G) entries in its multicast routing tables.
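
The full configurations of CSR3 and CSR5 are not shown in this post, so as a point of reference, a minimal sketch of the multicast configuration a VTEP like CSR3 would need could look like this (Gi2 being its fabric-facing interface, per the topology):

ip multicast-routing distributed
ip pim bidir-enable
!
interface Loopback0
 ip pim sparse-mode
!
interface GigabitEthernet2
 ip pim sparse-mode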

Let's take a look at the topology I will be using for the example:

[Topology diagram: R1 and R2 as L2 devices in Site 1 and Site 2, CSR3 and CSR5 as the VTEPs, CSR4 in the fabric as the BiDir RP]

I have used a regular IOS-based device in Site 1 and Site 2 to represent our L2 devices; these could just as well be servers or end clients. What I want to accomplish is to run EIGRP between R1 and R2 over the “fabric”, using VxLAN as the tunneling mechanism.

CSR3 is the VTEP for Site 1 and CSR5 is the VTEP for Site 2.

In the “fabric” we have CSR4, whose Loopback0 (4.4.4.4/32) is the BiDir RP. It announces this using BSR, so that CSR3 and CSR5 learn the RP information (along with its BiDir capability). We are using OSPF as the IGP in the “fabric” to establish routing between the loopback interfaces, which serve as the VTEP addresses for CSR3 and CSR5.
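
For completeness, a minimal sketch of what the OSPF piece could look like on CSR3 (the process ID and the single area 0 are my assumptions):

router ospf 1
 network 3.3.3.3 0.0.0.0 area 0
 network 10.3.4.0 0.0.0.255 area 0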

Let's verify that routing between the loopbacks is working and that our RIB is correct:

On CSR3:

CSR3#sh ip route | beg Gate
Gateway of last resort is not set

      3.0.0.0/32 is subnetted, 1 subnets
C        3.3.3.3 is directly connected, Loopback0
      4.0.0.0/32 is subnetted, 1 subnets
O        4.4.4.4 [110/2] via 10.3.4.4, 00:38:27, GigabitEthernet2
      5.0.0.0/32 is subnetted, 1 subnets
O        5.5.5.5 [110/3] via 10.3.4.4, 00:38:27, GigabitEthernet2
      10.0.0.0/8 is variably subnetted, 3 subnets, 2 masks
C        10.3.4.0/24 is directly connected, GigabitEthernet2
L        10.3.4.3/32 is directly connected, GigabitEthernet2
O        10.4.5.0/24 [110/2] via 10.3.4.4, 00:38:27, GigabitEthernet2

CSR3#ping 5.5.5.5 so loo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 5.5.5.5, timeout is 2 seconds:
Packet sent with a source address of 3.3.3.3 
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/7/22 ms

This means we have full reachability through the “fabric” from VTEP to VTEP.

Let's make sure our multicast routing is working properly, starting with CSR4, since it's the RP for the network:

CSR4#sh run | incl ip pim|interface
interface Loopback0
 ip pim sparse-mode
interface GigabitEthernet1
 ip pim sparse-mode
interface GigabitEthernet2
 ip pim sparse-mode
interface GigabitEthernet3
interface GigabitEthernet4
ip pim bidir-enable
ip pim bsr-candidate Loopback0 0
ip pim rp-candidate Loopback0 bidir

We can see from this output that we are running PIM on all the relevant interfaces as well as making sure that bidir is enabled. We have also verified that we are indeed running BSR to announce Loopback0 as the RP.

Let's verify the multicast routing table:

CSR4#sh ip mroute | beg Outgoing   
Outgoing interface flags: H - Hardware switched, A - Assert winner, p - PIM Join
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*,224.0.0.0/4), 00:45:05/-, RP 4.4.4.4, flags: B
  Bidir-Upstream: Loopback0, RPF nbr: 4.4.4.4
  Incoming interface list:
    GigabitEthernet2, Accepting/Sparse
    GigabitEthernet1, Accepting/Sparse
    Loopback0, Accepting/Sparse

(*, 239.1.1.1), 00:44:03/00:02:46, RP 4.4.4.4, flags: B
  Bidir-Upstream: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    GigabitEthernet1, Forward/Sparse, 00:44:03/00:02:38
    GigabitEthernet2, Forward/Sparse, 00:44:03/00:02:46

(*, 224.0.1.40), 00:45:05/00:01:56, RP 0.0.0.0, flags: DCL
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Loopback0, Forward/Sparse, 00:45:04/00:01:56

We can see that we do have some (*,G) entries installed (more on (*, 239.1.1.1) later). Notice that there are no (S,G) entries at all; with BiDir, traffic flows both up and down a shared tree rooted at the RP, so (*,G) state is all that is ever needed.
Excellent.

Now let's verify the RP mapping that CSR3 has learned:

CSR3#sh ip pim rp mapping 
PIM Group-to-RP Mappings

Group(s) 224.0.0.0/4
  RP 4.4.4.4 (?), v2, bidir
    Info source: 4.4.4.4 (?), via bootstrap, priority 0, holdtime 150
         Uptime: 00:45:39, expires: 00:02:23

We see that we have learned the RP, that it is BiDir-capable, and that the information was learned through BSR.

So far so good. Now let's turn our attention to the VxLAN part of the configuration.

The VTEP functionality is implemented by a new interface type, called an NVE. This is where you define which source address to use, along with the multicast group used for flooding.

This is the configuration for CSR3:

CSR3#sh run int nve1
Building configuration...

Current configuration : 137 bytes
!
interface nve1
 no ip address
 source-interface Loopback0
 member vni 1000100 mcast-group 239.1.1.1
 no mop enabled
 no mop sysid
end

What's important here is that we source our VTEP from Loopback0 (3.3.3.3/32) and use multicast group 239.1.1.1 for VNI 1000100. The VNI number can be whatever you choose; I have simply chosen a large number that encodes which VLAN the VNI is used for (VLAN 100).

On the opposite side, we have a similar configuration for the NVE:

CSR5#sh run int nve1
Building configuration...

Current configuration : 137 bytes
!
interface nve1
 no ip address
 source-interface Loopback0
 member vni 1000100 mcast-group 239.1.1.1
 no mop enabled
 no mop sysid
end

It's very important that the multicast group matches on both sides, as this is the group the VTEPs will use to flood BUM (Broadcast, Unknown unicast and Multicast) traffic, ARP being the prime example.
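
To make the flooding behavior concrete: when R1 first ARPs for R2, CSR3 has no MAC entry for R2 yet, so it encapsulates the broadcast towards 239.1.1.1 and the fabric delivers it to every VTEP that has joined that group. CSR5 decapsulates the frame, floods it into its bridge domain, and in the process learns R1's MAC address against CSR3's VTEP address (3.3.3.3). R2's ARP reply can then be sent as plain unicast VxLAN straight back to 3.3.3.3.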

The next configuration piece is an EFP (Ethernet Flow Point) on the interface towards the site routers (R1 and R2), where we accept traffic tagged with VLAN 100:

CSR3#sh run int g1
Building configuration...

Current configuration : 195 bytes
!
interface GigabitEthernet1
 no ip address
 negotiation auto
 no mop enabled
 no mop sysid
 service instance 100 ethernet
  encapsulation dot1q 100
  rewrite ingress tag pop 1 symmetric
 !
end

This states that we match frames tagged with dot1q VLAN 100, strip the tag on ingress before any further processing, and push it back on egress (that is what the symmetric keyword does).

Now for the piece that ties it all together, namely the bridge-domain:

bridge-domain 100 
 member vni 1000100
 member GigabitEthernet1 service-instance 100

Here we have a bridge-domain configuration with two members: the local interface G1 on its service instance 100, and our VNI/VTEP. This is the glue that ties the bridge domain together end to end.

The same configuration is present on CSR5 as well.
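
As a side note, scaling this out to more VLANs is just a matter of repeating the pattern on both VTEPs. A hypothetical sketch for a VLAN 200 (the VNI and group values here are simply my own picks, following the same convention):

interface nve1
 member vni 1000200 mcast-group 239.1.1.2
!
interface GigabitEthernet1
 service instance 200 ethernet
  encapsulation dot1q 200
  rewrite ingress tag pop 1 symmetric
!
bridge-domain 200
 member vni 1000200
 member GigabitEthernet1 service-instance 200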

Let's verify the control plane on CSR3:

CSR3#sh bridge-domain 100
Bridge-domain 100 (2 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    GigabitEthernet1 service instance 100
    vni 1000100
   AED MAC address    Policy  Tag       Age  Pseudoport
   0   AABB.CC00.1000 forward dynamic   298  GigabitEthernet1.EFP100
   0   AABB.CC00.2000 forward dynamic   300  nve1.VNI1000100, VxLAN 
                                             src: 3.3.3.3 dst: 5.5.5.5

This command shows the MAC addresses learned in this particular bridge domain. On our EFP on G1 we have dynamically learned the MAC address of R1's interface, and through the nve1 interface using VNI 1000100 we have learned the MAC address of R2. Pay attention to the fact that we now know which VTEP endpoint to send the traffic to. This means that further communication between the two end hosts (R1 and R2) is done solely using unicast between 3.3.3.3 and 5.5.5.5, with VxLAN as the tunneling mechanism.

CSR3#show nve interface nve 1 detail 
Interface: nve1, State: Admin Up, Oper Up, Encapsulation: Vxlan,
BGP host reachability: Disable, VxLAN dport: 4789
VNI number: L3CP 0 L2DP 1
source-interface: Loopback0 (primary:3.3.3.3 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
      3273     268627       3278     269026

This command shows the status of our NVE interface. From this we can see that it's in an Up/Up state, that the VxLAN destination port is the standard one (4789), and that we have packets going back and forth.

Now that everything checks out in the control plane, let's see if the data plane is working by issuing an ICMP ping from R1 to R2 (they are, of course, on the same subnet, 192.168.100.0/24):

R1#ping 192.168.100.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.100.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/11/26 ms
R1#sh arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.168.100.1           -   aabb.cc00.1000  ARPA   Ethernet0/0.100
Internet  192.168.100.2           8   aabb.cc00.2000  ARPA   Ethernet0/0.100

This looks excellent! And in fact the EIGRP peering I had set up between them works as well:

R1#sh ip eigrp neighbors 
EIGRP-IPv4 Neighbors for AS(100)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
0   192.168.100.2           Et0/0.100                12 04:14:30    4   100  0  3

R1#sh ip route eigrp | beg Gateway
Gateway of last resort is not set

      100.0.0.0/32 is subnetted, 2 subnets
D        100.100.100.2 
           [90/409600] via 192.168.100.2, 04:14:46, Ethernet0/0.100

This address is the loopback of R2.

Finally, I want to show how the ICMP ping looks in the data plane by doing a capture on CSR4's G2 interface:

[Packet capture: ICMP echo from R1's loopback to R2's loopback, encapsulated in VxLAN]

Here we can see a ping I issued from R1's loopback interface towards R2's loopback interface.
I have expanded the view so you can see the encapsulation, with the VxLAN header riding on top of the UDP packet.
The outer IP packet has the VTEP endpoints (3.3.3.3 and 5.5.5.5) as its source and destination.

The VNI is the one we selected (1000100) and is what the VTEP uses to tell the traffic apart.
Finally, we have our L2 packet in its entirety.

That's all I wanted to show for now. Next time I will extend this a bit and involve BGP as the control plane.
Thanks for reading!

Snippet: The story of the EFP

For a while now, the concepts of EVCs (Ethernet Virtual Circuits) and EFPs (Ethernet Flow Points) have eluded me.

In this short post, I will provide you with a simple example of a couple of EFPs. In a later post I will discuss the MEF concept of EVCs.

As always, here is the topology I will be using:

[Topology diagram: R1 in the middle, connected to R2 through G1 and to R3 through G2]

It's a very simple setup. R1 connects to R2 through its G1 interface and to R3 through its G2 interface.

On R2 and R3, we have the very common configuration of using subinterfaces for the individual VLANs in question: VLAN 10 for the connection between R1 and R2, and VLAN 20 between R1 and R3.

Here is the configuration of R2 and R3:

R2#sh run int g1.10
Building configuration...
Current configuration : 98 bytes
!
interface GigabitEthernet1.10
 encapsulation dot1Q 10
 ip address 10.10.10.2 255.255.255.0
end

R3#sh run int g1.20
Building configuration...
Current configuration : 98 bytes
!
interface GigabitEthernet1.20
 encapsulation dot1Q 20
 ip address 10.10.10.3 255.255.255.0
end

Now, R1 is where the “different” configuration takes place:

R1#sh run int g1
Building configuration...
Current configuration : 182 bytes
!
interface GigabitEthernet1
 no ip address
 negotiation auto
 service instance 10 ethernet
  encapsulation dot1q 10
  rewrite ingress tag pop 1 symmetric
  bridge-domain 10
 !
end

R1#sh run int g2
Building configuration...
Current configuration : 182 bytes
!
interface GigabitEthernet2
 no ip address
 negotiation auto
 service instance 20 ethernet
  encapsulation dot1q 20
  rewrite ingress tag pop 1 symmetric
  bridge-domain 10
 !
end

R1#sh run int bdi10
Building configuration...
Current configuration : 96 bytes
!
interface BDI10
 description -= Our L3 interface =-
 ip address 10.10.10.1 255.255.255.0
end

So what does this all mean? Well, basically what you are looking at is the very nature of an EFP, one on each physical interface in this case. It is defined under the “service instance” command structure.

An Ethernet Flow Point (EFP) is a way to match certain Ethernet frames and perform an action on them at ingress (and, in our case, also at egress). On top of that, you can attach the EFP to a bridge domain.

The result of the above configuration is that on G1, we match frames tagged with VLAN 10. On ingress we then pop one tag before performing any other “upstream” action, and with the symmetric keyword we push the VLAN 10 tag back on when egressing.

On G2, we are doing the same, but with vlan 20 instead.

On both EFPs we attach the same bridge domain (ID 10), which can be verified like this:

R1#show bridge-domain
Bridge-domain 10 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    BDI10 (up)
    GigabitEthernet1 service instance 10
    GigabitEthernet2 service instance 20
   AED MAC address    Policy  Tag       Age  Pseudoport
   -   001E.7AE0.11BF to_bdi  static    0    BDI10

Right now we have only learned one MAC address, namely that of our L3 BDI interface. But we can see that both G1 and G2 have a service instance in this bridge domain.

Let's try some ICMP tests from R2:

R2#ping 10.10.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/120/598 ms

Let's again verify our bridge-domain on R1:

R1#show bridge-domain
Bridge-domain 10 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    BDI10 (up)
    GigabitEthernet1 service instance 10
    GigabitEthernet2 service instance 20
   AED MAC address    Policy  Tag       Age  Pseudoport
   -   001E.7AE0.11BF to_bdi  static    0    BDI10
   0   0050.56BE.18D8 forward dynamic   276  GigabitEthernet1.EFP10

What we see now is that a MAC address has been dynamically learned through the G1.EFP10 pseudoport.

Since we are technically “bridging” these two distinct VLANs, we should be able to ping R3 from R2 as well:

R2#ping 10.10.10.3
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.10.3, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 2/24/104 ms

And again on R1:

R1#show bridge-domain
Bridge-domain 10 (3 ports in all)
State: UP                    Mac learning: Enabled
Aging-Timer: 300 second(s)
    BDI10 (up)
    GigabitEthernet1 service instance 10
    GigabitEthernet2 service instance 20
   AED MAC address    Policy  Tag       Age  Pseudoport
   -   001E.7AE0.11BF to_bdi  static    0    BDI10
   0   0050.56BE.320A forward dynamic   287  GigabitEthernet2.EFP20
   0   0050.56BE.18D8 forward dynamic   287  GigabitEthernet1.EFP10

We have now learned all the MAC addresses in our small test environment.

So that's basically all there is to an EFP: a simple yet flexible way of matching and manipulating frames.
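
To give a feel for that flexibility, here is a sketch of a few of the other match types the encapsulation statement supports (availability varies by platform, so check the documentation):

service instance 30 ethernet
 encapsulation dot1q 30 second-dot1q 300
 ! matches a QinQ double-tagged frame
service instance 40 ethernet
 encapsulation untagged
 ! matches frames arriving without any tag
service instance 50 ethernet
 encapsulation default
 ! catch-all for frames not matched by another EFP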

Until next time! – Take care.