Spanning Tree (802.1D) – Part 1


I've spent the last couple of days playing around with the traditional Spanning Tree Protocol (802.1D), which has been used for many years but is pretty slow to converge.

As most of you know, Spanning Tree Protocol (STP) is used to build a loop-free L2 topology. This is done to avoid bridging loops, where your frames get sent around and around endlessly.

STP does this by assigning each port a role and a state. It uses BPDUs (Bridge Protocol Data Units) to communicate with other switches.

The first task of STP is to elect a root bridge (switch) for the network, which is the central point of the L2 network. From a design perspective, its placement is very important, as it will handle most of the traffic load, and the links toward it will be put into a forwarding state, where data passes through. So make sure the switch is beefy and correctly placed in your network.

The root bridge is elected using two values carried in the BPDU, namely the priority and the MAC address, which together make up the Bridge ID. Priority takes precedence, and the lowest value wins. The MAC address is used as a tie-breaker, and again the lowest one wins.
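To make the election a bit more concrete, here is a minimal Python sketch of the comparison. The `Bridge` class, the switch names and the MAC addresses are made up for illustration; the only rule taken from STP itself is that the lowest priority wins and the lowest MAC address breaks a tie.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str       # hypothetical label, not part of a real BPDU
    priority: int   # configured bridge priority
    mac: str        # e.g. "00:00:0c:aa:aa:aa"


def bridge_id(bridge: Bridge) -> tuple:
    """Sort key: priority first, MAC address (as a number) second."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list) -> Bridge:
    """The bridge with the numerically lowest Bridge ID becomes the root."""
    return min(bridges, key=bridge_id)


# Illustrative values only.
switches = [
    Bridge("SWA", 32768, "00:00:0c:aa:aa:aa"),
    Bridge("SWB", 32768, "00:00:0c:bb:bb:bb"),
    Bridge("SWC", 4096, "00:00:0c:cc:cc:cc"),
]
root = elect_root(switches)
print(root.name)
```

In this made-up example the third switch wins on priority alone, even though it has the highest MAC address. With the first two switches only, the priorities tie and the lower MAC address decides.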

The Bridge ID format comes in two varieties, the old style and the new style. Originally the Bridge ID was 8 bytes in total, with the first 2 bytes being the priority and the remaining 6 bytes the MAC address. The new format splits those first 2 bytes: the first 4 bits are the priority field and the next 12 bits carry the VLAN ID, while the last 6 bytes are still the MAC address. Since only 4 bits are left for the priority, it can only be set in steps of 4096. This way a BPDU can be tied to the VLAN it belongs to.
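As a rough sketch of the new-style layout, the following Python packs and unpacks the 8 bytes. The function names are my own, and the example priority, VLAN and MAC address are arbitrary. The step size of 4096 falls out of the layout: with only the top 4 bits of the 16-bit field available, the smallest non-zero priority is 0x1000.

```python
def pack_bridge_id(priority: int, vlan: int, mac: str) -> bytes:
    """Build an 8-byte Bridge ID: 4 bits priority, 12 bits VLAN ID, 6 bytes MAC."""
    if priority % 4096 or not 0 <= priority <= 61440:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= vlan <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    # The priority value already occupies the top 4 bits (4096 == 0x1000),
    # so the VLAN ID can simply be OR-ed into the lower 12 bits.
    first_two = priority | vlan
    return first_two.to_bytes(2, "big") + mac_bytes


def unpack_bridge_id(raw: bytes) -> tuple:
    """Split an 8-byte Bridge ID back into (priority, vlan, mac)."""
    first_two = int.from_bytes(raw[:2], "big")
    priority = first_two & 0xF000
    vlan = first_two & 0x0FFF
    mac = ":".join(f"{b:02x}" for b in raw[2:])
    return priority, vlan, mac


raw = pack_bridge_id(32768, 10, "00:00:0c:aa:bb:cc")
print(raw.hex())
print(unpack_bridge_id(raw))
```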

Root Port:

When the root bridge has been agreed upon by all switches, each switch needs to find the optimal way to reach the root. This is done by comparing the BPDUs received on all interfaces. The port whose BPDU results in the lowest total cost to the root is the best path to the root, and this port is called the root port. So what is the cost, you ask? Each interface has a cost in STP based on its bandwidth. By the new definition a 100 Mbit interface has a cost of 19, a gigabit interface a cost of 4, and a 10 Gbit interface a cost of 2. If there's still a 10 Mbit interface around, that has a cost of 100.

So each switch tells its neighbors what its own cost to reach the root is, and the neighbor then adds the cost of the interface it received that BPDU on. As said, the port with the lowest resulting cost becomes the root port. The root port is always in a forwarding state, which means it can pass data through the port.
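Here is a small Python sketch of that calculation, using the cost values listed above. The port names and the received BPDU costs are invented for the example, and the sketch ignores ties, which a real switch would break using the sender's Bridge ID and port ID.

```python
# Interface speed in Mbit/s -> STP port cost (the "new" values quoted above).
PORT_COST = {10: 100, 100: 19, 1000: 4, 10_000: 2}


def root_path_cost(advertised_cost: int, port_speed_mbit: int) -> int:
    """Cost advertised by the neighbor plus the cost of the receiving port."""
    return advertised_cost + PORT_COST[port_speed_mbit]


def pick_root_port(received: dict) -> str:
    """Return the port whose BPDU yields the lowest total cost to the root."""
    return min(received, key=lambda port: root_path_cost(*received[port]))


# Hypothetical switch: port name -> (cost in the received BPDU, port speed).
received = {
    "Fa0/1": (0, 100),    # directly connected to the root over 100 Mbit
    "Fa0/2": (19, 100),   # one hop away from the root, 100 Mbit links
    "Gi0/1": (4, 1000),   # one hop away from the root, gigabit links
}
costs = {port: root_path_cost(*bpdu) for port, bpdu in received.items()}
print(costs)
print(pick_root_port(received))
```

Note that in this made-up case the two-hop gigabit path (4 + 4 = 8) beats the direct 100 Mbit link (0 + 19 = 19), which is exactly why the cost is based on bandwidth and not on hop count.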

Other Ports:

What about the other ports? The other ports can either be a designated port or a blocking port (although blocking is really more of a state than a role). A designated port is a port that forwards data onto a link segment because its switch has the best path to the root on that segment. How is this selected? First by the lowest cost to the root, then by the lowest priority of the switch sending the BPDU, and finally by the lowest MAC address as a tie-breaker. The switch that wins on that segment puts its interface into a forwarding state. The one that loses puts its interface into a blocking state, which means that the port will not pass any data. This is where the bridging loop is broken: by selecting some ports to block.
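The comparison on a segment can be sketched in Python like this. The `Offer` class, switch names and values are hypothetical, and the sketch assumes that neither port on the segment is its switch's root port (a root port would stay forwarding regardless). The final tie-breaker on port ID that real switches use is left out.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    switch: str           # hypothetical label for the sending switch
    root_path_cost: int   # the sender's own cost to reach the root
    priority: int         # the sender's bridge priority
    mac: str              # the sender's MAC address


def offer_key(offer: Offer) -> tuple:
    """Lower is better: cost first, then priority, then MAC address."""
    return (offer.root_path_cost, offer.priority,
            int(offer.mac.replace(":", ""), 16))


def elect_designated(offers: list) -> dict:
    """Map each switch on the segment to the role of its port there."""
    ranked = sorted(offers, key=offer_key)
    roles = {ranked[0].switch: "designated"}
    for loser in ranked[1:]:
        roles[loser.switch] = "blocking"
    return roles


# Equal cost and priority, so the lower MAC address decides.
segment = [
    Offer("SW1", 19, 32768, "00:00:0c:bb:bb:bb"),
    Offer("SW2", 19, 32768, "00:00:0c:0c:0c:0c"),
]
print(elect_designated(segment))
```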

On the root bridge, all ports are designated, and hence in a forwarding state. This is a direct result of being the root-bridge.

Port States:

A port can only be in a single state at any given time. These states are disabled, blocking, listening, learning and forwarding.

The disabled state simply means that the port is not active in STP, most likely because it has been shut down. The blocking state is the state a port will reach if it is neither a root port nor a designated port. A blocking port does not forward data frames, nor does it send BPDUs. All it does is listen for BPDUs, watching for any topology change.

The listening state is the state a port will enter when there has been a topology change. In this state the port sends and receives BPDUs to determine the root port and the designated port on a segment. No data frames are forwarded in this state.

The learning state is the next-to-last step before a port enters the forwarding state. In this state the port is listening to data frames, but not actually forwarding them. It does this in order to populate its CAM table (MAC address table), so that it won't flood more than necessary when it starts forwarding frames.

In the last state, forwarding, we have a fully functional port that is forwarding data frames, and all is working as it should. The switch relays BPDUs received from the root on its root port onto any designated ports, while the root port itself just listens for these BPDUs.
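The states above can be summed up in a small Python table. This is only a sketch: the `PortState` and `Behavior` names are my own, and the table describes what a port in each state is able to do, not what a particular port role actually does (a forwarding root port, for instance, only listens for BPDUs).

```python
from enum import Enum
from typing import NamedTuple


class PortState(Enum):
    DISABLED = "disabled"
    BLOCKING = "blocking"
    LISTENING = "listening"
    LEARNING = "learning"
    FORWARDING = "forwarding"


class Behavior(NamedTuple):
    receives_bpdus: bool
    sends_bpdus: bool
    learns_macs: bool
    forwards_data: bool


BEHAVIOR = {
    PortState.DISABLED: Behavior(False, False, False, False),
    PortState.BLOCKING: Behavior(True, False, False, False),
    PortState.LISTENING: Behavior(True, True, False, False),
    PortState.LEARNING: Behavior(True, True, True, False),
    PortState.FORWARDING: Behavior(True, True, True, True),
}

# The path a port walks on its way to forwarding.
NEXT_STATE = {
    PortState.BLOCKING: PortState.LISTENING,
    PortState.LISTENING: PortState.LEARNING,
    PortState.LEARNING: PortState.FORWARDING,
}


def advance(state: PortState) -> PortState:
    """Move one step toward forwarding; other states stay where they are."""
    return NEXT_STATE.get(state, state)


state = PortState.BLOCKING
path = [state]
while state is not PortState.FORWARDING:
    state = advance(state)
    path.append(state)
print([s.value for s in path])
```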

Timers:

Let's talk a bit about the timers that STP uses in order to complete its mission:

The first one is the Max-Age timer, and to be honest, this one has really given me a lot of grief. What I can boil it down to is how long a port will keep the best BPDU it has learned. Every time the best BPDU arrives on a switchport, it resets the Max-Age timer for that port. This becomes important when you start receiving inferior BPDUs on the port, as I will demonstrate later. The Max-Age timer is 20 seconds by default. This value is propagated through BPDUs from the root.

The second timer is called the forward delay. This delay is used when going from one state to the next, namely from the listening state to the learning state, and from the learning state to the forwarding state. This timer is also set on the root switch and propagated down the spanning tree. It defaults to 15 seconds.
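To see how the two timers add up, here is a rough Python timeline for a blocking port that has to become forwarding. The function and its `wait_for_max_age` flag are my own simplification: the flag models the case where the port first has to let its stored BPDU age out, as opposed to starting the listening state right away.

```python
MAX_AGE = 20        # seconds, default
FORWARD_DELAY = 15  # seconds, default


def timeline(wait_for_max_age: bool,
             max_age: int = MAX_AGE,
             forward_delay: int = FORWARD_DELAY) -> list:
    """Return (seconds elapsed, state entered) for a blocking port going forwarding."""
    t = 0
    events = [(t, "blocking")]
    if wait_for_max_age:
        t += max_age          # stored best BPDU has to expire first
    events.append((t, "listening"))
    t += forward_delay        # time spent listening
    events.append((t, "learning"))
    t += forward_delay        # time spent learning
    events.append((t, "forwarding"))
    return events


print(timeline(wait_for_max_age=True))
print(timeline(wait_for_max_age=False))
```

With the defaults this gives 50 seconds in the worst case and 30 seconds when Max-Age does not have to expire, which is the "pretty slow to converge" part of 802.1D.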

So let's take a look at a topology to discuss STP's behavior:

As you can see, we have 4 switches in this topology, with the links shown. I have marked the links L1, L2 and so forth. Also note that the cost of each link is depicted, along with the port role of each port on each device.

In this topology Switch A has been elected as the root bridge. This could be a direct result of it having the best (lowest) priority, or simply the lowest MAC address. Either way, it has become the root switch, and the one all the other switches need to reach somehow. Remember that all ports on the root bridge are designated ports.

Next, the 3 other switches have determined their root ports, marked as R in the topology drawing. Each one is the port with the lowest-cost path to reach the root. Let's examine this in further detail to clarify the root port selection, taking SWB as an example. Once SWB knows that SWA is the root switch, it has two ports to reach it through. If you look at the drawing, it can reach the root directly through L1, or indirectly via SWC, where the root's BPDU travels over L2 to SWC and then from SWC over L4 to SWB. On L1, the root bridge sends out a BPDU stating a cost of 0. This is natural, since it doesn't cost anything to reach itself. SWB receives this BPDU and adds the cost of the link to the total path cost, which is now 19. SWB performs the same action for the BPDU received from SWC. SWC also receives a BPDU with a cost of 0, adds 19 to it and relays it to SWB. SWB then adds 19 on top of that, and the result is 38. So which is better, 19 or 38? Naturally, it chooses its port on L1 as its root port. The same procedure occurs on all three non-root switches, and ends with them all figuring out their root ports.
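The SWB walkthrough can be replayed in a few lines of Python. Only the three links mentioned in the text (L1, L2 and L4, each at a cost of 19) are modeled; the rest of the topology is left out, and the helper name is my own.

```python
# Cost of each link used in the SWB walkthrough (100 Mbit links, cost 19).
LINK_COST = {"L1": 19, "L2": 19, "L4": 19}


def cost_via(path: list) -> int:
    """Root path cost of a BPDU that crosses the given links, starting at the root."""
    cost = 0  # the root advertises a cost of 0 to reach itself
    for link in path:
        cost += LINK_COST[link]  # each receiving switch adds its ingress link cost
    return cost


# Keyed by the link SWB receives the BPDU on.
candidates = {
    "L1": cost_via(["L1"]),        # SWA -> SWB directly
    "L4": cost_via(["L2", "L4"]),  # SWA -> SWC -> SWB
}
root_port_link = min(candidates, key=candidates.get)
print(candidates, root_port_link)
```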

The next big thing is the designated port. One designated port is selected on every link. That port sends out BPDUs and forwards data (it is up to the “other end” to block the data). Let's proceed using SWB and SWC as our example. Once each of them has determined its root port, they send the relayed BPDU out onto L4. So which port becomes the designated port? Well, there is a selection, like in all things Cisco 🙂

What this means is that, since SWB and SWC both advertise the same cost of 19 to the root, the switch with the lowest Bridge ID becomes the switch with the designated port. In this case it happened to be SWC (maybe it's an older switch with a lower MAC address, but that's irrelevant for further discussion). The same thing happened on L5, where SWD became the switch with the designated port.

All other ports transition to the blocking state and stop forwarding data. This is how the bridging loop is stopped.

Now we have a converged L2 network. All the links that should be up are up, and all ports that should be blocking are blocking. In my next post I will discuss certain topology change scenarios using this same L2 topology, covering a directly connected link failure and the loss of BPDUs in the network, and how each affects convergence. Stay tuned.