PIM Sparse Mode

Protocol Independent Multicast - Sparse-Mode (PIM-SM) is a protocol for efficiently routing to multicast groups that may span wide-area (and inter-domain) internets. This protocol is named protocol independent because it is not dependent on any particular unicast routing protocol for topology discovery, and sparse-mode because it is suitable for groups where a very low percentage of the nodes (and their routers) will subscribe to the multicast session. Unlike earlier dense-mode multicast routing protocols such as DVMRP and PIM-DM, which flooded packets everywhere and then pruned off branches where there were no receivers, PIM-SM explicitly constructs a tree from each sender to the receivers in the multicast group. Multicast packets from the sender then follow this tree.

Protocol overview

Multicast clients

A router receives explicit Join/Prune messages from those neighboring routers that have downstream group members.

In order to join a multicast group, G, a host conveys its membership information through the Internet Group Management Protocol (IGMP).
The router then forwards data packets addressed to a multicast group G only to those interfaces on which explicit joins have been received.
A Designated Router (DR) sends periodic Join/Prune messages toward a group-specific Rendezvous Point (RP) for each group for which it has active members.
- Note that, for each group, one router is designated (statically or automatically) as the rendezvous point (RP), and all routers must explicitly join through the RP.
  Open question: how does an RP know which multicast groups it is the RP for? Are static RPs commonly used, or is automatic designation more common?
Each router along the path toward the RP builds a wild card (any-source) state for the group and sends Join/Prune messages on toward the RP.
- The term route entry is used to refer to the state maintained in a router to represent the distribution tree.
- A route entry may include such fields as:
  - the source address
  - the group address
  - the incoming interface from which packets are accepted
  - the list of outgoing interfaces to which packets are sent
  - timers, flag bits, etc.
- The wild card route entry's incoming interface points toward the RP.
- The outgoing interfaces point to the neighboring downstream routers that have sent Join/Prune messages toward the RP.
This state creates a shared, RP-centered, distribution tree that reaches all group members.
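The route entry and the hop-by-hop construction of wildcard state can be sketched in Python. This is a minimal model, not a PIM implementation: the `RouteEntry` fields follow the list above, while `join_star_g`, the interface names, and the rule of stopping at the first router that already holds (*, G) state are illustrative assumptions. Periodic refreshes, timers, and flag bits are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

WILDCARD = None  # source value standing for "*" in a (*, G) entry


@dataclass
class RouteEntry:
    source: Optional[str]        # None means any source, i.e. (*, G)
    group: str
    iif: Optional[str]           # incoming interface; points toward the RP for (*, G)
    oifs: Set[str] = field(default_factory=set)  # interfaces with downstream Joins
    # Timers and flag bits from the text are omitted in this sketch.


# One hop of a Join's journey: (router, interface toward the RP, interface
# the Join arrived on). For the DR, the "arrived on" interface is the LAN
# where IGMP reported members. All names here are hypothetical.
Hop = Tuple[str, Optional[str], str]


def join_star_g(tables: Dict[str, Dict[tuple, RouteEntry]],
                path: List[Hop], group: str) -> int:
    """Apply a (*, G) Join hop by hop from the DR toward the RP.

    Each router builds wildcard state whose iif faces the RP and whose oif
    list records where the Join came from. Propagation stops at the first
    router that already has (*, G) state, since that router is already on
    the shared tree. Returns how many routers were visited.
    """
    key = (WILDCARD, group)
    visited = 0
    for router, toward_rp, arrived_on in path:
        visited += 1
        table = tables.setdefault(router, {})
        entry = table.get(key)
        if entry is not None:
            entry.oifs.add(arrived_on)   # graft the new branch onto the tree
            break
        table[key] = RouteEntry(WILDCARD, group, toward_rp, {arrived_on})
    return visited


tables: Dict[str, Dict[tuple, RouteEntry]] = {}
G = "239.1.1.1"
# First receiver sits behind router D; unicast routing says D -> C -> RP.
join_star_g(tables, [("D", "d-up", "d-lan"), ("C", "c-up", "c-d"),
                     ("RP", None, "rp-c")], G)
# Second receiver sits behind router E, whose path to the RP also crosses C.
visited = join_star_g(tables, [("E", "e-up", "e-lan"), ("C", "c-up", "c-e"),
                               ("RP", None, "rp-c")], G)
print(visited)                                  # 2: the Join stops at C
print(sorted(tables["C"][(WILDCARD, G)].oifs))  # ['c-d', 'c-e']
```

The second Join never reaches the RP: router C already forwards the group, so adding one outgoing interface is enough to extend the shared tree.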

Multicast sources

When a data source first sends to a group, its Designated Router (DR) unicasts Register messages to the Rendezvous Point (RP) with the source's data packets encapsulated within.
If the data rate is high, the RP can send source-specific Join/Prune messages back toward the source, and the source's data packets will then follow the resulting forwarding state and travel unencapsulated to the RP.
Whether they arrive encapsulated or natively, the RP forwards the source's decapsulated data packets down the RP-centered distribution tree toward group members.
If the data rate warrants it, routers with local receivers can join a source-specific shortest-path distribution tree and prune this source's packets off the shared RP-centered tree.
For low-data-rate sources, neither the RP nor the last-hop routers need to join a source-specific shortest-path tree, and data packets can be delivered via the shared RP tree.

Once the other routers that need to receive the group's packets have joined the source-specific tree, the RP prunes itself from that tree, unless it also needs to forward packets to another router or node. Additionally, routers use reverse-path forwarding (RPF) checks to ensure that there are no forwarding loops among the routers that wish to receive multicast packets.

Introduction

Some of the long-time and very experienced folks who founded Chesapeake Computer Consultants (which grew into Mentor) are starting a new company. The new company is named Chesapeake Netcraftsmen, and our web page is at http://www.netcraftsmen.net . (Yes, please watch out for the ".net"; also, that's "men", plural.) We're going to be focused on consulting (mostly, with the occasional special training class). We have a core group of 6+ CCIEs, a couple of other very savvy people, and tons of experience. (If you remember "Chesapeake Computer Consultants", you're probably experienced too.)

We've been talking about IP Multicast. This month I'm going to start covering PIM Sparse Mode. Previous articles that might be of interest:

- The Protocols of IP Multicast, http://www.netcraftsmen.net/welcher/papers/multicast01.html
- PIM Dense Mode, http://www.netcraftsmen.net/welcher/papers/multicast02.html

Sparse Versus Dense Mode

Recall that PIM Dense Mode is used (in principle) when the multicast is desired in most locations. Thus initial multicast packets are flooded everywhere, with pruning cutting off traffic to locations that do not need the multicast feed. Until recently, PIM Dense Mode suffered from periodic re-flooding every 3 minutes, but in Cisco IOS 12.1(5)T the PIM Dense Mode State Refresh feature alleviated this. With this feature, PIM Dense Mode is arguably suitable for simple multicast deployments, especially where the additional control of PIM Sparse Mode is not needed and where occasional "accidental" flooding would not be very harmful.

PIM Sparse Mode uses an explicit request approach, where a router has to ask for the multicast feed with a PIM Join message. PIM Sparse Mode is indicated when you need more precise control, especially when you have large volumes of IP multicast traffic compared to your bandwidth. PIM Sparse Mode scales rather well, because packets only go where they are needed, and because it creates state in routers only as needed. It has been published as an Experimental RFC; see http://www.ietf.org/rfc/rfc2362.txt . (RFC 2362 has since been obsoleted by RFC 4601, which was in turn obsoleted by RFC 7761.)

The price we pay for this extra control is mild extra complexity. PIM Sparse Mode uses a special router called a Rendezvous Point (RP) to connect the flow source or multicast tree to the router next to the wannabe receiver. The RP is typically used only temporarily, as we'll see below.

There can be different RPs for different multicast groups, which is one way to spread the load. There is usually one RP per multicast group. Redundancy of RPs is an advanced topic and requires somewhat deeper expertise. One way to provide it is with MSDP, the Multicast Source Discovery Protocol (possible later article in the series).

Recall that a PIM Join message is sent towards a Source (or for PIM-SM, possibly towards an RP), based on unicast routing. The Join message says in effect "we need a copy of the multicasts over here". It connects the sender of the Join and intervening routers to any existing multicast tree, all the way back to the target of the Join if necessary. A Prune message says in effect "we no longer need this over here". A router receiving a Prune checks whether any of its other interfaces still require the multicast flow, and if not, sends its own Prune upstream. One advanced technique is to maintain a separate, and possibly different, copy of the unicast routing information just for multicast purposes. This allows "steering" of the Join messages. Multiprotocol BGP (MBGP) for multicast is one way to do this (possible later article in the series).
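The Prune rule just described (drop the interface, and prune upstream only if no other interface still needs the flow) can be sketched as a short loop. The tree, the router and interface names, and the `receive_prune` helper are hypothetical, and the sketch ignores timers and the Prune-override behavior PIM uses on shared LANs.

```python
from typing import Dict, List, Optional, Set, Tuple


def receive_prune(oifs: Dict[str, Set[str]],
                  upstream: Dict[str, Optional[Tuple[str, str]]],
                  router: str, iface: str) -> List[str]:
    """Process a Prune arriving on `iface` of `router` and follow it upstream.

    oifs[r]: interfaces on router r that still need the flow.
    upstream[r]: (upstream router, its interface facing r), or None at the root.
    Returns the routers that sent their own Prune further upstream.
    """
    sent_prune: List[str] = []
    while True:
        oifs[router].discard(iface)
        if oifs[router]:
            return sent_prune          # another interface still needs the flow
        parent = upstream.get(router)
        if parent is None:
            return sent_prune          # reached the root (source side or RP)
        sent_prune.append(router)      # nothing left here: prune upstream too
        router, iface = parent


# Hypothetical tree: A is the root; B hangs off A; C and D hang off B.
oifs = {"A": {"a-b"}, "B": {"b-c", "b-d"}, "C": {"c-lan"}, "D": {"d-lan"}}
upstream = {"A": None, "B": ("A", "a-b"), "C": ("B", "b-c"), "D": ("B", "b-d")}

first = receive_prune(oifs, upstream, "C", "c-lan")
print(first)   # ['C']: B still serves D, so the Prune stops at B
second = receive_prune(oifs, upstream, "D", "d-lan")
print(second)  # ['D', 'B']: B's last interface is gone, so B prunes too
```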

Basic Rendezvous Point (RP)

We've seen so far that PIM-SM uses a Rendezvous Point (RP) to connect source and receivers. There can be only one RP per multicast group, and the simplest implementation uses one RP for all the multicast groups. Let's talk through the basics of how the RP is used. Let's assume the source starts sending before there are any receivers. If things happen the other way around, some of the details change slightly, but the process is not very different.

So: the multicast source starts sending. As we've already noted, there is no protocol or anything for registering sources with IP multicast. The source simply sends, and it is up to the neighboring router(s) to do the right thing. With PIM-SM, the neighboring router knows about the RP. (How it knows is a topic for a whole separate article.) The neighboring router forwards the multicast data to the RP by encapsulating it in one or more unicast Register messages. Normal routing delivers the Register to the RP. The RP decapsulates the multicast and forwards copies down any Shared Tree (one is already built if receivers joined before the Source started sending). If there are receivers (Shared Tree state with outbound interfaces), the RP sends a PIM Join back towards the Source. This connects the Source to the RP with a Source Tree, the (S, G) Shortest Path Tree (SPT). Once the RP receives multicasts along this SPT, it sends a Register-Stop to tell the router by the Source to stop sending Register packets. The reason for this sequence is that no multicast packets are lost if there are receivers already present.

By the way, if there are no receivers present, the RP sends the Register-Stop message right away. Then, when a receiver subsequently shows up (IGMP to the neighbor router, PIM Join from the neighbor router back to the RP), the RP sends the PIM Join to the Source at that time.
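A minimal Python sketch of the RP's side of this exchange, assuming a single RP and ignoring timers, the DR's side, and the DR's later null-Register messages. `RendezvousPoint`, its methods, and the action tuples are hypothetical names chosen for illustration; the point is the ordering: forward decapsulated packets down the Shared Tree, join toward the source, and send a Register-Stop either immediately (no receivers) or once packets arrive natively on the SPT.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, ...]  # hypothetical labels, not PIM message formats


class RendezvousPoint:
    """Simplified RP for the Register / Join / Register-Stop exchange."""

    def __init__(self) -> None:
        self.oifs: Dict[str, Set[str]] = {}     # group -> shared-tree interfaces
        self.sources: Dict[str, Set[str]] = {}  # group -> sources seen in Registers
        self.spt_joined: Set[Tuple[str, str]] = set()  # (S, G) Joins already sent
        self.spt_native: Set[Tuple[str, str]] = set()  # (S, G) arriving natively

    def _join_source(self, s: str, g: str) -> List[Action]:
        if (s, g) in self.spt_joined:
            return []
        self.spt_joined.add((s, g))
        return [("join", s, g)]            # PIM Join back toward the source

    def on_register(self, s: str, g: str, pkt: str) -> List[Action]:
        self.sources.setdefault(g, set()).add(s)
        if (s, g) in self.spt_native or not self.oifs.get(g):
            # Native copies already arrive on the SPT, or nobody is listening:
            # drop the encapsulated copy and tell the DR to stop registering.
            return [("register-stop", s, g)]
        # Decapsulate and send down the Shared Tree, then join toward S.
        out: List[Action] = [("forward", i, pkt) for i in sorted(self.oifs[g])]
        return out + self._join_source(s, g)

    def on_native(self, s: str, g: str, pkt: str) -> List[Action]:
        out: List[Action] = [("forward", i, pkt)
                             for i in sorted(self.oifs.get(g, set()))]
        if (s, g) not in self.spt_native:
            self.spt_native.add((s, g))
            out.append(("register-stop", s, g))   # first packet on the SPT
        return out

    def on_shared_tree_join(self, g: str, iface: str) -> List[Action]:
        self.oifs.setdefault(g, set()).add(iface)
        out: List[Action] = []
        for s in sorted(self.sources.get(g, set())):
            out += self._join_source(s, g)        # a receiver showed up late
        return out


S, G = "10.1.1.1", "239.1.1.1"
rp = RendezvousPoint()
# The source starts before any receivers exist.
print(rp.on_register(S, G, "p1"))         # [('register-stop', '10.1.1.1', '239.1.1.1')]
print(rp.on_shared_tree_join(G, "rp-c"))  # [('join', '10.1.1.1', '239.1.1.1')]
print(rp.on_native(S, G, "p2"))
# [('forward', 'rp-c', 'p2'), ('register-stop', '10.1.1.1', '239.1.1.1')]
```

The final `on_native` call is the moment the text describes: the RP now receives the flow along the SPT, so further encapsulated copies are redundant and are answered with a Register-Stop.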

The following figure assumes there is a source and active receivers (not shown). The receiver shown sends an IGMP Report to router D. Router D then sends a PIM Join towards the RP. Since there are other receivers, the RP is already joined to the Source Tree (shown in blue) and is receiving the multicast flow. It passes the Source Tree flow packets on via the Shared Tree, shown in green.

Well, now we've got the packets going from the Source to the RP along the Source Tree (Shortest Path Tree, SPT), and from the RP to the receiver along the Shared Tree. When the aggregate (*, G) packet bit rate (from all sources) exceeds a threshold in Kbps, this triggers the router nearest the receiver to try to join the Source Tree. It sends a Join towards the source of the multicast flow. Note that the prior Join it sent was towards the RP. The Join towards the source goes router by router towards the Source until it encounters a router that is already in the Source Tree. This adds the router near the receiver to the Source Tree. When a packet is actually received along that tree, a Prune is sent towards the RP. In effect, "thanks, but I'm now getting my multicast wholesale, not retail", since this process cuts out the RP in the middle.
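The hop-by-hop Join toward the source, and the point where it stops, can be illustrated with a small function. `spt_join`, the topology, and the router names are hypothetical; the path is assumed to come from unicast routing, and the later Prune toward the RP is noted only in a comment.

```python
from typing import List, Set


def spt_join(path_to_source: List[str], on_source_tree: Set[str]) -> List[str]:
    """Send an (S, G) Join hop by hop toward the source.

    path_to_source: routers from the last-hop router toward S, as chosen by
    unicast routing. on_source_tree: routers already on the (S, G) tree.
    Returns the routers that gain new (S, G) state; the Join stops at the
    first router already on the Source Tree.
    """
    grafted: List[str] = []
    for router in path_to_source:
        if router in on_source_tree:
            break                  # existing Source Tree reached: stop here
        grafted.append(router)
    on_source_tree.update(grafted)
    return grafted


# Hypothetical topology: the source's DR "A" reaches the RP through "X".
source_tree = {"A", "X", "RP"}
new_routers = spt_join(["D", "C", "X", "A"], source_tree)
print(new_routers)  # ['D', 'C']
# Only once packets actually arrive on the new branch does D send its
# Prune toward the RP (not modelled here).
```

An empty result is the case where the last-hop router's upstream neighbor is already on the SPT: no new state is needed upstream, and only the local entry changes.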

The following figure shows how this works. The top left red arrows show the Join towards the Source. This gets the top blue flow going, packets being forwarded along the Source Tree. The lower right red arrows then are the Prunes, since the Shared Tree flow is no longer needed (shown as green dashed line). Note that the Source Tree packets arrive at the Receiver along a more direct path, generally with lower latency.

By the way, we control the threshold: it is configurable. The default is zero Kbps: receive one packet, and switch over to the Source Tree. If we have many sources for a particular multicast group (think conference call, VoIP), then there is an (S, G) Source Tree entry for each one. If we set the threshold to never activate, then all packets go through the RP (sort of like a conference calling bridge), using only the (*, G) Shared Tree. The threshold is used for switchback as well as switchover: low-rate (S, G) Source Trees are switched back to the Shared Tree. The volume of traffic is checked every minute.
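The switchover/switchback decision and its effect on router state can be sketched as follows. This is a simplified model under stated assumptions: one rate sample per periodic check, a strict "exceeds" comparison, `None` standing for a threshold that never activates (on Cisco IOS, the `ip pim spt-threshold` command with the `infinity` keyword), and a state count of one (*, G) entry plus one (S, G) entry per source on its Source Tree. The function names and sample rates are made up for illustration.

```python
from typing import Dict, List, Optional


def preferred_tree(rate_kbps: float, threshold_kbps: Optional[float]) -> str:
    """Pick the tree for one (S, G) flow at a periodic rate check.

    threshold_kbps=None stands in for a threshold configured never to
    activate; 0 mirrors the default (switch over on the first packet).
    """
    if threshold_kbps is None:
        return "shared"
    return "source" if rate_kbps > threshold_kbps else "shared"


def tree_over_time(samples_kbps: List[float],
                   threshold_kbps: Optional[float]) -> List[str]:
    """Tree in use after each check, showing switchover and switchback."""
    return [preferred_tree(r, threshold_kbps) for r in samples_kbps]


def route_entries(rates_by_source: Dict[str, float],
                  threshold_kbps: Optional[float]) -> int:
    """Entries for one group on a last-hop router: one (*, G) plus one
    (S, G) per source currently switched to its Source Tree."""
    return 1 + sum(preferred_tree(r, threshold_kbps) == "source"
                   for r in rates_by_source.values())


print(tree_over_time([0, 120, 90, 4, 0], 8))
# ['shared', 'source', 'source', 'shared', 'shared']
conference = {"10.0.0.1": 64.0, "10.0.0.2": 5.0, "10.0.0.3": 0.5}
print(route_entries(conference, 0))     # 4
print(route_entries(conference, 10))    # 2
print(route_entries(conference, None))  # 1
```

With the default threshold every talker gets its own (S, G) entry; setting the threshold to never activate collapses the group to the single (*, G) entry, which is the "conference bridge" trade-off in the text.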

If a receiver wishes to join, and its neighbor router is already on the SPT (Source Tree), then the outgoing interface from the Shared Tree entry is copied into the Source Tree entry. This avoids having to send traffic to the RP and then "back" to the router on the SPT.

By the way, you may be wondering, what is the point of having the RP here? Because of the threshold mechanism, the RP gives us a way to use the Shared Tree, and control the explosive creation of state information in routers if many receivers join at the same time.

Shared Versus Source Trees

PIM Sparse Mode (PIM-SM) can use both Shared Trees (passing through the RP) and Source Trees (for efficient direct delivery along the "shortest" path from source to receiver). Typically it uses both. If efficient delivery is less important to you, and decreasing the amount of state information kept by the routers is more important, then PIM can be configured to use just a Shared Tree. When a PIM-SM router receives a multicast packet, it checks the Source Tree for that particular source address and multicast group (destination) address. If there is no entry present, it then checks for a Shared Tree (*, G) entry for the multicast group. If entries are present for both trees, the inbound interface tells the router which tree to use. If both trees have the same inbound interface, then the RP bit on the (S, G) entry prevents duplicate packets: it indicates that the RPF interface lies along the Shared Tree.
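The lookup order and RPF check can be sketched like this. The `Entry` and `forward` names and the table contents are hypothetical, the RP bit is carried only as a label, and the sketch omits the RFC's transitional SPT-bit handling for packets that still arrive on the Shared Tree right after switchover.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Entry:
    iif: str                 # RPF (incoming) interface for this entry
    oifs: Set[str]           # outgoing interfaces
    rp_bit: bool = False     # (S, G) entry whose iif lies on the Shared Tree
                             # (informational here; lookup order avoids duplicates)


Table = Dict[Tuple[str, str], Entry]


def forward(table: Table, source: str, group: str, arrived_on: str) -> List[str]:
    """Return the interfaces to copy a data packet to ([] means drop).

    The (S, G) entry is checked first, then the (*, G) entry; the packet is
    accepted only by an entry whose iif matches the arrival interface (the
    RPF check). Returning after the first match means that when both entries
    share an iif (an RP-bit (S, G) entry), the packet is forwarded once.
    """
    for key in ((source, group), ("*", group)):
        entry: Optional[Entry] = table.get(key)
        if entry is not None and entry.iif == arrived_on:
            return sorted(entry.oifs - {arrived_on})
    return []    # no state, or RPF check failed: drop to avoid loops


G = "239.1.1.1"
table: Table = {
    ("*", G): Entry(iif="to-rp", oifs={"lan1", "lan2"}),
    ("10.1.1.1", G): Entry(iif="to-src", oifs={"lan1", "lan2"}),
}
print(forward(table, "10.1.1.1", G, "to-src"))  # ['lan1', 'lan2'] via Source Tree
print(forward(table, "10.2.2.2", G, "to-rp"))   # ['lan1', 'lan2'] via Shared Tree
print(forward(table, "10.2.2.2", G, "lan1"))    # [] RPF failure

# lan1 has pruned 10.3.3.3 off the Shared Tree: an RP-bit (S, G) entry
# shares the (*, G) iif but lists fewer interfaces.
table[("10.3.3.3", G)] = Entry(iif="to-rp", oifs={"lan2"}, rp_bit=True)
print(forward(table, "10.3.3.3", G, "to-rp"))   # ['lan2'], sent once
```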

For a multicast flow with at least one active receiver, the path between the source and the RP will be part of the Source Tree. (Note that "the path between" is a bit vague here, I'm trying to stay away from giving too much detail.)

Shared Tree entries will connect the RP to some of the receivers. The RPF interface for the (*, G) Shared Tree is the interface in the direction of the RP, not the multicast group source. That is why the (S, G) Source Tree and (*, G) Shared Tree entries can have different RPF interfaces.

The Shared Tree (*, G) entries show interfaces where a join to the RP was received, or interfaces with directly connected group members (configured or IGMP received). The Source Tree (S, G) entries show where a Join or a Prune or a Register was received.

References