Multicast Troubleshooting
Tutorial
Caren Litvanyi
litvanyi@grnoc.iu.edu
Joint Techs Meeting
Salt Lake City, Utah
February 2005
Tutorial Outline
|
|
|
Review IP multicast terminology and
basic functionality. |
|
Review how the most common multicast
protocols in use today work. |
|
Discuss some design issues. |
|
Troubleshooting multicast methodology,
particularly interdomain multicast. |
|
Mention some tools and resources. |
Multicast Functionality and
Terminology
Unicast vs. Multicast
Multicast Building Blocks
|
|
|
|
The SENDERS send without worrying about
receivers. |
|
Packets are sent to a multicast
address. |
|
(224.0.0.0 - 239.255.255.255) |
|
The RECEIVERS inform their local
routers what they want to receive. |
|
The routers build a tree backwards
(reverse-path) towards the source, thus making sure the STREAMS make it to
the correct receiving networks. |
Essential Multicast
Terminology
|
|
|
|
A few things to note here: |
|
The IP source address is the IP address
of the server |
|
BUT – the destination address in the
packet is NOT an IP address of a receiver.
It is a multicast IP address. |
|
224.0.0.0 - 239.255.255.255 |
|
tree = the path taken by multicast
data. Routing loops are not allowed, so there is always a unique series of
branches between the root of the tree and the receivers. |
|
|
(S,G) notation
|
|
|
|
For every multicast stream there must
be two pieces of information: the source IP address, S, and the group
address, G. |
|
These correspond to the sender and
receiver addresses in unicast. |
|
This is generally expressed as (S,G). |
|
Also commonly used is (*,G) - every
source for a particular group. |
|
|
Multicast Addressing
|
|
|
|
|
RFC 3171: 224.0.0.0 – 239.255.255.255 |
|
Examples of Reserved & Link-local
Addresses |
|
224.0.0.0 - 224.0.0.255 reserved &
not forwarded |
|
224.0.0.1 - All local hosts |
|
224.0.0.2 - All local routers |
|
224.0.0.4 - DVMRP |
|
224.0.0.5 - OSPF |
|
224.0.0.6 - Designated Router OSPF |
|
224.0.0.9 - RIP2 |
|
224.0.0.13 - PIM |
|
224.0.0.18 - VRRP |
|
224.0.0.22 - IGMP (IGMPv3 membership reports) |
|
239.0.0.0 - 239.255.255.255
Administrative Scoping |
|
232.0.0.0/8 – available for SSM use |
|
“Ordinary” multicast users don’t have to
request a multicast address
from IANA. Use GLOP space (233.0.0.0/8) – RFC
2770. |
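A minimal sketch of the address ranges above, for quickly sanity-checking a group address reported by a user. The function names are illustrative, not from any tool mentioned in this tutorial. The GLOP mapping follows RFC 2770: a 16-bit AS number is placed in the middle two octets of 233.x.y.0/24.

```python
import ipaddress

# Ranges from the slide above, plus the GLOP block 233.0.0.0/8 (RFC 2770).
RANGES = [
    ("link-local (not forwarded)", ipaddress.ip_network("224.0.0.0/24")),
    ("SSM", ipaddress.ip_network("232.0.0.0/8")),
    ("GLOP", ipaddress.ip_network("233.0.0.0/8")),
    ("administratively scoped", ipaddress.ip_network("239.0.0.0/8")),
]


def classify_group(address: str) -> str:
    """Return a rough label for an IPv4 address (hypothetical helper)."""
    ip = ipaddress.ip_address(address)
    if not ip.is_multicast:
        return "not multicast"
    for label, network in RANGES:
        if ip in network:
            return label
    return "other multicast"


def glop_prefix(asn: int) -> str:
    """Map a 16-bit AS number to its GLOP /24 (high byte, then low byte)."""
    if not 0 < asn < 65536:
        raise ValueError("GLOP only covers 16-bit AS numbers")
    return f"233.{asn >> 8}.{asn & 0xFF}.0/24"
```

For example, `classify_group("224.0.0.13")` reports the PIM address as link-local, and `glop_prefix(5662)` gives the /24 that AS 5662 may use without asking IANA.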
Essential Multicast
Protocols
|
|
|
Group Management Protocol - enables
hosts to dynamically join/leave multicast groups. Receivers send group
membership reports to the nearest router. |
|
Multicast Routing Protocol - enables
routers to build a delivery tree backwards from the receivers to the sender of a multicast
stream. |
|
|
Multicast Protocol Summary
|
|
|
|
Essential Protocols |
|
IGMP - Internet Group Management
Protocol is used by hosts and routers to tell each other about group
membership. (Usually version 2) |
|
PIM-SM - Protocol Independent Multicast
- Sparse Mode is used to propagate forwarding state between routers. |
|
Other Protocols (for interdomain) |
|
MBGP - Multiprotocol Border Gateway
Protocol is used to exchange routing information for inter-domain
reverse-path forwarding (RPF) checking. |
|
MSDP - Multicast Source Discovery
Protocol is used to exchange active-source information. |
IGMP Protocol Flow - Join a
Group
|
|
|
Router triggers group membership
request to PIM. |
|
Hosts can send unsolicited Join membership
messages – called reports in the RFC (usually more than 1) |
|
Or hosts can join by responding to
periodic query from router |
|
|
IGMPv2
|
|
|
|
|
Router: |
|
sends Membership Query messages to All
Hosts (224.0.0.1) |
|
default query-interval = 125 seconds |
|
router with lowest IP address is
Querier (rest non-queriers) |
|
If lower-IP address query heard, back
off to non-querier state |
|
Other Querier Present Interval default:
(robust-count x query-interval) + (0.5 x query-response-interval) = 255
seconds |
|
listens for reports (whether querier or
not) and adds group to membership list for that interface |
|
default query-response-interval = 10
seconds |
|
timeout (Group member interval)
default: |
|
(robust-count x query-interval) + (1 x
query-response-interval) = 260
seconds |
|
robust-count - provides fine-tuning to
allow for expected packet loss on a subnet. Default = 2 (tunable from 2-10) |
|
Triggers group membership request to
PIM. |
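The two timeouts above follow directly from the three tunables. A small sketch of the arithmetic, using the defaults quoted on this slide (the function and key names are made up for illustration):

```python
def igmpv2_timers(robust_count=2, query_interval=125, query_response_interval=10):
    """Derive the IGMPv2 router timeouts (in seconds) from the tunables."""
    return {
        # How long a non-querier waits before taking over as querier.
        "other_querier_present": robust_count * query_interval
        + 0.5 * query_response_interval,
        # How long a group stays in the membership list with no reports.
        "group_membership": robust_count * query_interval
        + query_response_interval,
    }
```

With the defaults this gives 255 and 260 seconds. Raising robust-count to 3 on a lossy subnet stretches both by one query interval.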
IGMPv2
|
|
|
|
|
Host: |
|
responds to router query with
Membership Report messages to groups it is a member of (e.g. 224.10.8.5) |
|
waits 0-10 sec (default; specified in
Query) |
|
Hosts listen to other host reports |
|
Only 1 host responds. Others become
“idle-members.” |
|
sends unsolicited Membership Reports
(i.e., Join Messages) to group address (e.g. 224.10.8.5) |
|
sends Leave messages to All Routers
(224.0.0.2) |
|
reports group membership ONLY – no
sources. |
|
Only the existence of local group
members is known, not the actual members themselves (due to idle-member
state). |
IGMP Protocol Flow - Querier
|
|
|
|
|
Hosts respond to query to indicate (new
or continued) interest in group(s) |
|
only one host should respond per group |
|
Hosts fall into idle-member state when
same-group report heard. |
|
After 260 sec with no response, router
times out group. |
IGMP Protocol Flow - Leave a
Group
|
|
|
|
Hosts that support IGMPv2 send Leave
messages to all-routers group indicating group they’re leaving. |
|
Router follows up with 2 group-specific
query messages. |
|
IGMPv1 hosts leave by not responding to
queries (260 sec timeout). |
Switches and Snooping
|
|
|
IGMP host reports (Joins) tell the
router to start sending multicast traffic to the LAN, since one or more hosts
on the LAN are members of the group. |
|
In a conventional shared broadcast LAN
using switches that have no multicast smarts, the traffic is flooded to all
hosts. |
|
With multiple high bandwidth multicast
sources (e.g. video at 5 Mbps), this does not scale. |
|
There are a few techniques used to deal
with this... |
IGMP Snooping
|
|
|
|
Implemented by several vendors. Support
for IGMPv2 is common; support for IGMPv3 is becoming more common. |
|
What happens at the MAC layer: |
|
IGMP snoopers add a bridge table entry
for each multicast group destination address (GDA) to each switch port that
has the interested member's unicast source address (USA) already on it. |
|
Remember that there are likely to be
hubs or switches downstream of a given switch port, so more than one USA can
be on a single port. |
|
When an IGMP Leave is received, the GDA
entries are pruned. |
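A toy model of the bridge-table bookkeeping described above. It is a simplified sketch, not any vendor's implementation: the class and method names are invented, and it prunes a port as soon as the last known host leaves, whereas a real switch has to run the query/timeout first. It does show why the switch must track hosts (USAs) per port rather than just ports.

```python
from collections import defaultdict


class SnoopingTable:
    """Per-group map of switch ports to the hosts (USAs) seen behind them."""

    def __init__(self, router_port):
        self.router_port = router_port
        # group address -> port -> set of host MAC addresses
        self.members = defaultdict(lambda: defaultdict(set))

    def report(self, group, port, host_mac):
        """Record an IGMP membership report seen on a port."""
        self.members[group][port].add(host_mac)

    def leave(self, group, port, host_mac):
        """Record an IGMP Leave; prune the port only when no hosts remain."""
        hosts = self.members[group][port]
        hosts.discard(host_mac)
        if not hosts:
            del self.members[group][port]
        if not self.members[group]:
            del self.members[group]

    def egress_ports(self, group, ingress_port):
        """Ports a packet for this group is sent out of."""
        ports = set(self.members.get(group, {}))
        ports.add(self.router_port)  # the multicast router always gets a copy
        ports.discard(ingress_port)  # never send back out the arrival port
        return ports
```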
Why IGMP snooping is
harder than it looks
|
|
|
|
The IGMP membership reports have to be
captured from each host and suppressed to other hosts to prevent the others
from going into idle-member state. Every interested host has to be spoofed
into thinking it is the only member of the group, so that it actively sends
membership reports. |
|
The IGMP snooper then forwards one of
these membership reports up to the router or makes up a fake membership
report coming from one of: |
|
the host |
|
the switch’s management IP address, or |
|
0.0.0.0 |
Why IGMP snooping is
harder than it looks, continued
|
|
|
Since multiple USAs can be on a port
(via downstream switch), the switch has to actually do the IGMP membership
query/timeout before pruning a port. |
|
Since membership reports are sent to
the same GDA as the (possibly high-bandwidth) multicast traffic, there is a
potential for heavy loading of the switch CPU, unless you use more expensive
ASICs that can separate the IGMP protocol messages from general traffic and
route only the IGMP messages to the CPU. |
|
The switch has to know which is the
multicast router port. It does this by snooping for IGMP queries. |
|
|
Join without IGMP snooping
Join with IGMP snooping
Maintaining state w/IGMP
snooping
Leave with IGMP snooping
Leave with IGMP snooping,
cont’d
Sourcing Multicast:
conventional switch
Sourcing with
multicast-aware switch
Design Consequences for
Networks
|
|
|
Be careful selecting/purchasing
switches if you plan to support multicast.
Try to do a test/eval before buying.
Many vendors say they support IGMP, but how well they do so varies widely, even among products from the same vendor. |
|
Consider your physical topology
design. Is it possible to put
multicast-heavy subnets closer to the core, or on higher-class switches? Can you avoid switches and connect directly
to a router? |
|
Keep subnets small: there is less churn in joins/leaves. |
|
Check defaults. What is turned on and what is not? |
Consequences for
Troubleshooting
|
|
|
In general, multicast on the LAN is not
as well understood as multicast on the WAN. |
|
Bugs are common. |
|
The horsepower of your switch(es) might
matter. When snooping is enabled and CPU load is high, they may drop packets
that shouldn’t be dropped. |
|
Even without snooping, sometimes they
step outside their bailiwick, trying to do non-Layer-2 tasks. |
|
Management visibility into the switch
may be limited. |
|
Often testing to a host directly
connected to a router can expose these problems. |
PIM-SM
|
|
|
Protocol Independent Multicast - Sparse Mode |
|
|
|
The core multicast protocol: builds and
tears down
multicast trees. |
|
“Protocol Independent” means
independent of the protocol used to build the reachability table, not
independent of IP. (More on reachability in a moment.) |
|
“Sparse Mode” refers to the explicit
join approach taken by PIM-SM — the protocol assumes that not everyone wants
the data. |
|
PIM also has a Dense Mode, which starts
with the assumption that everyone does want the data. This is also known as a
flood-and-prune approach. Not recommended! |
Multicast “Routing”
|
|
|
|
Multicast routing can be thought of as
the reverse of unicast forwarding. |
|
Unicast forwarding is concerned with
where the packet is going. |
|
Multicast routing is concerned with
where the packet will be coming from. |
|
Multicast paths to receivers form a
“tree”. The tree is built (or torn down) from the receiver back toward the
source. This is easy to forget, but very important to remember. |
Multicast “Routing”
|
|
|
Multicast forwarding topology is stored
in outgoing interface lists (OILs). |
|
On each router, PIM-SM maintains an OIL
for each group for which it has downstream listeners. |
|
Once the multicast distribution tree is
built, multicast forwarding works similarly to unicast forwarding — but
instead of using unicast forwarding tables to send packets out single
interfaces, routers use OILs to send packets out multiple interfaces. |
|
Multicast packets received from a given
source on an incoming interface for a given group are sent out only on the
interfaces specified in the appropriate outgoing interface list (OIL). |
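The forwarding decision described above can be sketched as a table lookup followed by an incoming-interface (RPF) check. This is an illustration under simple assumptions, not router code: the table layout, the `MrouteEntry` type and the interface names are all hypothetical, and (*,G) state is keyed with the string "*".

```python
from typing import Dict, List, NamedTuple, Tuple


class MrouteEntry(NamedTuple):
    incoming_interface: str  # where packets are expected to arrive (RPF)
    oil: List[str]           # outgoing interface list


def forward(mroutes: Dict[Tuple[str, str], MrouteEntry],
            source: str, group: str, arrival_interface: str) -> List[str]:
    """Return the interfaces a packet is copied to, or [] if it is dropped."""
    # Prefer (S,G) state; fall back to (*,G) if there is none.
    entry = mroutes.get((source, group)) or mroutes.get(("*", group))
    if entry is None:
        return []  # no downstream listeners known for this group
    if arrival_interface != entry.incoming_interface:
        return []  # fails the RPF check: arrived on the wrong interface
    return [i for i in entry.oil if i != arrival_interface]
```

A packet that arrives on the wrong interface is dropped even when the OIL is populated, which is why a broken RPF path looks exactly like "no traffic" from the receiver's side.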
ASM: the original multicast
service model
|
|
|
Packet transmission is based on UDP, so
packet delivery is
“best-effort”, with no loss detection or retransmission |
|
A source can send multicast packets at
any time, with no need to register or schedule transmissions. |
|
Sources do not know the group
membership. A group may have many sources and many members. |
|
Group members may come and go at will,
with no need to coordinate with a central authority. |
|
And, critically, group members know
only the group. They don’t need to know anything about sources — not even
whether or not any sources exist. |
|
This is the ASM (Any-Source Multicast) paradigm. With PIM-SM, it requires sender
registration and tree-switching. |
Multicast Distribution Trees
|
|
|
In the original multicast service
model, a connection between a source and a receiver is first set up by
building an RPT (RP tree, or shared tree) from the receiver back to a Rendezvous Point (RP), then an SPT
(shortest-path tree, or source tree) from the RP back to the source. |
|
Then, once data starts flowing to the
receiver, an SPT is built directly from the receiver back to the source. |
|
This is called “tree-switching”. |
|
A special router adjacent to the
receiver is responsible for this – the PIM Designated Router (DR). |
|
Each multicast-enabled routed segment
on your network has a PIM DR. |
Designated Router (DR)
|
|
|
|
DR sends |
|
“Join/Prune” messages toward the RP
from receiver network |
|
“Register” messages toward the RP from
source network |
|
Selecting the DR: |
|
Neighboring PIM-SM routers multicast
periodic “Hello” messages to each other (default is every 30 seconds; the
hello-interval is tunable for faster convergence). |
|
On receipt of a Hello message, a router
stores the IP address and priority for that neighbor. |
|
The router with the highest DR priority is
selected as the DR; if the priorities match, the router with the highest IP address wins. |
|
When DR goes down, a new one is
selected by scanning all neighbors on the interface and choosing the one with
the highest IP address. |
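The election can be sketched as picking the maximum over (priority, IP address) among the routers heard on the segment, including the local router itself. The function name and input format are assumptions for illustration. Note that addresses must be compared numerically, not as strings.

```python
import ipaddress


def elect_dr(routers):
    """Pick the DR from a list of (ip_address, dr_priority) pairs.

    Highest priority wins; the highest IP address breaks ties.
    """
    winner = max(routers, key=lambda r: (r[1], ipaddress.ip_address(r[0])))
    return winner[0]
```

For example, with equal priorities, 10.0.0.10 beats 10.0.0.2, which is a common surprise when a newly added router or multicast-capable device takes over as DR for a subnet.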
|
|
ASM RP Tree Join
ASM Sender Registration
ASM SPT Cutover
RP Options
|
|
|
|
|
Remember, the RP is used to “hook up”
receivers with senders. Receivers only
know group address. |
|
Static RP |
|
Recommended |
|
Easy transition to Anycast-RP |
|
Allows for a hierarchy of RPs |
|
Auto-RP (Cisco proprietary) |
|
Fixed convergence timers (slow) |
|
Must flood RP mapping traffic |
|
Bootstrap Router (BSR) |
|
Fixed convergence timers (slow) |
|
Allows for a hierarchy of RPs |
|
|
RP Options
|
|
|
|
In most cases, static RP is the best
option: |
|
simple: just tell every router the RP
address (once!) |
|
flexible: use a /32 on a loopback
interface so it can be moved |
|
scalable: add more instances of same RP
address for redundancy, load splitting, topological localization, etc. |
|
survivable: fail-over from one RP to
another is as fast as IGP convergence |
|
blessed: RFC 3446 (just 8 pages!) |
|
Only use more complicated options if
you really need to: |
|
different RP(s) for different groups |
|
see later Anycast-RP slides for details |
Inter-domain ASM and MSDP
|
|
|
|
A PIM domain is a network in which all
routers use the same RP for any given multicast group. |
|
|
|
Inter-domain ASM requires another
protocol:
Multicast Source Discovery Protocol (MSDP). |
|
Why? Because the receiver is restricted
to sending only (*,G) joins to its RP.
And its RP doesn’t know where the source is, because the source is
registered to a different RP. MSDP is needed for the receiver's RP to find
the (S,G). |
|
Officially, MSDP is a temporary
solution. We shall see. |
MSDP Peers (inter-domain
case)
|
|
|
|
MSDP establishes a neighbor
relationship between MSDP peers |
|
Peers connect using TCP port 639 |
|
Peers send keepalives every 60 secs
(fixed) |
|
Peer connection reset after 75 seconds
if no MSDP packets or keepalives are received |
|
|
|
MSDP peers must have knowledge of
multicast topology. |
|
Required for peer-RPF checking of the
RP address in the SA to prevent SA looping. Note that this is not the same
thing as the multicast routing RPF check. |
|
|
MSDP Operation — Flooding
|
|
|
|
|
Initial SA message sent when source DR
first registers |
|
May optionally encapsulate first data
packet |
|
Originating RP sends subsequent SA
messages every 60 seconds, for as long as source remains active |
|
Flooding |
|
SA (source active) packets periodically
sent to MSDP peers indicating: |
|
source IP address of active streams |
|
group multicast IP address of active
streams |
|
IP address of RP originating the SA |
|
|
|
An RP originates SAs only for sources
within its own domain! |
MSDP Overview
MSDP so far
|
|
|
|
|
Allows RPs to share information about
which sources in their domains are actively sending. |
|
Interconnects RPs (MSDP Peers) between
domains, using TCP connections to pass source active messages (SAs). |
|
SAs are Peer-RPF checked before
accepting or forwarding. |
|
RPs may trigger (S,G) Joins on behalf
of local receivers. |
|
MSDP connections typically (but not
always) parallel MBGP connections. |
|
Next: Peer-RPF checking in detail. This
is complex. |
MSDP RPF Rules
|
|
|
The MSDP peer sending the SA is the
originating RP |
|
The MSDP peer sending the SA is the
eBGP next hop for the originating RP |
|
The MSDP peer sending the SA is the
iBGP advertiser for the originating RP |
|
The MSDP peer sending the SA is in the
same AS as the next hop for the originating RP |
|
The MSDP peer sending the SA is
statically configured to be the RPF peer |
Design Issue: Anycast-RP
|
|
|
MSDP used intra-domain to provide RP
redundancy |
|
Becoming best common practice for large
networks |
|
Specified in RFC 3446 |
|
Allows deployment of multiple RPs
within a domain (for the same group range) |
|
Adding more RPs does not require
changes to non-RP routers |
|
Sources and receivers use closest RP,
as determined by the IGP |
|
RPs share information about sources via
MSDP mesh group |
|
Note: MSDP peering uses normal address,
not
Anycast-RP address |
|
|
MSDP Application: Anycast-RP
|
|
|
|
Rules are fairly simple |
|
Have e-MSDP peers and i-MSDP peers,
similar to BGP |
|
If a mesh group member originates a SA
message |
|
Send to all i-MSDP peers and any e-MSDP
peers |
|
If a mesh group member receives a SA
message from an i-MSDP peer |
|
Send to any e-MSDP peers |
|
Do NOT send to other i-MSDP peers |
|
If a mesh group member received a SA
message from an e-MSDP peer |
|
Check RPF — if passes, then |
|
Flood to all i-MSDP peers and any other
e-MSDP peers. |
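The forwarding rules above fit in one small function. This is a sketch of the rules as stated on this slide, with invented names; the peer-RPF check itself is passed in as a boolean rather than implemented.

```python
def sa_forward_targets(received_from, i_peers, e_peers, rpf_ok=True):
    """Return the set of peers an SA is sent to by a mesh group member.

    received_from is None when this RP originates the SA; otherwise it is
    the peer the SA arrived from. i_peers are mesh group (internal) peers,
    e_peers are external MSDP peers.
    """
    i_peers, e_peers = set(i_peers), set(e_peers)
    if received_from is None:
        # Originated locally: send to everyone.
        return i_peers | e_peers
    if received_from in i_peers:
        # From a mesh group member: never re-flood inside the mesh.
        return e_peers
    if received_from in e_peers:
        # From outside: peer-RPF check first, then flood.
        if not rpf_ok:
            return set()
        return i_peers | (e_peers - {received_from})
    raise ValueError(f"{received_from} is not a configured MSDP peer")
```

Because members never re-flood SAs inside the mesh, every member must peer with every other member, or some RPs will silently miss sources.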
MBGP Overview
|
|
|
|
|
MBGP: Multiprotocol BGP
(aka multicast BGP in multicast networks) |
|
Makes it possible for multicast routing
policies to differ from unicast routing policies |
|
Can carry different route types for
different purposes |
|
Unicast |
|
Multicast |
|
Both route types carried in same BGP
session |
|
Has nothing to do with multicast state
information! |
|
Same path selection and validation
rules |
|
AS-Path, LocalPref, MED, … |
|
|
MBGP
|
|
|
|
|
|
|
Tag unicast prefixes as multicast
source prefixes for intra-domain mcast routing protocols (PIM, MSDP) to do
RPF checks. |
|
WHY?
Allows for inter-domain RPF checking where unicast and multicast paths
are non-congruent. |
|
DO I REALLY NEED IT? |
|
YES, if: |
|
ISP to ISP peering |
|
Multiple-homed networks |
|
NO, if: |
|
You are single-homed |
|
|
New multiprotocol attributes
|
|
|
|
|
MP_REACH_NLRI and MP_UNREACH_NLRI |
|
Address Family Identifier (AFI) = 1
(IPv4) |
|
Subsequent AFI (SAFI) = 1 (NLRI is used for unicast
forwarding) |
|
SAFI = 2 (NLRI is used for multicast
PIM RPF check and MSDP peer-RPF check) |
|
|
MBGP — Capability
Negotiation
|
|
|
|
|
BGP routers establish BGP sessions
through the OPEN message |
|
OPEN message contains optional parameters |
|
BGP session is terminated if OPEN
parameters are not recognised |
|
New parameter: CAPABILITIES |
|
Multiprotocol extension |
|
Multiple routes for same destination |
|
Configures router to negotiate either
or both NLRI |
|
If neighbor configures both or subset,
common NLRI is used in both directions |
|
If there is no match, notification is
sent and peering doesn’t come up |
|
If neighbor doesn’t include the
capability parameters in open, session backs off and reopens with no
capability parameters |
|
Peering comes up in unicast-only mode |
|
|
MBGP — Summary
|
|
|
|
Solves part of inter-domain problem |
|
Can exchange unicast prefixes for
multicast RPF checks |
|
Uses standard BGP configuration knobs |
|
Permits separate unicast and multicast
topologies
if desired |
|
Still must use PIM to: |
|
Build distribution trees |
|
Actually forward multicast traffic |
End of Protocol
Review.
Questions?
A Methodology for
Troubleshooting
Inter-domain
IP Multicast
Problems Addressed
|
|
|
The main types of problems addressed in
this section are topology/reachability problems – the packets aren’t flowing. |
|
The source and receiver are assumed to
be in two different AS’s.
Troubleshooting multicast within your own campus network is a subset
of interdomain troubleshooting. |
|
Because it is the most common today, we
assume ASM. Many problems would go
away with SSM. |
|
We will mention some things about
performance issues at the end, and list some tools/references. |
Why the need for a
“methodology”?
|
|
|
Most engineers don’t troubleshoot
multicast problems as often as unicast. |
|
As we have learned, multicast is
receiver-driven (somewhat backwards). |
|
The problem can be far from the
symptom. |
|
The same symptom can have many
different causes, at different places in the path. |
Overview
What is the problem?
Gather Information
|
|
|
|
End-users seem to have trouble
reporting multicast problems in our language. |
|
Performance issue vs.
topology/reachability issue? |
|
Was it working recently then stopped
working, or has one site gotten nothing at all from another site? |
|
If nothing, double-check group and port
info, TTL at sender |
|
Is the problem intermittent, cyclic, or
steady-state? |
|
User education about how to report a
problem before a problem happens is very helpful! |
Gather Information
|
|
|
Pick ONE direction (that is the
problem, or seems representative of the problem). |
|
Identify source end and receiving end. |
|
Recall multicast is unidirectional in
nature… |
Gather Information
|
|
|
A constantly active source IP address |
|
A constantly active receiver IP address |
|
The group address |
Gather Information
Verify Receiver Interest
|
|
|
Because of the way multicast
distribution trees are built, it is almost always easier to debug a problem
by starting at the receiver. If you
are the sender, you are pretty much working blind. |
|
Recall in ASM, group interest on a
subnet is indicated by a host sending out (multicast) an IGMPv2 membership
report. |
|
The DR (designated router) on a segment
is responsible for listening to that report, and forwarding a PIM ( *
, G) join towards the RP. |
|
For this step, all we need to do is
verify which router is the DR, and check that it knows it has interested
listeners for that group on the interface facing the receiver. Stop there.
Don't worry about getting to the RP at this point. |
Verify Receiver Interest
|
|
|
|
What can go wrong? |
|
No host is sending out IGMP membership
reports, or not the right version. |
|
A switch is in the path that is
dropping/limiting multicast/IGMP. |
|
The router is not running IGMP, PIM,
etc. |
|
A device has been elected DR that
shouldn't have been. |
|
bugs, incompatible timer
implementations, querier confusion, etc. |
|
ACLs, firewalls. |
Verify Receiver Interest
|
|
|
You might think you know which router
is the DR, but you should not proceed until it has been verified. It only takes a couple of seconds. |
|
To verify the DR, log into the router
you think should be routing multicast
for the receiver. |
|
1) Find/verify the interface that
serves the receiver’s subnet. |
|
2) Check that there is no other PIM
router that thinks it is the DR for the subnet. |
|
Although in our workshop lab our
first-hop routers are Ciscos, the following examples show both Junipers and
Ciscos. |
|
|
Verify Receiver Interest
Verify Receiver Interest
|
|
|
SO… now you are sure you are on your
receiver’s DR. |
|
Remember, multicast is receiver-driven. |
|
QUESTION: Does the DR know that there are interested
receivers of the group on your host’s subnet?? |
|
Look at IGMP for the group in question. |
Verify Receiver Interest
Verify Receiver Interest
|
|
|
|
What if your interface isn’t listed
with that group, even though everything else about the DR looked fine?? |
|
You have a problem! |
|
Host OS / driver problem |
|
Application problem |
|
Broken IGMP snooping switches in the
middle |
|
Try tcpdump on the host - can you see
the IGMP membership reports on the wire?
(Remember, they don't have to come from that particular host.) |
|
|
Verify Receiver Interest
|
|
|
If your receiver’s DR knows it has
listeners of your group on that interface, you are done with this step. |
Verify knowledge of active
source
|
|
|
|
This is often the most complex part –
the bulk of your work could be here.
As we have learned, a lot has to happen for the receiver’s DR to know
about a particular source. |
|
You MAY have to view this from both
ends |
|
The receiver’s RP |
|
The source’s RP |
|
For most interdomain cases, these RPs
will not be the same, and MSDP will be involved. |
Verify knowledge of active
source
|
|
|
First, let’s check to see if this is a
problem at all. |
|
If the receiver’s DR has (S,G) state
already, we know we are ok on knowledge of active source, and we can skip
this whole step! |
Verify knowledge of active
source
Verify knowledge of active
source
|
|
|
If the DR does NOT know about the
source, we may only see a ( * , G) entry on a Cisco DR, and we
have some work to do. |
|
|
Verify knowledge of active
source
|
|
|
If the DR does NOT know about the
source, we may see nothing on a Juniper DR, and we have some work to do. |
Verify knowledge of active
source
|
|
|
Recall that knowledge of active sources
is first spread through a given PIM domain by per-group RP-rooted shared
distribution trees. |
|
Current practice is to set the Shortest
Path Tree (SPT) threshold to zero, so that (S,G) state is created on the
first packet sent through the RP. |
|
But if the RPT doesn’t get built
properly, the SPT never will! |
Verify knowledge of active
source
|
|
|
So, first, we will work back from the
receiver’s DR to its RP, to be sure that the RPT branch is built correctly. |
|
Second, we will check to see if the
receiver’s RP knows about the source. |
|
Third, we will check with the source
end for their RP’s knowledge and advertisement of the source. |
|
Last, we will troubleshoot MSDP as
needed to make sure knowledge of the source can get from one RP to the other. |
|
The following page has a rough
flowchart for later reference. |
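The order of checks above can be written down as a small decision function. This is only a rough sketch of the sequence: the parameter names are invented, each one stands for an answer you get by looking at router state by hand, and the branch for "the RP has the SA but the DR has no (S,G)" is an assumption about what to look at next.

```python
def next_step(dr_has_sg, rpt_ok, receiver_rp_has_sa, source_rp_advertises):
    """Suggest where to look next when verifying knowledge of the source."""
    if dr_has_sg:
        return "skip: the receiver's DR already knows the source"
    if not rpt_ok:
        return "fix the RPT between the receiver's DR and its RP"
    if receiver_rp_has_sa:
        # Assumption: the RP knows the source, so re-check the shared tree.
        return "re-check the RPT and PIM state between the RP and the DR"
    if not source_rp_advertises:
        return "work with the source end: registration and SA origination"
    return "troubleshoot MSDP between the source's RP and the receiver's RP"
```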
Verify knowledge of active
source
Verify knowledge of active
source
|
|
|
First, we check that the RPT is built
properly from the receiver’s DR back to the receiver’s RP. |
Verify knowledge of active
source
|
|
|
|
Does the DR have the right RP (Cisco)? |
|
We can first just look at the ( * , G)
entry on the receiver's DR. |
|
If that doesn't look right, we can look
to see how it learned about the RP with
show ip pim rp mapping <group> . |
Verify knowledge of active
source
|
|
|
Does the DR have the right RP
(Juniper)? |
Verify knowledge of active
source
|
|
|
|
What if the RP is wrong? |
|
A common problem is that auto-RP and/or
PIMv2 BSR may be running without the admin's knowledge (on Ciscos they are on
by default when PIM-SM is enabled, and Junipers listen to them). Information can leak from a neighboring
AS! These take precedence over
anything you statically configure. |
|
Hint: use ip pim rp-address <address> override |
|
Auto-RP and BSR are complex, and could
have any one of a number of problems.
We recommend static configuration in most campus networks, Anycast-RP
in backbone/transit networks. |
|
Might just be a typo in entering the
static RP address. |
Verify knowledge of active
source
|
|
|
Now that you are sure of what the RP is
(and it is correct), starting at the receiver’s DR, work your way back to the
receiver’s RP: |
|
Check that the RPF is pointing the way
you expect. |
|
Check that PIM is configured and
working properly on the interface. A
common problem is PIM is not turned on for a particular interface. |
|
You may also want to double-check that
each router has ( * , G) state for the group you are debugging. |
Verify knowledge of active
source
|
|
|
|
show ip rpf <RP ip address> |
|
show ip pim neighbor <rpf
interface> |
Verify knowledge of active
source
|
|
|
|
show multicast rpf <RP ip
address> |
|
show pim neighbors |
Verify knowledge of active
source
|
|
|
Repeat that process until you have
verified the RPF paths and the PIM adjacencies back to the receiver's
RP. This verifies that the RPT has
been built correctly. |
Verify knowledge of active
source
|
|
|
Next Big Question: Does the receiver's RP have knowledge of
the active source? |
|
Since we already checked that the RPT
is correct, it probably doesn’t, or the DR would have likely had (S,G)
information. |
|
If it doesn’t, but has ( * ,
G) only, and no MSDP SA (source-active) cache entry for that source, we will
have to find out some information about the source end of things, then
troubleshoot MSDP. |
|
Note it does not matter which peer you
get an SA from as long as it is accepted and in the cache. However, if you are going to open a ticket
with an upstream, you might as well figure out who you expect to accept it
from. |
|
|
Verify knowledge of active
source
|
|
|
The objective here will be to get an
MSDP source-active about the source to our receiver’s RP. |
|
The SA originates from the source’s RP,
and is re-advertised/ flooded by MSDP peers along the way. |
|
Some sites have estimated that about
half of their multicast problems are problems associated with missing MSDP SA
information. |
Verify knowledge of active
source
Verify knowledge of active
source
|
|
|
Recall it is MSDP's job to flood
source-active advertisements between peers so that an RP in one PIM domain
can know about active sources in another. |
|
MSDP SA advertisements are
accepted/forwarded or rejected based on MSDP "peer-RPF" rules
covered earlier in this workshop. |
|
Remember, the information being tested
against the peer-RPF rules is the originating RP's IP address. Not the IP of the source itself, but its
RP. |
|
We need to trace the source-RP via the
peer-RPF rules from our receiver's RP out into our neighbor's AS. |
Verify knowledge of active
source
|
|
|
|
But… how do we know the source’s RP if
we run only the receiver network? |
|
You may have to pick up the phone and walk
them through verifying the source’s DR and finding the group-to-RP mapping
there. |
|
Get them to confirm that the source is sending, and to tell you
the group, port number, source TTL setting, and the IP
address of their RP. |
|
While you are at it, you might want to have them check
that the mroute is marked as a candidate for MSDP advertisement.
(Example - next slide.) |
Verify knowledge of active
source
Verify knowledge of active
source
|
|
|
|
Now we have the source/originating RP's
IP address. |
|
The idea here is we are trying to
figure out which of our MSDP peers we should expect to get knowledge of the
actual source from. |
|
If the source RP is an MSDP peer of our
RP, the source RP is the RPF peer. |
|
If we look at show ip mbgp <source RP IP> , the
MSDP peer in the adjacent AS is the RPF peer. |
|
In practice, in most campus networks, show ip rpf <source RP IP> and
show ip mbgp <source RP IP> will
usually get you going in the right direction. |
Verify knowledge of active
source
Verify knowledge of active
source
|
|
|
Assuming we do not have an entry for
the source and group in our receiver RP's SA-cache, we might be able to see
if we are getting a reasonable SA advertisement but rejecting it: |
Verify knowledge of active
source
|
|
|
|
If we are getting an SA from what we
think should be the RPF peer, yet rejecting it, we need to work through the
MSDP peer-RPF rules to figure out why.
Possible reasons: |
|
We've configured to use only the
multicast RIB, yet we have no MBGP route to the originating RP. Check that the source network is
advertising the route to the RP in MBGP and we are accepting it (policy
misconfigurations). |
|
We run MBGP, but not MSDP,
with a peer that appears to have a better route to the originating RP than
the peer we think is the RPF peer. |
|
An incorrectly configured default peer. |
|
bugs, voodoo, who knows! |
Verify knowledge of active
source
|
|
|
|
Assuming you are not getting an SA from
the peer you think should be the RPF peer, you may need to open a ticket with
your upstream provider or peer. You
can give them the following: |
|
We are not getting an SA for <source IP address> |
|
The group address is <group
address> |
|
The source’s RP is <source RP IP address> |
|
We expected to get this from
<MSDP peer’s IP address> |
|
Also report if you’re not getting the
MBGP route. |
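If you open these tickets often, the items above are easy to template so that nothing gets left out. A minimal sketch; the function name and wording are illustrative only.

```python
def missing_sa_ticket(source_ip, group, source_rp, expected_peer,
                      mbgp_route_seen=True):
    """Build the text of a 'missing MSDP SA' ticket for an upstream or peer."""
    lines = [
        f"We are not getting an SA for {source_ip}",
        f"The group address is {group}",
        f"The source's RP is {source_rp}",
        f"We expected to get this from {expected_peer}",
    ]
    if not mbgp_route_seen:
        lines.append(f"We are also not getting the MBGP route to {source_rp}")
    return "\n".join(lines)
```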
Verify knowledge of active
source
|
|
|
Rather than just turning the problem
over to your upstream provider, you can sometimes look further yourself: for many Internet2 campuses, Abilene core
routers will be in the path. |
|
It is sometimes helpful to go to the
router proxy closest to the source and check for the SA-cache entry for the
source/group in question there. |
|
If there is no entry there, it is not
too surprising your campus is not getting a valid SA. (We have a screenshot
at the end of these slides.) |
Verify knowledge of active
source
|
|
|
Since you have already checked your
path back from the receiver to your RP, you should get (S,G) state on
the receiver’s DR once you fix whatever was causing a received SA to be rejected, or once your upstream
provider or peer resolves the ticket concerning a missing SA. |
Trace forwarding state back
|
|
|
We now have (S,G) state on the
receiver’s DR. |
|
Next, we need to check to see if
traffic is actually flowing…
(Cisco example) |
Trace forwarding state back
|
|
|
Here’s how to check if traffic is
flowing on a Juniper: |
Trace forwarding state back
|
|
|
Start on your receiver’s DR. |
|
This time, RPF back towards the actual
source IP address (as opposed to the source RP). |
Trace forwarding state back
Trace forwarding state back
|
|
|
Work your way back towards the source
IP, looking for PIM problems along the way. |
Trace forwarding state back
Tools
|
|
|
|
Beacon http://dast.nlanr.net/projects/Beacon/ |
|
The beacon is an application to
monitor multicast reachability and
performance among beacon-group participants.
Participants both send and receive on a known group. |
|
The results are displayed as a matrix, with
receiving hosts on the vertical axis and sources on the horizontal
axis. |
|
A host’s source number matches its
receiver number. |
Tools
|
|
|
http://dast.nlanr.net/projects/Beacon/ |
Tools
Tools
|
|
|
|
rtpqual
ftp://ftp.ee.lbl.gov/rtpqual.c |
|
Simple Multiprotocol Multicast Signal
Quality Meter |
|
very useful for establishing a receiver
(even if the multicast is not using RTP) |
|
also useful for finding packet loss
problems and whether they are periodic or not |
|
If you know the group but not the port,
you can use rtpqual to join with any port, then use tcpdump to find out which
port the traffic is actually going to. |
|
Mtrace
ftp://ftp.parc.xerox.com/pub/net-research/ipmulti/mtrace5.2.tar.gz |
|
Simple host-based RPF check tool |
|
Iperf
http://dast.nlanr.net/projects/iperf |
|
Source/client traffic generator that
can generate multicast packets (requires access to device at both ends of
path) |
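When none of these tools is at hand, a few lines of Python are enough to establish a receiver, since joining the group is what makes the host send IGMP membership reports. This is a minimal sketch using the standard socket API; the helper names are made up, and the group and port in the usage note are placeholders.

```python
import socket
import struct


def build_mreq(group, interface_ip="0.0.0.0"):
    """Pack the group and local interface address for IP_ADD_MEMBERSHIP."""
    return struct.pack("4s4s", socket.inet_aton(group),
                       socket.inet_aton(interface_ip))


def open_receiver(group, port, interface_ip="0.0.0.0"):
    """Bind a UDP socket and join the group (triggers an IGMP report)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group, interface_ip))
    return sock
```

Calling `open_receiver("224.10.8.5", 5004)` and then `recvfrom(2048)` on the returned socket blocks until a packet arrives. While it waits, the join stays active, so tcpdump on the host and the IGMP group list on the DR can both be checked.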
Questions?
Thank you!
Caren Litvanyi litvanyi@grnoc.iu.edu