by Patrick Kerrigan. Tags: Linux Networking
I run a piece of software that is replicated across two machines and shares information via multicast. After performing some updates I noticed that information only seemed to be flowing in one direction, so I thought I'd share the details of what turned out to be wrong for the next person who runs into the same problem.
What's multicast?
Chances are that if you've found this post you probably already know what multicast is, but for those who don't, it's a mechanism for sending packets to multiple hosts on an IP network at once. Unlike broadcast, hosts have to subscribe to the multicast traffic that they wish to receive and switches and routers facilitate delivering the traffic to only those hosts which have subscribed.
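To make "subscribe" concrete, here is a minimal Python sketch of how a receiving application might join a group. The group address 239.1.2.3, port 5007 and the helper names are made up for illustration and are not taken from the software discussed in this post. On Linux, setting the IP_ADD_MEMBERSHIP socket option is what prompts the kernel to announce the subscription via IGMP.

```python
import socket

GROUP = "239.1.2.3"  # hypothetical multicast group, for illustration only
PORT = 5007          # hypothetical UDP port, for illustration only


def build_mreq(group, interface_ip="0.0.0.0"):
    """Pack the ip_mreq structure that IP_ADD_MEMBERSHIP expects.

    It is the 4-byte group address followed by the 4-byte address of the
    local interface to join on (0.0.0.0 lets the kernel choose).
    """
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def open_receiver(group=GROUP, port=PORT):
    """Return a UDP socket that is subscribed to the given multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group: the kernel sends an IGMP membership report on our behalf.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))
    return sock
```

Calling open_receiver() and then recvfrom(65535) on the returned socket would block until a datagram addressed to the group arrives.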
The problem
Let's call the two machines running the multicast-aware application "A" and "B". Both send data to the same multicast group so that the other host(s) running the application stay up to date on some state.
A is receiving data from B, but B only receives data from A for about 5 minutes after start-up, then nothing more. A check with tcpdump shows that A is definitely sending data to the multicast group, so something is broken, and it isn't the application.
Troubleshooting
The first thing to check is that there's a multicast querier on the network. A querier uses a protocol called IGMP to ask hosts interested in multicast traffic to report to it which traffic they're subscribed to. Without a querier running the hosts won't renew their multicast subscriptions, which would cause behaviour similar to what I'm seeing here. In my case, my router has this functionality enabled, and I can see the queries being sent on the network.
Next up is switches. The switch that these two hosts are plugged into supports multicast and has IGMP snooping enabled. IGMP snooping allows the switch to "listen in" on the IGMP traffic passing through it (the queries and the hosts' membership reports) to learn which ports multicast traffic needs to be sent to. Without it the multicast traffic is flooded to every port, just like broadcast traffic, wasting resources. All is good here, although the fact that snooping is in play suggests a likely cause: the switch may not be seeing host B renew its multicast subscription, and so cuts off its traffic.
With the above in mind, let's take a look at the IGMP traffic on each host, first on A and then on B:
# tcpdump igmp
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:25:25.199567 IP gateway.local > all-systems.mcast.net: igmp query v3
23:25:31.309534 IP A.local > igmp.mcast.net: igmp v3 report, 1 group record(s)
# tcpdump igmp
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
23:25:25.199520 IP gateway.local > all-systems.mcast.net: igmp query v3
Here we can see that both hosts receive an IGMP query from the router, but only A responds. That would cause the switch to conclude that B is no longer interested in traffic to the multicast group and cut it off.
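The "about 5 minutes" from earlier fits this picture. As a rough sketch (a toy model in Python, not how any particular switch is implemented), assume the switch forgets a port once it has gone one group membership interval without seeing a report from it. With the IGMPv3 defaults from RFC 3376 (robustness 2, query interval 125 s, query response interval 10 s) that interval is 260 s, a little over four minutes. The group address, port names and class below are hypothetical, and real switches may use different timers.

```python
# Toy model of an IGMP snooping table. Timer values are the RFC 3376 defaults;
# the group address and port names are hypothetical.
ROBUSTNESS = 2
QUERY_INTERVAL = 125          # seconds between general queries
QUERY_RESPONSE_INTERVAL = 10  # seconds hosts have to answer a query

# How long a membership stays valid without a fresh report: 2 * 125 + 10 = 260 s
GROUP_MEMBERSHIP_INTERVAL = ROBUSTNESS * QUERY_INTERVAL + QUERY_RESPONSE_INTERVAL

GROUP = "239.1.2.3"


class SnoopingTable:
    """Remembers when each (group, port) pair last sent a membership report."""

    def __init__(self, timeout=GROUP_MEMBERSHIP_INTERVAL):
        self.timeout = timeout
        self.last_report = {}

    def report_seen(self, group, port, now):
        self.last_report[(group, port)] = now

    def ports_for(self, group, now):
        """Ports that still count as subscribed to the group at time `now`."""
        return sorted(
            port
            for (g, port), seen in self.last_report.items()
            if g == group and now - seen <= self.timeout
        )


def simulate(check_times):
    """Step through time one second at a time and sample the table."""
    table = SnoopingTable()
    # Both hosts announce themselves once, when the application joins at t=0.
    table.report_seen(GROUP, "port-A", now=0)
    table.report_seen(GROUP, "port-B", now=0)
    results = {}
    for t in range(0, max(check_times) + 1):
        # Only host A answers each periodic query; host B stays silent.
        if t > 0 and t % QUERY_INTERVAL == 0:
            table.report_seen(GROUP, "port-A", now=t)
        if t in check_times:
            results[t] = table.ports_for(GROUP, now=t)
    return results


results = simulate([200, 600])
print(results[200])  # ['port-A', 'port-B']
print(results[600])  # ['port-A']
```

In this model B keeps receiving traffic on the strength of its initial report alone, then drops out of the table once that report is more than 260 seconds old, while A stays subscribed because it keeps answering queries.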
The solution
What would prevent host B replying to the query? A firewall. It turns out that when I tightened up the firewall rules on these machines I'd made sure to open the ports used for the multicast traffic, but for host B I'd missed IGMP! A quick nftables rule to allow IGMP traffic lets the host receive and respond to IGMP queries, solving the problem:
ip protocol igmp accept