Beware of mDNS Floods from Buggy Android Clients

Recently, I discovered a large increase in multicast traffic on an enterprise Cisco WLAN. The increase was large enough to cause packet loss in several places where bandwidth is limited, usually at the WAN edge. Throughput stayed within the circuit's capacity, but an extremely high packet rate was overwhelming the edge device's ability to process packets. So even though bandwidth utilization looked normal, packet loss was still occurring.
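A quick bit of arithmetic shows how a link can drop packets while its bandwidth graph looks healthy. This is a minimal sketch with hypothetical numbers, not figures from our capture: the same 12 Mbps can be 1,000 large packets per second or 15,000 small ones, and an edge device's forwarding limit is often reached on the packet count, not the bit count.

```python
def throughput_mbps(pps: float, avg_packet_bytes: float) -> float:
    """Throughput in Mbps produced by a packet rate and an average packet size."""
    return pps * avg_packet_bytes * 8 / 1_000_000


def packet_rate(mbps: float, avg_packet_bytes: float) -> float:
    """Packets per second needed to fill `mbps` with packets of the given size."""
    return mbps * 1_000_000 / (avg_packet_bytes * 8)


if __name__ == "__main__":
    # Hypothetical traffic mixes that use identical bandwidth.
    print(throughput_mbps(1_000, 1_500))  # 12.0  (large packets)
    print(throughput_mbps(15_000, 100))   # 12.0  (small packets, 15x the pps)
    print(packet_rate(12, 100))           # 15000.0
```

Small packets such as mDNS queries push the packet rate up long before they move the bandwidth needle.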

The primary symptom of this problem was poor voice quality on calls across the WAN at seemingly random times, including periods when bandwidth utilization was very low.
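Because bandwidth graphs average away short bursts, per-second packet counts are a more useful signal for this kind of problem. The sketch below assumes you have already exported packet timestamps in epoch seconds, for example with `tshark -r capture.pcap -T fields -e frame.time_epoch`; the function names are my own, and Wireshark's I/O Graphs view gives the same picture interactively.

```python
from collections import Counter
from typing import Iterable, Tuple


def packets_per_second(timestamps: Iterable[float]) -> Counter:
    """Count packets in each whole-second bucket (timestamps in epoch seconds)."""
    return Counter(int(ts) for ts in timestamps)


def peak_pps(timestamps: Iterable[float]) -> Tuple[int, int]:
    """Return (second, packet_count) for the busiest one-second bucket."""
    buckets = packets_per_second(timestamps)
    if not buckets:
        return (0, 0)
    return max(buckets.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical timestamps: a burst of three packets during second 100.
    sample = [100.10, 100.20, 100.90, 101.50]
    print(peak_pps(sample))  # (100, 3)
```

One-second buckets are coarse; shrinking them to 100 ms makes short bursts stand out even more.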

A close inspection of one particular site uncovered abnormally high packet rates, measured in packets per second (pps). We captured the traffic from a SPAN port, hoping to isolate the source of the packet flood. Here's what we found:

[Figure: Wireshark's Protocol Hierarchy window]

37% of the packets on this busy WAN circuit were mDNS queries! More specifically, the sources were smartphones on the guest WLAN sending mDNS queries for Chromecasts, Apple TVs, and other services advertised with mDNS. In a centrally switched WLAN, all of that traffic is backhauled across the WAN to the WLC that services it. These queries left each remote site in a CAPWAP tunnel, traversed the WAN to the WLC, and were then forwarded to all of its APs at every other site, crossing the WAN once again.
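To make these queries concrete, here is a minimal sketch that builds the payload of an mDNS PTR query in standard DNS wire format, the kind of packet a phone sends via UDP to 224.0.0.251 port 5353 when browsing for services. The service names are assumptions used for illustration: `_googlecast._tcp.local` is the type Chromecasts commonly advertise, and `_airplay._tcp.local` is the AirPlay type used by Apple TVs. The code only builds bytes; it does not send anything.

```python
import struct

QTYPE_PTR = 12  # DNS record type for service-browsing queries
QCLASS_IN = 1   # Internet class


def encode_name(name: str) -> bytes:
    """Encode a dotted name as DNS length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_ptr_query(service: str) -> bytes:
    """Build a one-question mDNS query (ID 0, flags 0) for the given service type."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT.
    header = struct.pack("!HHHHHH", 0, 0, 1, 0, 0, 0)
    question = encode_name(service) + struct.pack("!HH", QTYPE_PTR, QCLASS_IN)
    return header + question


if __name__ == "__main__":
    query = build_ptr_query("_googlecast._tcp.local")
    print(len(query))   # 40
    print(query.hex())
```

Each query is only a few dozen bytes of payload, which is why a flood of them shows up in packet rate long before it shows up in bandwidth.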

Once we had this data, the problem was easy to fix. There was no business requirement for multicast on the guest WLAN, so we blocked it at the WLC with a layer 3 ACL. The ACL blocks all multicast, but we added a separate line for mDNS to gauge the volume of mDNS packets relative to other multicast traffic on the WLAN.

[Figure: ACL to block all multicast on the WLAN]

You can see from the hit counters that this guest WLAN carries a lot of mDNS traffic, which dwarfs all other multicast traffic. With this ACL in place, the problem was resolved. If you want to block mDNS more precisely, match its multicast address, 224.0.0.251, and UDP port 5353.
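The ACL logic above can be modeled in a few lines to show why rule order matters: the mDNS line must come before the catch-all multicast line, or it would never register a hit. This is a Python sketch of first-match evaluation and hit counting, not WLC configuration syntax; the rule names and the sample packets (including the 239.255.255.250 destination) are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Optional, Tuple

MDNS_GROUP = ipaddress.ip_address("224.0.0.251")
MDNS_PORT = 5353


def match_rule(dst_ip: str, proto: str, dst_port: Optional[int]) -> str:
    """Return the name of the first matching rule, evaluated in ACL order."""
    addr = ipaddress.ip_address(dst_ip)
    # Rule 1: deny mDNS explicitly so its hits are counted separately.
    if addr == MDNS_GROUP and proto == "udp" and dst_port == MDNS_PORT:
        return "deny-mdns"
    # Rule 2: deny any other multicast destination (224.0.0.0/4 for IPv4).
    if addr.is_multicast:
        return "deny-multicast"
    # Rule 3: permit everything else.
    return "permit"


def hit_counters(packets: Iterable[Tuple[str, str, Optional[int]]]) -> Counter:
    """Tally how many packets each rule matched, like an ACL's hit counters."""
    return Counter(match_rule(dst, proto, port) for dst, proto, port in packets)


if __name__ == "__main__":
    sample = [
        ("224.0.0.251", "udp", 5353),
        ("224.0.0.251", "udp", 5353),
        ("239.255.255.250", "udp", 1900),
        ("192.0.2.10", "tcp", 443),
    ]
    print(hit_counters(sample))
```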

Looking back at our pre-ACL pcap, I found several clients on the guest WLAN flooding the network with mDNS queries, which the WLC then forwarded back across the WAN to its APs. The five worst offenders in the pcap were all Android phones, with OUIs from Samsung, HTC, and LG, and each was responsible for dumping thousands of mDNS packets onto the network within a few seconds.

[Figure: An unfiltered pcap from the WAN edge should not look like this!]
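Ranking the worst offenders is a simple count per client MAC address, followed by an OUI lookup for the vendor. The sketch below assumes you have already extracted the client source MAC of each mDNS packet; in a CAPWAP capture that is the address in the inner, tunneled frame, not the outer header. The MAC addresses in the example are hypothetical, and the vendor lookup itself (for example, against the IEEE OUI registry) is left out.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(mac: str) -> str:
    """Lower-case a MAC address and use colons as separators."""
    return mac.replace("-", ":").lower()


def oui(mac: str) -> str:
    """Return the OUI (first three octets) of a MAC address, e.g. '02:00:00'."""
    return ":".join(normalize(mac).split(":")[:3]).upper()


def top_talkers(src_macs: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n source MACs that sent the most packets, busiest first."""
    return Counter(normalize(m) for m in src_macs).most_common(n)


if __name__ == "__main__":
    macs = ["02:00:00:aa:bb:01"] * 3 + ["02:00:00:aa:bb:02", "02-00-00-AA-BB-01"]
    for mac, count in top_talkers(macs):
        print(mac, oui(mac), count)
```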

In a network with several thousand guest devices, most of them smartphones, enough of these buggy Android phones dumping mDNS queries can cause serious problems if the traffic is not controlled. The packet rate at the WAN edge can cause output drops that affect other traffic (voice, in our case). The extra multicast can also hurt on the RF side if the WLAN handles multicast normally, which means transmitting it at the lowest mandatory data rate. That substantially increases airtime utilization, leaving less airtime for other clients to transmit and receive data.
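A rough calculation shows why the lowest mandatory data rate matters so much. This sketch only counts the time to clock out a frame's bits and ignores the PHY preamble, MAC headers, interframe spacing, and backoff, so real airtime is higher. The 500 frames per second, 100-byte frame size, and 1 vs. 24 Mbps rates are hypothetical inputs, not measurements from our network.

```python
def frame_airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Microseconds needed to transmit the frame's bits (no PHY/MAC overhead)."""
    return frame_bytes * 8 / rate_mbps


def airtime_fraction(pps: float, frame_bytes: int, rate_mbps: float) -> float:
    """Fraction of each second the channel spends carrying this traffic."""
    return pps * frame_airtime_us(frame_bytes, rate_mbps) / 1_000_000


if __name__ == "__main__":
    # Same hypothetical multicast load at a low and a higher basic rate.
    print(f"{airtime_fraction(500, 100, 1):.1%}")   # 40.0%
    print(f"{airtime_fraction(500, 100, 24):.1%}")  # 1.7%
```

Because the sketch omits fixed per-frame overhead, the real gap between the two rates is smaller than shown, but low-rate multicast remains far more expensive in airtime.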

Until recently, I did not understand what triggered this problem. Now I do. There appears to be a bug in recent versions of Android that causes these mDNS floods when a phone leaves sleep mode, with thousands of frames transmitted in the seconds after the client wakes up. TP-Link observed the same behavior and traced it to “recent releases of Android OS and Google Apps.”

For WLAN administrators, I suggest taking a few minutes to think through the implications of this issue for your network. As I discovered in my troubleshooting, it is very likely to affect more than just the guest WLAN.

PS: For those of you reading this that aren’t network engineers and think your wireless router is junk, it isn’t. And please stop jumping to the conclusion that the infrastructure is to blame for Wi-Fi problems! 80% of Wi-Fi problems are client device issues, not infrastructure issues.

Update: Google has acknowledged this issue and is releasing a patch via Google Play on January 18th.
