Roaming Analysis using only a Mac and Wireshark

There are many ways to examine the roaming performance of a Wi-Fi client. Perhaps the gold standard is to follow the client with a laptop running Omnipeek and several Wi-Fi adapters all capturing frames on different channels. I’m also impressed with 7signal’s recent update to Mobile Eye which now logs roaming data as well. But what if you don’t have that, or want to do something quickly with a Mac without switching to Windows and hooking up your Wi-Fi adapter array?

Using a Mac laptop to capture frames on a single channel with Airtool, you can still get valuable information about the roaming performance of a Wi-Fi client with a few Wireshark display filters and some I/O Graphs magic.

The process is simple. Discover the channel the test client is using, and start an over-the-air capture on that channel. Take your Mac and the test client and move out of the current AP’s cell so the client roams away, then come back so that the client roams back. Repeat as necessary until you have captured both a roam-away and a roam-back.

Let’s roam

Now it’s time to look at the captured frames. First, let’s build a display filter to only show the frames to/from the test client, as well as all of the AP’s beacon frames. We’re including the AP’s beacon frames so that we can see the changes in RSSI as the client moved away from the AP and then back towards it.

wlan.addr == aa:bb:cc:dd:ee:ff || ( wlan.ta contains 11:22:33:44:55 && wlan.fc.type_subtype == 8)

aa:bb:cc:dd:ee:ff is the MAC of the test client. 11:22:33:44:55 is the first five octets of the AP’s BSSID. By matching on the first five octets of the AP’s BSSID rather than the exact BSSID, we preserve the beacon frames from all of the AP’s BSSID’s, which gives us more data to calculate the RSSI of the AP.
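If you run this analysis often, it can help to generate the filter rather than edit it by hand. The following is a minimal sketch, not part of Wireshark: the function name roaming_display_filter is made up for illustration, and it simply reproduces the filter above after checking that the two inputs have the expected number of octets.

```python
def roaming_display_filter(client_mac: str, bssid_prefix: str) -> str:
    """Build the display filter: all client frames plus the AP's beacons.

    client_mac   -- full six-octet MAC of the test client
    bssid_prefix -- first five octets shared by the AP's BSSIDs
    """
    client_mac = client_mac.strip().lower()
    bssid_prefix = bssid_prefix.strip().lower()
    if len(client_mac.split(":")) != 6:
        raise ValueError("client MAC must have six octets")
    if len(bssid_prefix.split(":")) != 5:
        raise ValueError("BSSID prefix must have five octets")
    # type_subtype == 8 selects beacon frames, as in the filter above.
    return (
        f"wlan.addr == {client_mac} || "
        f"( wlan.ta contains {bssid_prefix} && wlan.fc.type_subtype == 8)"
    )


print(roaming_display_filter("AA:BB:CC:DD:EE:FF", "11:22:33:44:55"))
```

Paste the printed string into Wireshark's display filter bar.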

Once applied, export the displayed packets only to a new file that we’ll generate the graphs from. Open the new file and now we can configure the I/O Graphs. These are some of the display filters I use:

The roam-away is on the left, and the roam-back is on the right.

AVG Tx Data Rate needs to be set with the test client MAC address, and AP RSSI needs to have the first five octets of the AP’s BSSID.

By zooming into the beginning of the graph, we can observe the client’s data frames, retries, Tx data rate, and the RSSI at which it roamed away. A benefit of dBm being measured in values less than zero is that it is separated from the rest of the data on the graph, so we have layer 1 data below 0, and layer 2 data above.


This Cisco 7925G phone roams away before the AP’s RSSI drops to -70 dBm, and before retries start to increase. We see similar good behavior when it roams back below.


Let’s take a look at a Wi-Fi client that roams poorly. Here’s a client-that-shall-remain-nameless roaming away from an AP. You can see retries spiking and its data rate plummeting well before it roams away. The AP’s RSSI drops into the -80’s for most of a minute before it decides to roam!

This graph includes the test client’s average Tx data rate.

Of course, this approach has some limitations. You must know that a client like the one above was in range of a louder AP operating on a channel it supports when it started having trouble before you decide it’s a sticky client, otherwise it’s doing exactly what it should be doing–trying to maintain the only association it can.

You know when the client decided to roam, but you don’t know how long it took.

As you move away from the AP, you might see the AP’s RSSI spike to 0. That happens when your laptop’s adapter is unable to demodulate beacon frames from the AP due to poor SNR.
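To make the "spike to 0" easier to reason about, here is a small sketch of per-interval averaging that reports a gap instead of a zero when no beacons were decoded. It assumes you have exported (timestamp, RSSI) pairs for the AP's beacons; the function name and the one-second interval are illustrative choices, and the arithmetic mean of dBm values is used only because that is what a simple average of the field gives you.

```python
from collections import defaultdict


def average_rssi_per_interval(samples, interval=1.0):
    """Average beacon RSSI per time interval.

    samples  -- iterable of (timestamp_seconds, rssi_dbm) pairs
    interval -- bucket width in seconds
    Returns one entry per interval from the first to the last sample.
    Intervals with no decoded beacons yield None rather than 0.
    """
    buckets = defaultdict(list)
    for ts, rssi in samples:
        buckets[int(ts // interval)].append(rssi)
    if not buckets:
        return []
    result = []
    for idx in range(min(buckets), max(buckets) + 1):
        values = buckets.get(idx)  # .get() avoids creating empty buckets
        result.append(sum(values) / len(values) if values else None)
    return result


beacons = [(0.1, -60), (0.6, -62), (1.2, -70), (3.4, -80)]
print(average_rssi_per_interval(beacons))
```

The None in the third interval marks a second in which no beacon was decoded, which is the situation that shows up as 0 on the graph.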

Also, the AP RSSI is measured by a Mac laptop that is following the test client. Unless the test client is the same model of Mac laptop, it will probably hear the AP differently, most likely with less sensitivity. My MacBook Pro is a 3×3:3 client, and the two test clients I looked at for this blog are both 1×1, so it’s reasonable to assume the Mac benefits from a significant increase in RSSI from MRC. Taking that into consideration, the poor roaming from the client-that-shall-remain-nameless is probably even worse than it looks.


Beware of mDNS Floods from Buggy Android Clients

Recently, I discovered a large increase in multicast traffic on an enterprise Cisco WLAN. This increase was large enough to cause packet loss in several areas where bandwidth is limited, usually at the WAN edge. While throughput remained within the acceptable range for a circuit, an extremely high packet rate was overwhelming the edge device’s capacity to process packets. So despite bandwidth utilization being normal, packet loss was still occurring.

The primary symptom of this problem was poor voice call quality on calls across the WAN, at seemingly random times, including periods where bandwidth utilization was very low.
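The distinction between bandwidth and packet rate is simple arithmetic, sketched below. The numbers are illustrative assumptions, not measurements from this network: the same bit rate produces ten times as many packets per second when the average packet is ten times smaller, which is why a flood of small packets can overwhelm a device while the circuit looks lightly used.

```python
def packets_per_second(bits_per_second: float, avg_packet_bytes: float) -> float:
    """Packet rate implied by a bit rate and an average packet size."""
    return bits_per_second / (8 * avg_packet_bytes)


# Hypothetical example: the same 10 Mbps of traffic at two packet sizes.
print(packets_per_second(10_000_000, 1250))  # large packets
print(packets_per_second(10_000_000, 125))   # small packets
```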

A close inspection of one particular site uncovered abnormally high packet rates, measured in pps (packets per second). We used a span port to run a packet capture of the traffic, with the hope of isolating a source of the packet flood. Here’s what we found:

Wireshark’s Protocol Hierarchy Window

37% of the packets on this busy WAN circuit were mDNS queries! More specifically, the sources were smartphones on the guest WLAN that were sending mDNS queries for Chromecasts, Apple TV’s, and other services advertised with mDNS. In a centrally-switched WLAN, all of that traffic is backhauled across the WAN to the WLC which services it. These queries were leaving remote sites in a CAPWAP tunnel, traversing the WAN to the WLC, which then forwarded them on to all of its AP’s at all other sites, once again traversing the WAN.

Once we had this data, it became an easy problem to fix. There was no business requirement for multicast on the guest WLAN, so we blocked it at the WLC using a layer 3 ACL. This ACL blocks all multicast, but we added a separate line for mDNS to get an idea of the volume of mDNS packets compared to other multicast traffic on the WLAN.

ACL to block all multicast on the WLAN

You can see from the hit counters that this guest WLAN is seeing a lot of mDNS traffic, which dwarfs any other multicast traffic. With this ACL in place, the problem was resolved. mDNS uses multicast address 224.0.0.251 (ff02::fb for IPv6) and UDP port 5353, if you want to block it more precisely.
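The two-line logic of the ACL (count mDNS separately, then catch all other multicast) can be sketched as a small classifier. This is an illustration of the matching order only, not WLC configuration; the function name is hypothetical, and the group addresses are the well-known mDNS addresses 224.0.0.251 and ff02::fb.

```python
import ipaddress

# Well-known mDNS destinations (IPv4 and IPv6) and UDP port.
MDNS_GROUPS = {
    ipaddress.ip_address("224.0.0.251"),
    ipaddress.ip_address("ff02::fb"),
}
MDNS_PORT = 5353


def classify(dst_ip: str, protocol: str, dst_port: int) -> str:
    """Return 'mdns', 'multicast', or 'other' for a packet's destination."""
    addr = ipaddress.ip_address(dst_ip)
    # First rule: the specific mDNS match, so it gets its own hit counter.
    if protocol == "udp" and dst_port == MDNS_PORT and addr in MDNS_GROUPS:
        return "mdns"
    # Second rule: everything else addressed to a multicast group.
    if addr.is_multicast:
        return "multicast"
    return "other"


print(classify("224.0.0.251", "udp", 5353))
print(classify("239.255.255.250", "udp", 1900))
print(classify("10.1.1.1", "tcp", 443))
```

As with an ACL, order matters: the specific rule has to come before the general one, or the mDNS counter never increments.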

Looking back at our pre-ACL pcap, I observed several clients on the guest WLAN flooding the network with mDNS queries, which the WLC then forwarded back across the WAN to its AP’s. The top five worst offenders in the pcap were all Android phones, with OUI’s from Samsung, HTC, and LG, and they were each responsible for dumping thousands of mDNS packets on the network in a matter of seconds.
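Finding the worst offenders is a counting exercise once the packets are exported. A minimal sketch follows, assuming you have already reduced the pcap to (source MAC, is-mDNS) pairs by whatever export you prefer; the function name and input shape are assumptions for illustration.

```python
from collections import Counter


def top_mdns_talkers(packets, n=5):
    """Return the n sources that sent the most mDNS packets.

    packets -- iterable of (source_mac, is_mdns) pairs
    """
    counts = Counter(src for src, is_mdns in packets if is_mdns)
    return counts.most_common(n)


# Made-up sample data, not taken from the capture described above.
sample = (
    [("aa:aa:aa:00:00:01", True)] * 3
    + [("bb:bb:bb:00:00:02", True)]
    + [("cc:cc:cc:00:00:03", False)] * 5
)
print(top_mdns_talkers(sample))
```

The first three octets of each MAC in the result are the OUI, which is how the vendors were identified.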

An unfiltered pcap from the WAN edge should not look like this!

In a network with several thousand guest devices, most of which are smartphones, if enough of these buggy Android phones are dumping mDNS queries like this, it could result in serious problems if it is not controlled. The packet rate at the WAN edge may cause issues with other traffic if it results in output drops (such as voice which we observed), and the increase in multicast traffic on the RF may be an issue as well if the WLAN handles multicast traffic normally, which means transmitting it at the lowest mandatory data rate. That will increase airtime utilization substantially, leaving less airtime for other clients to transmit and receive data.

Until recently, I did not understand what triggered this problem to occur. Now I do. There appears to be a bug in recent versions of Android that results in these mDNS floods when the phones leave sleep mode, resulting in thousands of frames being transmitted in the seconds after the client wakes up. TP-Link discovered the same behavior we observed and traced it to “recent releases of Android OS and Google Apps.”

For WLAN administrators, I suggest taking a few minutes to think through the implications of this issue for your network. As I discovered in my troubleshooting, it is very likely to affect more than just the guest WLAN.

PS: For those of you reading this that aren’t network engineers and think your wireless router is junk, it isn’t. And please stop jumping to the conclusion that the infrastructure is to blame for Wi-Fi problems! 80% of Wi-Fi problems are client device issues, not infrastructure issues.

Update: Google has acknowledged this issue and is releasing a patch via Google Play on January 18th.

Mitigating the KRACK in WPA2 with WIPS

On Monday, security researcher Mathy Vanhoef disclosed a new vulnerability in the WPA/WPA2 four-way handshake, which has been branded KRACK. The attack is targeted and sophisticated, and it results in decrypting a TKIP or CCMP/AES encrypted session without knowledge of the PTK. WPA/WPA2-Personal and WPA/WPA2-Enterprise networks are vulnerable.

The attack takes advantage of client-side implementations of the WPA/WPA2 protocol, which in some cases allow clients to reinstall the PTK and reuse cryptographic information in a way that allows the attacker to decrypt the session. The PSK or 802.1X credentials are not compromised by this attack. I know that description is vague, so if you want more, my favorite resource on this is this series of videos from Hemant Chaskar of Mojo Networks. Do yourself a favor and watch them all.

The ultimate solution to the vulnerability is to patch clients to prevent them from reusing the same cryptographic information when EAPOL keys are retransmitted. That will take some time, and there are a lot of clients, like IoT clients, which are unlikely to ever be patched. Windows and iOS clients with the latest security patches are already protected.

Fortunately, the attack relies on the attacker deploying an easy to detect and mitigate rogue AP. Today, without patching clients or the WLAN infrastructure, KRACK can be totally mitigated on a WLAN by configuring WIPS to auto-contain rogue AP’s that broadcast one of your own SSID’s. You need to tread lightly and understand the legal consequences before enabling auto-containment of rogue AP’s (Configure it for alerting-only first!). It’s best to get management and your InfoSec teams involved before taking this step so that the benefits and risks of auto-containment are understood by the organization.

Another solution is to disable retransmission of EAPOL frame M3 on the WLAN, but sometimes M3 needs to be retransmitted. If there was a collision or the frame arrived too quickly for the client to process, it should be retransmitted to complete the four-way handshake and prevent the client from going through a full reassociation. This is especially true for latency-sensitive voice clients which roam frequently, resulting in many four-way handshakes. These clients may be short on CPU cycles and free memory to quickly process EAPOL frames, and may require an occasional EAPOL frame retransmission.

Therefore, I prefer to mitigate KRACK by using WIPS to contain rogues that use the organization’s own SSID’s. As you can see from the test below, a Cisco monitor-mode AP will deauth a new client on a rogue AP before any data frames are transmitted.

In my testing, every client that associated to the rogue AP was deauthed before any data frames could be transmitted by the client.

Even simple probing on the channel resulted in a flood of deauth/disassociate frames from the monitor-mode AP to the client:


This attack works by setting up the rogue AP on a different channel from the target AP, so make sure you are scanning all channels for rogues. It’s also a good idea to set up notifications from your NMS in the event that a rogue is contained so that you are aware of potential attacks as well as false positives that require correction.
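The containment rule itself is simple to state: an AP advertising one of your SSID's from a BSSID you don't own is a candidate rogue. The sketch below only illustrates that decision; it is not how any vendor's WIPS is implemented, and the function name, SSID, and BSSID values are all hypothetical.

```python
def is_rogue_using_our_ssid(ssid, bssid, own_ssids, authorized_bssids):
    """True when an unknown BSSID advertises one of the organization's SSIDs.

    own_ssids         -- set of SSIDs the organization broadcasts
    authorized_bssids -- set of lowercase BSSIDs of managed APs
    """
    return ssid in own_ssids and bssid.lower() not in authorized_bssids


OWN_SSIDS = {"CorpWiFi"}                 # hypothetical SSID
AUTHORIZED = {"11:22:33:44:55:60"}       # hypothetical managed BSSID

print(is_rogue_using_our_ssid("CorpWiFi", "11:22:33:44:55:60", OWN_SSIDS, AUTHORIZED))
print(is_rogue_using_our_ssid("CorpWiFi", "DE:AD:BE:EF:00:01", OWN_SSIDS, AUTHORIZED))
print(is_rogue_using_our_ssid("CoffeeShop", "DE:AD:BE:EF:00:01", OWN_SSIDS, AUTHORIZED))
```

Note that a neighbor's AP with a different SSID is never matched, which is the point of scoping containment to your own SSID's. A stale authorized list is the obvious source of false positives, which is another reason to start in alert-only mode.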

The one exception to WIPS protection appears to be CVE-2017-13082, which will require an infrastructure-side patch. This only affects SSID’s that use 802.11r.

So patch your clients, tune your WIPS, and relax! The sky is not falling.

macOS Wi-Fi Roaming

One of the nice things about Intel wireless chipsets is that the drivers expose a lot of controls to help tune the chipset’s operation. One of my favorites among these controls is “Preferred Band,” which I usually adjust to instruct the chipset to prefer the 5 GHz band over the 2.4 GHz band. There are some other useful controls like “Roaming Aggressiveness” and you can also enable Fat Channel Intolerance if a neighbor is rudely using 40 MHz of spectrum in 2.4 GHz.


Although macOS has many advantages over Windows when it comes to Wi-Fi, such as the ability to natively do packet captures with the internal chipset, macOS doesn’t have the same level of customization as a Windows machine with an Intel chipset. And my experience has been that Mac clients don’t roam particularly well. Too often they are “sticky clients” and you need to disable/enable Wi-Fi on them to get them to associate with a better BSS.

Here’s a screenshot for a MacBook Air which wouldn’t roam away from a BSS whose RSSI has fallen to -80 dBm, while the laptop was only able to transmit at MCS 0, 7 Mbps. However there was another BSS in the -60’s which would have allowed for much better Wi-Fi performance.

Why is the native macOS Wi-Fi menu showing a full signal with -80 dBm RSSI and MCS 0? Wi-Fi Signal tells the real story.

In 2016 Apple published a webpage that explains how macOS makes roaming decisions and what roaming features it supports. This is very helpful and I wish other manufacturers would do the same. The algorithms that control client roaming are usually a black box, so Wi-Fi engineers have to make a lot of assumptions about them when designing WLAN’s for clients that require efficient roaming. That said, while Apple says Macs should usually roam at -75 dBm, that doesn’t match my experience. Sometimes Macs are just sticky.

One reason for this is that once the roaming threshold is crossed, a Mac will only roam to a BSS that is 12 dB louder than the current BSS, which would require a roaming candidate BSS to have an RSSI of -63 dBm or better before roaming will occur at -75 dBm. There doesn’t appear to be any way to modify this value.
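The documented rule reduces to two comparisons. Here is a minimal sketch of that rule as described above, using the -75 dBm threshold and the 12 dB margin; it models only the published numbers, not Apple's actual algorithm, which (as noted) does not always behave this way in practice. The function name is made up.

```python
def meets_documented_roam_rule(current_rssi, candidate_rssi,
                               threshold=-75, min_delta=12):
    """Apply the documented macOS rule to one candidate BSS.

    current_rssi, candidate_rssi -- RSSI in dBm
    threshold -- RSSI at or below which the Mac starts looking to roam
    min_delta -- how much louder (in dB) the candidate must be
    """
    if current_rssi > threshold:
        return False  # current BSS is still above the roaming threshold
    return candidate_rssi - current_rssi >= min_delta


print(meets_documented_roam_rule(-75, -63))  # exactly 12 dB louder
print(meets_documented_roam_rule(-75, -64))  # only 11 dB louder
print(meets_documented_roam_rule(-70, -50))  # threshold not yet crossed
```

The second case shows why the margin matters: a candidate at -64 dBm is a perfectly usable BSS, yet under this rule the client stays put.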

Enabling 802.11k or 802.11v won’t help because macOS does not yet support those features, although they don’t prevent Macs from using an SSID that has them enabled. 802.11k and .11v are supported in Windows 10, however, if the wireless adapter supports those features.

There is an old plist that once controlled “opportunistic” roaming behavior, which I suspect meant roaming above -75 dBm RSSI.


…which has these defaults in macOS 10.12 Sierra:

    deltaRSSI = 10;
    disabled = 0;
    useBonjour = 0;
    useBroadcastBSSID = 1;

That looks promising, however, this plist hasn’t been used by macOS since macOS 10.10 Yosemite. It’s ignored by the OS now, and when it was utilized, it wasn’t intended to be user-editable, so changes were likely to be overwritten by the OS.

So if you are an enterprise with a fleet of Macs to manage and you run into sticky client issues, consider infrastructure features like Cisco’s Optimized Roaming or Aruba ClientMatch to force better roaming behavior among these clients.

To observe roaming behavior on a Mac, I recommend WiFi Signal from Adrian Granados. It can be set up to generate macOS notifications when roaming events occur or the RSSI of the AP drops below a certain threshold.


Splunking Wi-Fi DFS Events


One aspect of wireless networking that I’ve always struggled with is visibility into DFS events. Usually I catch them by chance by noticing two nearby AP’s on a site map using the same non-DFS channel, or maybe by casually looking through logs, but I’ve never felt like I had the reporting and alerting that should be in place for DFS events, because they can be very disruptive. An AP will abruptly change the channel it is operating on, and if it switches back, it may observe a “quiet period” of 60 seconds in which it does not transmit any data. Not good.

Enter Splunk.

Splunk is a powerful log analysis tool that you can think of as “Google for the data center.” It takes log data from almost any source and makes it as searchable as Google has made the web. For wireless network engineers, you can quickly and easily search syslog and SNMP data, build reports, and create alerts. Splunk Light is free and will process up to 5 GB of data a day, which should be plenty for most WLAN’s. It also runs easily on macOS if you just want to demo it locally.

Using Splunk I very quickly created this dashboard of real DFS data from SNMP traps coming from a Cisco WLC. It’s a little rough around the edges still (I need to figure out how to clean-up those AP names and channels), but it still shows me a lot of the valuable data.

Yes, DFS is a problem at this site.

I can easily create email alerts too, so that if a DFS event occurs an email is triggered, or if say 10 DFS events occur within 30 minutes an email is triggered.
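The "10 events within 30 minutes" condition is a sliding-window count. Splunk evaluates this for you, so the sketch below is only meant to show the logic behind such a threshold, not Splunk's search language; the function name and the use of plain epoch-second timestamps are assumptions.

```python
from collections import deque


def dfs_alert_times(event_times, max_events=10, window_seconds=30 * 60):
    """Return the timestamps at which the alert condition is met.

    event_times -- iterable of DFS event timestamps in seconds
    An alert fires whenever the window ending at an event contains
    at least max_events events.
    """
    window = deque()
    alerts = []
    for t in sorted(event_times):
        window.append(t)
        # Drop events that have aged out of the window.
        while window and t - window[0] > window_seconds:
            window.popleft()
        if len(window) >= max_events:
            alerts.append(t)
    return alerts


burst = [i * 60 for i in range(10)]      # one event a minute for ten minutes
sparse = [i * 600 for i in range(12)]    # one event every ten minutes
print(dfs_alert_times(burst))
print(dfs_alert_times(sparse))
```

The burst triggers on its tenth event, while the sparse series never has more than four events in any 30-minute window and stays quiet.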

How To

I installed Splunk on a Mac, then set up the built-in snmptrapd to listen for incoming traps and log them to a file. For snmptrapd to interpret the SNMP traps from a Cisco WLC, download the Cisco MIB’s and copy them to /usr/share/snmp/mibs/. Then you can start snmptrapd.

Here’s the CLI one-liner to do that:

sudo snmptrapd -Lf /var/log/snmp-traps --disableAuthorization=yes -m +ALL

Next configure the WLC to send SNMP traps to the Splunk box by adding its IP address under Management -> SNMP -> Trap Receivers. While you’re there go to Trap Controls and turn everything on you want to analyze.


Even though DFS events only generate SNMP traps, it’s still a good idea to send syslog messages to Splunk too, so do that under Management -> Logs -> Config. Set the Syslog Level to “Informational” to get a lot of good data. “Debugging” is probably way too much. The Syslog Facility isn’t important.


Monitor the file that snmptrapd is writing traps to in order to make sure it is working. Run this command on the Mac and you should see traps streaming in. If not, you have some troubleshooting to do.

tail -f /var/log/snmp-traps

Now add the file to Splunk under Data -> Data inputs -> Files & directories, and you should be able to see the traps in searches.

Have a look at Splunk’s documentation on SNMP data for more setup help. Setting up syslog is easier. Under Data -> Data inputs -> UDP add UDP port 514 with the Source type “syslog.”

Once the data is coming into Splunk you can start searching it and creating fields. Search “RADIO_RC_DFS” (with quotes) to see all the DFS traps. From that search click “Extract new fields” and select the tab delimiter to parse the data. Give the AP name field a label, and then you can create visualizations of DFS events by AP name. Any search can also be used to trigger an alert, such as an email.

Cisco has published a WLC SNMP Trap Guide as well as a WLC syslog Guide that is helpful when working with this data. Find the messages you are looking for in those guides, then search for them in Splunk.

From there it’s all up to your own creativity. DFS events are just scratching the surface of Splunk’s potential. You can look at authentication events, monitor RRM, and there might be some interesting roaming analysis that can be done with this data as well. I’m sure there are some bright engineers out there that have taken this a lot farther. Please share your work!

Use Let’s Encrypt Certificates with FreeRADIUS


Let’s Encrypt is a certificate authority that generates TLS certificates automatically, and for free. It’s been great for web server administrators because it allows them to automate the process of requesting, receiving, installing, and renewing TLS certificates, taking the administrative overhead out of setting up a secure website. And did I mention it’s free and supported by all the major web browsers now?

Getting all of that to work with a RADIUS server is challenging however, mostly because of the way Let’s Encrypt works. The Let’s Encrypt client runs on a web server with a public domain name. The client requests a TLS cert from Let’s Encrypt and before Let’s Encrypt issues the cert, it verifies that the client is connecting from the same domain name that it is requesting a cert for, and that the client can put some hidden files on the server’s website. Do you see the problem? Unless you run a public-facing web server on your RADIUS server (unlikely), Let’s Encrypt will not issue certs to your server. It needs a web server it can interact with in order to validate the domain name of the client’s request.

Why use a certificate from a public CA like Let’s Encrypt for 802.1X/PEAP authentication? While a private CA offers more security, a public CA has the advantage of having a pre-installed root certificate on virtually all RADIUS supplicants, including BYOD clients that are unmanaged. If you don’t have an MDM or BYOD onboarding solution, you can’t get your private root cert onto BYOD clients very easily.

Unmanaged clients are a security risk, however, because the end-user can easily override security warnings that occur when connecting to an evil twin network with a bogus cert. A good MDM solution will allow network admins to configure BYOD clients properly so that TLS failures cannot be bypassed.

A few considerations before you get too excited:

  • Again, a better, more secure solution is to use a private CA and distribute the RADIUS server cert to clients using an MDM solution and/or BYOD onboarding solution.
  • Let’s Encrypt certs are only good for three months at a time, and some supplicants will prompt users to accept the new certificate when it is renewed.
  • Build in some error handling, logging, and notification. E.g. an email from the web server when the cert renewal routine runs, including its output, and an email from the RADIUS server when it copies the new certs and reloads FreeRADIUS.
  • It works as root, but there’s probably a way to accomplish this without using root. Do it that way.
  • You can accomplish the same thing with Windows servers and Powershell.
  • You broke it, not me.

To get this working, we need a public web server with the same domain name as you’d use in your RADIUS server’s cert common name. This means internal domain names with a .local TLD won’t work.

I set up two Ubuntu servers, one running the nginx web server with a public IP, and another on my local network running FreeRADIUS. The web server will run the Let’s Encrypt client and create and renew the certs. The RADIUS server will copy those certs from the web server and use them for PEAP authentication. Once set up, the process of renewing and installing the certs on the RADIUS server happens automatically, just like it would on a web server.

First, a public DNS A record needs to be setup with the domain name which will be used on the TLS cert common name, we’ll use, and point it to the IP address of the web server.

Once that is done, you can install and run the Let’s Encrypt client on the web server. It works with Apache too, but if you prefer nginx like me, follow these directions to get it set up with Ubuntu 14.04 or Ubuntu 16.04. Don’t skip over the part about using cron to run the renewal routine.

Now that we have the certs on the web server, we’ll turn our attention to the RADIUS server. The first thing we need to do is set up ssh public key authentication between the two servers. I used the root account on both servers to do this, so that I would have permissions everywhere I needed them. With public key authentication in place, securely copying the certs in the future can happen automatically, without getting stopped by a password request. Here are instructions to get that working.

Now we’ll start configuring FreeRADIUS on the RADIUS server. I’m assuming you already have a working FreeRADIUS server. I’m using FreeRADIUS 3, and you should be too. I like to use a separate directory for the Let’s Encrypt certs.

root@freeradius:~# mkdir /etc/freeradius/certs/letsencrypt/

Now let’s try copying the certs from the web server to this directory on the RADIUS server. If public key authentication is working, you should not be prompted for a password.

root@freeradius:~# scp /etc/freeradius/certs/letsencrypt/
root@freeradius:~# scp /etc/freeradius/certs/letsencrypt/

Did it work? If so, you should see the certs in the new folder we created.

root@freeradius:~# ls /etc/freeradius/certs/letsencrypt/
fullchain.pem  privkey.pem

Now we need to configure FreeRADIUS to use the Let’s Encrypt certs for PEAP authentication. I have a previous blog about using different CA’s for PEAP and EAP-TLS on FreeRADIUS that should come in handy here. If you are using EAP-TLS too, be sure not to change that CA from your private CA! All we need to do now is modify /etc/freeradius/mods-enabled/eap with our new certs in the TLS section used for PEAP.

root@freeradius:~# nano /etc/freeradius/mods-enabled/eap

tls-config tls-peap should be changed to:

tls-config tls-peap {
 private_key_file = ${certdir}/letsencrypt/privkey.pem
 certificate_file = ${certdir}/letsencrypt/fullchain.pem

If you aren’t using multiple TLS configurations, this section is named tls-config tls-common. You can leave it like that.

Reload FreeRADIUS for the change to take effect.

root@freeradius:~# service freeradius reload
 * Checking FreeRADIUS daemon configuration...               [ OK ] 
 * FreeRADIUS daemon is running
 * Reloading FreeRADIUS daemon freeradius                    [ OK ]

Now when connecting to the WLAN that is configured to use this RADIUS server for 802.1X/PEAP authentication, the client is presented with a valid Let’s Encrypt server certificate.


OK, we have a working FreeRADIUS server using Let’s Encrypt certs for 802.1X/PEAP authentication. Now let’s automate the process of getting renewed certs from the web server to the RADIUS server. We’ll use scp and cron to get this done.

On the RADIUS server, add these commands to root’s crontab, with the appropriate domain names.

root@freeradius:~# crontab -e
# m h dom mon dow command
0 3 * * 1 scp /etc/freeradius/certs/letsencrypt/
0 3 * * 1 scp /etc/freeradius/certs/letsencrypt/
5 3 * * 1 service freeradius reload

At 3:00 AM every Monday, cron will copy the TLS certs from the web server, then reload FreeRADIUS at 3:05 AM to put them into production. Now the Let’s Encrypt certs are automatically installed on the RADIUS server a few minutes after they are renewed on the web server. The certs are good for three months at a time and renewable one month in advance, so you’ll get renewed certs automatically installed every two months.
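The cron entries above reload FreeRADIUS every week whether or not the cert changed. If you would rather reload only after an actual renewal, one approach is to copy the certs to a staging location first and compare them to the installed copies. The sketch below shows just that comparison step; it is an assumption-laden illustration, not part of the setup described here: the function names are made up, the staging idea is mine, and the demo uses throwaway files in place of fullchain.pem.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def install_if_changed(staged, installed):
    """Copy staged over installed only when the contents differ.

    Returns True when a copy happened, meaning FreeRADIUS should be
    reloaded, and False when the installed file was already current.
    """
    staged, installed = Path(staged), Path(installed)
    if installed.exists() and file_digest(staged) == file_digest(installed):
        return False
    installed.write_bytes(staged.read_bytes())
    return True


# Demo with throwaway files standing in for the real cert.
with tempfile.TemporaryDirectory() as tmp:
    staged = Path(tmp, "staged.pem")
    installed = Path(tmp, "installed.pem")
    staged.write_text("cert v1")
    first = install_if_changed(staged, installed)    # nothing installed yet
    second = install_if_changed(staged, installed)   # unchanged, skipped
    staged.write_text("cert v2")
    third = install_if_changed(staged, installed)    # renewed, copied
    print(first, second, third)
```

A real script would also need to keep restrictive permissions on the private key and send the notification emails suggested in the considerations list; both are left out here to keep the sketch short.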

Presto! You now have Let’s Encrypt certs automatically renewed and installed on your RADIUS server. While a private CA is a better solution for 802.1X authentication, this isn’t bad for a $0 software stack.

Clear To Send Podcast Episode 62: K12 Wi-Fi Deployments

I recently had the pleasure of joining Rowell Dionicio on the Clear to Send Podcast to talk about Wi-Fi in K12 schools. Clear To Send is a great podcast about enterprise wireless networking and a great way to stay current with the Wi-Fi community.

We talked about K12 requirements, challenges, funding, my design process, security, and everyone’s favorite K12 subject, 1 AP per classroom!

After listening to the podcast, I thought about some other K12 Wi-Fi considerations that I didn’t bring up on the air.

  • K12 often has requirements for mDNS applications like Apple AirPlay for AppleTV or Google Cast for Chromecast. This is a challenge in an enterprise network because mDNS does not cross layer 2 boundaries. It’s important to consider that when designing a new WLAN and selecting the vendor. Many WLAN vendors do have features that can assist with relaying mDNS traffic between vlans. Be careful to limit this traffic to only the vlans where it is required.
  • Excessive multicast traffic can be a burden on channel utilization when it is not controlled. Many WLAN vendors have features that intelligently filter broadcast/multicast traffic, instead of always forwarding it out the AP radio interfaces at the lowest data rate. If you are dealing with mDNS or large subnets (common in K12) it’s worthwhile to understand how the WLAN can manage broadcast/multicast traffic.
  • MSP’s are a great way to get well-designed enterprise Wi-Fi into small to medium size schools that don’t have the internal resources to handle it themselves. MSP’s can be hired to support and operate the WLAN after installing it, which gives them an incentive that VAR’s who just sell the hardware might not have–to design the WLAN properly. E-Rate funding is now available to reimburse schools for managed services contracts with MSP’s.
  • eduroam is available for K12 schools, not just higher education. Check it out!
  • It’s hard to listen to the sound of your own voice.

I really enjoyed talking Wi-Fi with Rowell and I’d love to return to the podcast in the future. Maybe we can talk about healthcare Wi-Fi next? Thanks Rowell!

Have a listen here: CTS 062: K12 Wi-Fi Deployments – Clear To Send