The State of Guest Wi-Fi Security


Most guest Wi-Fi networks today are open SSIDs with no encryption, fronted by a captive portal that requires users to click through some terms and conditions. It would be nice to secure these networks the same way we secure internal SSIDs, with mutual authentication of the client and network and strong layer 2 encryption, but that has proven too difficult to accomplish without a high degree of friction. You could make users suffer through a lengthy and confusing onboarding process, but imagine doing that at every location that offers guest Wi-Fi. Not good. I agree with Keith Parsons’ take: guest Wi-Fi should be fast, free, and easy. Security should be too.

How can we make this better? The Wi-Fi Alliance is certifying devices for a new security protocol called Opportunistic Wireless Encryption (OWE). Their certification is called Wi-Fi Enhanced Open, but I’ll refer to it as OWE for the purposes of this blog. OWE adds encryption to open WLANs without authenticating the client, and it does not provide server authentication either, which leaves users vulnerable to man-in-the-middle (MitM) attacks. The authors of the RFC (RFC 8110) understood this, and wrote that “the presentation of the available SSID to users should not include special security symbols such as a ‘lock icon.’” Aruba Networks has already announced support for OWE, and I hope other vendors follow suit.
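To see why encryption without server authentication still leaves room for MitM, it helps to look at the unauthenticated Diffie-Hellman exchange at the heart of OWE. The sketch below is a toy: it uses a small finite-field group (modulus 2^127 - 1, generator 3) chosen only for illustration, whereas real OWE uses elliptic-curve groups such as NIST P-256 and derives its keys with HKDF. The function names are hypothetical. It shows that a client and AP can agree on a key with no credentials at all, and that a rogue AP can just as easily run one exchange with the client and a separate one with the real AP.

```python
import secrets

# Toy parameters for illustration only: 2**127 - 1 is a Mersenne prime,
# but this group is NOT a secure choice for real key exchange.
P = 2**127 - 1
G = 3

def keypair(private=None):
    """Return (private, public) in the toy group; private is random unless given."""
    if private is None:
        private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def shared_secret(my_private, peer_public):
    """Diffie-Hellman: both sides end up with G^(ab) mod P."""
    return pow(peer_public, my_private, P)

# Honest exchange: client and AP agree on a secret with no credentials.
client_priv, client_pub = keypair()
ap_priv, ap_pub = keypair()
assert shared_secret(client_priv, ap_pub) == shared_secret(ap_priv, client_pub)

# MitM: a rogue AP answers the client with its own key and separately talks
# to the real AP. Nothing in the exchange proves which AP the client reached.
mitm_priv, mitm_pub = keypair()
client_side_key = shared_secret(client_priv, mitm_pub)  # client believes this is the AP
ap_side_key = shared_secret(ap_priv, mitm_pub)          # AP believes this is the client
assert client_side_key == shared_secret(mitm_priv, client_pub)
assert ap_side_key == shared_secret(mitm_priv, ap_pub)
```

Both keys are perfectly good encryption keys; the problem is that the client has no way to tell it negotiated one of them with an attacker.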

Unfortunately, the Wi-Fi Alliance did not make OWE support mandatory in WPA3; it’s a separate, optional certification. Perhaps they will right this wrong in Wi-Fi 6 certification, which could require WPA3 support just as 802.11n certification required WPA2 support. Why not tack on OWE to Wi-Fi 6 as well?

Secure Guest Wi-Fi with Hot Spot 2.0/Passpoint

I once believed that Hot Spot 2.0/Passpoint (HS2.0) was the future of secure guest Wi-Fi, because it allowed for anonymous authentication to a WPA2-Enterprise network. The problem is that users are still required to go through a high-friction onboarding process on every anonymous HS2.0 WLAN they wish to use. That means dealing with captive portals, terms and conditions, installing configuration profiles, etc.

HS2.0 does allow for automatic authentication with user creds from other identity providers. That would allow a user to log in with pre-installed creds from their cellular carrier, Facebook, Amazon, Google, Apple, etc.

Telcos are the best choice here as their creds are already installed on mobile phones to authenticate with their cellular networks. However, telcos are unlikely to open their authentication service to WLAN operators for several reasons.

  • They want to be paid for providing this service, but SMB and many large enterprises don’t want to pay to increase the security of their guest networks.
  • It gives an implied endorsement of the security, quality, and reliability of the WLAN, which the telco knows nothing about.

That’s why you see telcos integrating with Boingo, for example, but not smaller players.

But what if there were an HS2.0 open roaming consortium that federated authentication from any identity provider that wanted to join? Something like eduroam for anyone.

The biggest problem is that WLAN authentication in such a scenario tells you nothing about the identity or security of… the WLAN. Users authenticate with their identity provider’s RADIUS servers, and the result is strong encryption in the air, but no guarantee of security on the wired network. They don’t get any information about the identity of the wired LAN that their bits are traversing, because the authentication is abstracted away from the network they are using. HS2.0 provides no identity verification of the network that users are actually using.
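In eduroam-style federations, the visited network forwards each RADIUS request toward the user’s home server based on the realm in the user’s outer identity (for example `anonymous@example.edu`). The hypothetical sketch below models that lookup; the realm table, hostnames, and fallback proxy are made up for illustration. The point it makes is the one above: the route, and therefore what gets verified, depends only on the user’s realm, never on which WLAN the user is standing in.

```python
# Hypothetical realm -> home RADIUS server table held by a federation proxy.
HOME_SERVERS = {
    "example.edu": "radius.example.edu",
    "carrier.example": "aaa.carrier.example",
}
FEDERATION_TOP_LEVEL = "federation-proxy.example"  # hypothetical fallback proxy

def route_request(outer_identity: str, visited_network: str) -> str:
    """Pick the RADIUS server that will authenticate this user.

    visited_network is accepted only to make the point explicit: it plays
    no part in the routing decision or in what the home server verifies.
    """
    _, sep, realm = outer_identity.rpartition("@")
    if not sep or not realm:
        raise ValueError(f"no realm in outer identity {outer_identity!r}")
    return HOME_SERVERS.get(realm.lower(), FEDERATION_TOP_LEVEL)
```

Calling `route_request("anonymous@example.edu", ...)` returns `"radius.example.edu"` whether the second argument names a university WLAN or a laundromat WLAN.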

This is a smaller problem in eduroam, where most WLANs are run by higher education institutions that agree to operate their networks a certain way. There is some homogeneity there, and users can expect similar security and terms of use between member networks.

An open roaming consortium would allow users to authenticate to a university’s WLAN and a dingy laundromat’s WLAN as if there were no difference. In fact, roaming between those networks would happen automatically without any user interaction. That’s an acceptable risk when all the networks in the consortium are similar (eduroam), but it isn’t when nothing can be assumed about the quality and security of member networks in an open roaming consortium.

Is it reasonable to assume an end-user wants to connect to any WLAN that supports their HS2.0 creds? My answer to that is a definite “no.” One benefit of the non-HS2.0 model is that a user must express an intent to connect to a new WLAN, which gives them the ability to decide if it is trustworthy or not. HS2.0 circumvents this process, and if it becomes more open and widespread, users may end up connecting to networks they don’t trust.

Secure Guest Wi-Fi with an On-Premises Solution

There are several on-premises BYOD or secure guest Wi-Fi (SGW) onboarding solutions. They don’t solve the high-friction onboarding problem mentioned previously; they compound it, because the credentials they issue cannot be used between networks. Users must wrestle with a high-friction onboarding process on every SGW network they want to use.

The fundamental problem with Hot Spot 2.0 and on-premises solutions is that they require client credentials. In my opinion, authenticating users is not a requirement for SGW, and I imagine that’s a common view. It creates unnecessary complexity for users and administrative overhead for WLAN operators. We need a solution for anonymous SGW.

An HTTPS-like Solution

For secure guest Wi-Fi, a security model similar to HTTPS would be great. Client identity is not important, but the WLAN identity should be verified, not just the RADIUS server. Strong encryption must be used, wireless network access must be resistant to MitM attacks, and users should only connect to an SGW network when they have expressed the intent to do so.

Additionally, all of the necessary configuration and complexity to accomplish this should be handled by the WLAN operator. For the end-user, it should “just work.”

Take the example of HTTPS: A web admin requests and is issued a DNS-validated TLS certificate signed by a public certificate authority. She then installs the cert on her web server, configures it for strong encryption, and adds an HTTP to HTTPS 301 redirect. Now visitors to the website are able to verify the website’s identity and connect to it with strong encryption, and they had to do nothing to get those security benefits except run a modern web browser. SGW should be just as easy for end users.

OWE gets us halfway there, but crucially, does not address the threat of MitM attacks. We need a WLAN-centric public key infrastructure (PKI) for that, and that’s the rub. Suddenly there’s a lot of administrative overhead to make this work. Perhaps it would look something like this:

An “Open RADIUS Certificate Authority,” or ORCA, would only issue certs to validated network operators, and those certs could only be used with specific SSIDs.

ORCA’s root cert would have to be preinstalled and trusted by client devices for EAP authentication.

Wi-Fi clients would connect to an ORCA-enrolled SGW SSID and authenticate anonymously, then validate the ORCA-signed cert presented by the RADIUS server. The client verifies that the cert has not been revoked and that the SSID it is connecting to is one the cert is permitted for. The session is encrypted and the WLAN’s identity is verified. Clients only connect to ORCA-enrolled WLANs when they intend to, by clicking/tapping on the SSID in their Wi-Fi menu/settings.
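Since ORCA is a hypothetical scheme, the following is only a sketch of the client-side decision described above. Certificates are modeled as plain records rather than X.509, and real signature and chain validation is reduced to a single `signature_valid` flag. The root name, the `permitted_ssids` field, and the revocation set are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class OrcaCert:
    """Hypothetical ORCA-issued RADIUS server cert, reduced to the fields we check."""
    serial: int
    issuer: str
    permitted_ssids: frozenset
    signature_valid: bool  # stands in for real chain/signature verification

@dataclass
class ClientTrustStore:
    trusted_root: str = "ORCA Root CA"  # assumed name of the preinstalled root
    revoked_serials: set = field(default_factory=set)

    def accept(self, cert: OrcaCert, ssid: str, user_selected: bool) -> bool:
        """Return True only if every check from the ORCA description passes."""
        if not user_selected:  # connect only when the user chose this SSID
            return False
        if cert.issuer != self.trusted_root or not cert.signature_valid:
            return False
        if cert.serial in self.revoked_serials:  # cert must not be revoked
            return False
        return ssid in cert.permitted_ssids  # cert must be bound to this SSID
```

Binding the cert to a list of SSIDs is what lets the client verify the WLAN itself rather than just "some RADIUS server with a valid cert," which is the gap the HS2.0 discussion identified.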

All the end user has to do is tap/click on the SGW SSID to connect to it. Everything else is handled by the client device, the WLAN, and ORCA.

Ta da, we now have low-friction SGW, but for all this work, what have we really gained, today, in 2018?

If you run a packet capture on an open guest network today, you’ll see DNS queries and a whole lot of TLS sessions, not much else. Yes, SGW would add another layer of security on top of this, but at what cost? Making ORCA work is no small task, if it is even achievable in the first place.

Conclusions

OWE gives us layer 2 encryption, so that passive sniffing doesn’t reveal those DNS queries anymore. While OWE doesn’t address MitM rogue AP attacks, coupling it with 802.11w protected management frames, which is required for Wi-Fi Enhanced Open certification, adds resistance to malicious deauth attacks.

The work necessary to make my SGW scheme function doesn’t balance with the small gain in security. It’s better to take a perimeterless networking approach (e.g. BeyondCorp), only deploy hardened applications, and assume the networks your users use will not be trustworthy. If you do not use applications that expose their data to network-level interception or abuse, then have at it. How can an end-user ever truly know if a network is trustworthy anyway?

We can add a bit more security through OWE to help obscure the small amount of guest network traffic that remains unencrypted, and 802.11w protected management frames to prevent some rogue AP attacks. That’s going to have to be good enough.

Roaming Analysis using only a Mac and Wireshark

There are many ways to examine the roaming performance of a Wi-Fi client. Perhaps the gold standard is to follow the client with a laptop running Omnipeek and several Wi-Fi adapters all capturing frames on different channels. I’m also impressed with 7signal’s recent update to Mobile Eye, which now logs roaming data as well. But what if you don’t have that, or want to do something quickly with a Mac without switching to Windows and hooking up your Wi-Fi adapter array?

Using a Mac laptop to capture frames on a single channel with Airtool, you can still get valuable information about the roaming performance of a Wi-Fi client with a few Wireshark display filters and some I/O Graphs magic.

The process is simple. Discover the channel the test client is using, and start an over-the-air capture on that channel. Take your Mac and the test client and move out of the current AP’s cell so the client roams away, then come back so that the client roams back. Repeat as necessary until you have captured both a roam-away and a roam-back.

Let’s roam

Now it’s time to look at the captured frames. First, let’s build a display filter to only show the frames to/from the test client, as well as all of the AP’s beacon frames. We’re including the AP’s beacon frames so that we can see the changes in RSSI as the client moves away from, and then back towards, the AP.

wlan.addr == aa:bb:cc:dd:ee:ff || ( wlan.ta contains 11:22:33:44:55 && wlan.fc.type_subtype == 8)

aa:bb:cc:dd:ee:ff is the MAC address of the test client, and 11:22:33:44:55 is the first five octets of the AP’s BSSID. By matching on the first five octets rather than the exact BSSID, we keep the beacon frames from all of the AP’s BSSIDs, which gives us more data to calculate the AP’s RSSI.
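If you repeat this test with different clients and APs, a small helper can assemble the filter for you. This is a hypothetical convenience function, not part of Wireshark; with default arguments it emits exactly the filter above. One caveat: `contains` matches the five octets anywhere in the transmitter address, so the optional `strict_prefix` mode uses Wireshark’s slice syntax (`field[offset:length]`) to compare only the first five bytes.

```python
def roam_filter(client_mac: str, bssid_prefix: str, strict_prefix: bool = False) -> str:
    """Build a Wireshark display filter for one client plus its AP's beacons.

    client_mac:    full MAC of the test client, e.g. "aa:bb:cc:dd:ee:ff"
    bssid_prefix:  first five octets of the AP's BSSID, e.g. "11:22:33:44:55"
    strict_prefix: match the first five bytes with a slice instead of `contains`
    """
    prefix = bssid_prefix.lower()
    if len(prefix.split(":")) != 5:
        raise ValueError("expected the first five octets of the BSSID")
    if strict_prefix:
        ap_match = f"wlan.ta[0:5] == {prefix}"
    else:
        ap_match = f"wlan.ta contains {prefix}"
    # type_subtype 8 is a beacon (management frame type 0, subtype 8).
    return f"wlan.addr == {client_mac.lower()} || ( {ap_match} && wlan.fc.type_subtype == 8)"
```

For example, `roam_filter("aa:bb:cc:dd:ee:ff", "11:22:33:44:55", strict_prefix=True)` swaps the `contains` clause for `wlan.ta[0:5] == 11:22:33:44:55` and leaves the rest unchanged.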

Once the filter is applied, export only the displayed packets to a new file that we’ll generate the graphs from. Open the new file, and now we can configure the I/O Graphs. These are some of the display filters I use:

The roam-away is on the left, and the roam-back is on the right.

The AVG Tx Data Rate graph needs to be set with the test client’s MAC address, and the AP RSSI graph needs the first five octets of the AP’s BSSID.
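If you ever want these series outside Wireshark, an I/O Graph is just per-interval aggregation. The sketch below assumes you have already exported per-frame fields (for example with tshark, using fields such as `frame.time_relative`, `wlan.fc.retry`, `wlan_radio.data_rate`, and `wlan_radio.signal_dbm`) into simple records; the `Frame` record and its field names are hypothetical, not a Wireshark API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class Frame:
    """One captured frame, reduced to what the graphs need (hypothetical record)."""
    time: float                 # seconds since capture start
    ta: str                     # transmitter address
    is_beacon: bool
    retry: bool
    data_rate: Optional[float]  # Mb/s, for frames the client sends
    signal_dbm: Optional[int]   # RSSI as heard by the capturing Mac

def _avg(values):
    return sum(values) / len(values) if values else None

def io_graph(frames, client_mac, bssid_prefix, interval=1.0):
    """Per interval: (client frames, client retries, avg client Tx rate, avg AP RSSI)."""
    buckets = defaultdict(lambda: {"frames": 0, "retries": 0, "rates": [], "rssi": []})
    for f in frames:
        b = buckets[int(f.time // interval)]
        if f.ta == client_mac:
            b["frames"] += 1
            b["retries"] += int(f.retry)
            if f.data_rate is not None:
                b["rates"].append(f.data_rate)
        elif f.is_beacon and f.ta.startswith(bssid_prefix) and f.signal_dbm is not None:
            b["rssi"].append(f.signal_dbm)
    # Intervals with no decodable beacons report None for RSSI instead of 0;
    # intervals where nothing at all was captured are simply absent.
    return {
        k * interval: (b["frames"], b["retries"], _avg(b["rates"]), _avg(b["rssi"]))
        for k, b in sorted(buckets.items())
    }
```

Reporting None rather than 0 for empty intervals keeps missing beacons from being averaged into the RSSI line.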

By zooming into the beginning of the graph, we can observe the client’s data frames, retries, Tx data rate, and the RSSI at which it roamed away. A benefit of RSSI being expressed in negative dBm values is that it is separated from the rest of the data on the graph: layer 1 data sits below 0, and layer 2 data above.
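To put a number on the roam-away point instead of reading it off the graph, one simple heuristic is to take the client’s last frame addressed to the old AP and look up the most recent beacon RSSI at or before that moment. The sketch below assumes time-stamped tuples exported from the filtered capture; the function name and tuple layout are hypothetical, and the `until` cutoff lets you exclude the roam-back portion of the capture.

```python
from bisect import bisect_right

def roam_away_rssi(client_frames, beacons, bssid_prefix, until=float("inf")):
    """Estimate the AP RSSI at the moment the client left the AP (a heuristic).

    client_frames: (time_s, receiver_address) tuples for frames the client sent
    beacons:       (time_s, rssi_dbm) tuples for the AP's beacons, sorted by time
    bssid_prefix:  first five octets of the AP's BSSID, e.g. "11:22:33:44:55"
    until:         ignore client frames after this time (e.g. the roam-back)
    Returns (time_of_last_frame_to_ap, rssi_dbm), or None if nothing matches.
    """
    to_ap = [t for t, ra in client_frames if ra.startswith(bssid_prefix) and t <= until]
    if not to_ap:
        return None
    last = max(to_ap)
    times = [t for t, _ in beacons]
    i = bisect_right(times, last) - 1  # index of the latest beacon at or before `last`
    return (last, beacons[i][1]) if i >= 0 else None
```

This tells you when the client last talked to the AP, not how long the roam took; that limitation applies here just as it does to the graph.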


This Cisco 7925G phone roams away before the AP’s RSSI drops to -70 dBm, and before retries start to increase. We see similarly good behavior when it roams back.


Let’s take a look at a Wi-Fi client that roams poorly. Here’s a client-that-shall-remain-nameless roaming away from an AP. You can see retries spiking and its data rate plummeting well before it roams away. The AP’s RSSI drops into the -80s for most of a minute before it decides to roam!

This graph includes the test client’s average Tx data rate.

Of course, this approach has some limitations. Before you decide that a client like the one above is sticky, you must know that it was in range of a louder AP, on a channel it supports, when it started having trouble. Otherwise it is doing exactly what it should be doing: trying to maintain the only association it can.

You know when the client decided to roam, but you don’t know how long it took.

As you move away from the AP, you might see the AP’s RSSI spike to 0. That happens when your laptop’s adapter is unable to demodulate beacon frames from the AP due to poor SNR.

Also, the AP RSSI is measured by a Mac laptop that is following the test client. Unless the test client is the same model of Mac laptop, it will probably hear the AP differently, most likely with less sensitivity. My MacBook Pro is a 3×3:3 client, and the two test clients I looked at for this blog are both 1×1, so it’s reasonable to assume the Mac benefits from a significant increase in RSSI from maximal ratio combining (MRC). Taking that into consideration, the poor roaming from the client-that-shall-remain-nameless is probably even worse than it looks.
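For a rough sense of scale, assume ideal maximal ratio combining over N receive chains with equal per-chain SNR; the combining gain is then up to 10·log10(N) dB. This is an idealized bound on SNR improvement only. It ignores antenna and adapter differences, and a driver’s reported RSSI may not reflect the combined signal at all.

```latex
G_{\mathrm{MRC}}(N) = 10\log_{10} N \ \mathrm{dB}
\qquad\Rightarrow\qquad
G_{\mathrm{MRC}}(3) - G_{\mathrm{MRC}}(1) = 10\log_{10} 3 - 0 \approx 4.8\ \mathrm{dB}
```

Under those assumptions, the 3×3 Mac could hear the AP up to about 5 dB better than a 1×1 client standing in the same spot.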

Beware of mDNS Floods from Buggy Android Clients

Recently, I discovered a large increase in multicast traffic on an enterprise Cisco WLAN. This increase was large enough to cause packet loss in several areas where bandwidth is limited, usually at the WAN edge. While throughput remained within the acceptable range for each circuit, an extremely high packet rate was overwhelming the edge device’s capacity to process packets. So despite normal bandwidth utilization, packet loss was still occurring.

The primary symptom of this problem was poor voice call quality on calls across the WAN, at seemingly random times, including periods where bandwidth utilization was very low.

A close inspection of one particular site uncovered abnormally high packet rates, measured in pps (packets per second). We used a span port to run a packet capture of the traffic, with the hope of isolating a source of the packet flood. Here’s what we found:

Wireshark’s Protocol Hierarchy Window

37% of the packets on this busy WAN circuit were mDNS queries! More specifically, the sources were smartphones on the guest WLAN sending mDNS queries for Chromecasts, Apple TVs, and other services advertised with mDNS. In a centrally switched WLAN, all of that traffic is backhauled across the WAN to the WLC that services it. These queries left the remote sites in a CAPWAP tunnel and traversed the WAN to the WLC, which then forwarded them to all of its APs at all other sites, once again traversing the WAN.
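A rough model shows why central switching amplifies this. The sketch assumes, following the description above, that each query crosses the originating site’s WAN link once on its way to the WLC, and that the WLC then sends a separate copy to every AP at every other site. Other multicast delivery modes a controller may support are not modeled, and the function name and inputs are hypothetical.

```python
def wan_pps_at_site(site, query_pps, aps_per_site):
    """Estimate mDNS-related packets/s crossing one site's WAN edge.

    Assumes central switching, with one copy of every query sent to each AP
    at every *other* site (per-AP replication).
    query_pps:    {site: mDNS queries/s originated by clients at that site}
    aps_per_site: {site: number of APs at that site}
    """
    upstream = query_pps[site]  # this site's own queries, tunneled to the WLC
    from_others = sum(q for s, q in query_pps.items() if s != site)
    downstream = from_others * aps_per_site[site]  # one copy per local AP
    return upstream + downstream
```

With three sites of 10 APs each, all generating 100 queries/s, each WAN edge would carry 100 + 200 × 10 = 2,100 packets/s under these assumptions. The packet count grows with the number of APs, not with bandwidth, which fits the symptom of high pps at normal utilization.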

Once we had this data, it became an easy problem to fix. There was no business requirement for multicast on the guest WLAN, so we blocked it at the WLC using a layer 3 ACL. This ACL blocks all multicast, but we added a separate line for mDNS to get an idea of the volume of mDNS packets compared to other multicast traffic on the WLAN.

ACL to block all multicast on the WLAN

You can see from the hit counters that this guest WLAN is seeing a lot of mDNS traffic, which dwarfs any other multicast traffic. With this ACL in place, the problem was resolved. mDNS uses multicast address 224.0.0.251 and UDP port 5353, if you want to block it more precisely.
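The logic of an ACL like this one can be modeled as an ordered, first-match list: a dedicated mDNS line ahead of a general multicast line, so mDNS packets increment the first counter and all other multicast increments the second. The sketch below is a model of that behavior, not WLC configuration syntax; the rule names and the trailing permit line are assumptions about a typical layout.

```python
import ipaddress
from collections import Counter

MULTICAST = ipaddress.ip_network("224.0.0.0/4")
MDNS = ipaddress.ip_address("224.0.0.251")

# Ordered rules: (name, action, match). First match wins, as in an ACL.
RULES = [
    ("deny-mdns", "deny",
     lambda dst, proto, port: dst == MDNS and proto == "udp" and port == 5353),
    ("deny-multicast", "deny",
     lambda dst, proto, port: dst in MULTICAST),
    ("permit-any", "permit",
     lambda dst, proto, port: True),
]

def apply_acl(packets, rules=RULES):
    """Return per-rule hit counters for (dst_ip, proto, dst_port) tuples."""
    hits = Counter()
    for dst_ip, proto, port in packets:
        dst = ipaddress.ip_address(dst_ip)
        for name, _action, match in rules:
            if match(dst, proto, port):
                hits[name] += 1  # count only the first matching line
                break
    return hits
```

Putting the narrower mDNS line first is what makes its hit counter meaningful; if the general multicast line came first, it would absorb every mDNS packet.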

Looking back at our pre-ACL pcap, I observed several clients on the guest WLAN flooding the network with mDNS queries, which the WLC then forwarded back across the WAN to its APs. The top five worst offenders in the pcap were all Android phones, with OUIs from Samsung, HTC, and LG, and each was responsible for dumping thousands of mDNS packets on the network in a matter of seconds.
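Finding the worst offenders comes down to counting mDNS packets per source. The sketch below assumes `(src_mac, is_mdns)` records exported from the capture; the record format and function name are hypothetical. It returns each top talker’s OUI (the first three octets) so you can look up the vendor in the IEEE registry or let Wireshark resolve it, rather than hard-coding vendor prefixes.

```python
from collections import Counter

def top_mdns_talkers(records, n=5):
    """Rank sources by mDNS packet count.

    records: iterable of (src_mac, is_mdns) pairs (an assumed export format).
    Returns [(src_mac, oui, count), ...] for the n busiest sources.
    """
    counts = Counter(src.lower() for src, is_mdns in records if is_mdns)
    return [(mac, mac[:8], c) for mac, c in counts.most_common(n)]
```

Sorting by count rather than by bytes matches the problem here: the WAN edge was struggling with packet rate, not bandwidth.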

An unfiltered pcap from the WAN edge should not look like this!

In a network with several thousand guest devices, most of which are smartphones, enough of these buggy Android phones dumping mDNS queries like this could cause serious problems if the traffic is not controlled. The packet rate at the WAN edge may affect other traffic if it results in output drops (as we observed with voice). The increase in multicast traffic on the RF may be an issue as well if the WLAN handles multicast normally, which means transmitting it at the lowest mandatory data rate. That increases airtime utilization substantially, leaving less airtime for other clients to transmit and receive data.
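To put a rough number on that airtime cost, the sketch below computes the on-air time of a single frame using 802.11a OFDM timing for a 20 MHz channel: a 16 µs preamble, a 4 µs SIGNAL field, and 4 µs data symbols carrying 16 service bits, the frame, and 6 tail bits. It ignores DIFS, backoff, HT/VHT rates, and 2.4 GHz specifics, and the 100-byte frame size and 1,000 frames/s rate used below are assumptions for illustration, not measurements from this network.

```python
import math

# Data bits per OFDM symbol for each 802.11a rate (20 MHz channel).
N_DBPS = {6: 24, 9: 36, 12: 48, 18: 72, 24: 96, 36: 144, 48: 192, 54: 216}

def ofdm_airtime_us(frame_bytes: int, rate_mbps: int) -> int:
    """On-air time of one OFDM frame in microseconds.

    TXTIME = 16 us preamble + 4 us SIGNAL + 4 us * N_SYM, where
    N_SYM = ceil((16 service bits + 8 * length + 6 tail bits) / N_DBPS).
    """
    n_sym = math.ceil((16 + 8 * frame_bytes + 6) / N_DBPS[rate_mbps])
    return 16 + 4 + 4 * n_sym

def airtime_fraction(frames_per_sec: float, frame_bytes: int, rate_mbps: int) -> float:
    """Share of each second spent transmitting these frames (no DIFS/backoff)."""
    return frames_per_sec * ofdm_airtime_us(frame_bytes, rate_mbps) / 1_000_000
```

A 100-byte frame takes 160 µs at 6 Mb/s but 56 µs at 24 Mb/s, so 1,000 such frames per second would occupy about 16% of the air at 6 Mb/s versus about 5.6% at 24 Mb/s.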

Until recently, I did not understand what triggered this problem. Now I do. There appears to be a bug in recent versions of Android that causes these mDNS floods when phones leave sleep mode, with thousands of frames transmitted in the seconds after the client wakes up. TP-Link discovered the same behavior we observed and traced it to “recent releases of Android OS and Google Apps.”
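One way to spot this wake-up pattern in a capture is a per-source sliding window that flags any client sending more than a threshold number of mDNS packets within a short window. The 5-second window and 500-packet threshold below are illustrative assumptions, not values from the troubleshooting above, and the function name and input format are hypothetical.

```python
from collections import defaultdict, deque

def find_mdns_bursts(events, window_s=5.0, threshold=500):
    """Return {src_mac: time the threshold was first exceeded} for bursting sources.

    events: iterable of (time_s, src_mac) for mDNS packets, sorted by time.
    """
    recent = defaultdict(deque)  # per-source timestamps inside the window
    flagged = {}
    for t, src in events:
        q = recent[src]
        q.append(t)
        while q and q[0] <= t - window_s:  # drop packets older than the window
            q.popleft()
        if len(q) > threshold and src not in flagged:
            flagged[src] = t
    return flagged
```

Because the window is tracked per source, a phone that sends a steady trickle of queries is never flagged, while one that dumps thousands of packets in a few seconds after waking up is.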

For WLAN administrators, I suggest taking a few minutes to think through the implications of this issue for your network. As I discovered in my troubleshooting, it is very likely to affect more than just the guest WLAN.

PS: For those of you reading this that aren’t network engineers and think your wireless router is junk, it isn’t. And please stop jumping to the conclusion that the infrastructure is to blame for Wi-Fi problems! 80% of Wi-Fi problems are client device issues, not infrastructure issues.

Update: Google has acknowledged this issue and is releasing a patch via Google Play on January 18th.