The security features provided by the IEEE 802.11 standard haven’t changed much since the 802.11i amendment, more commonly known by its Wi-Fi Alliance certification name WPA2, was ratified in 2004. 802.11w protected management frames were introduced in 2009, but only recently have Wi-Fi chipsets for client devices included support for them. WPA2 introduced the robust CCMP encryption protocol as a replacement for the compromised WEP-based encryption schemes of the past. CCMP uses stronger 128-bit AES encryption keys. As a general rule of thumb, if you aren’t using CCMP on a Wi-Fi network designed for security, you’re doing it wrong. It’s been out for a long time, and older protocols have well-established weaknesses.
However, there are some new encryption changes in the 802.11ac amendment that have mostly flown under the radar. Besides 256-QAM, wider channels, and MU-MIMO, 802.11ac adds support for 256-bit AES keys and the GCMP encryption protocol. Galois/Counter Mode Protocol (GCMP) is a more efficient, performance-friendly encryption protocol than CCMP.
A few interesting nuggets from section 11.4 of the 802.11ac amendment:
The AES algorithm is defined in FIPS PUB 197-2001. All AES processing used within CCMP uses AES with either a 128-bit key (CCMP-128) or a 256-bit key (CCMP-256).
CCMP-128 processing expands the original MPDU size by 16 octets, 8 octets for the CCMP Header field and 8 octets for the MIC field. CCMP-256 processing expands the original MPDU size by 24 octets, 8 octets for the CCMP Header field, and 16 octets for the MIC field.
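To put those octet counts in perspective, here’s a small Python sketch that applies them to two example MPDU sizes. The 64- and 1500-octet sizes are arbitrary illustrations I picked, not values from the amendment, and the sketch counts only the CCMP Header and MIC fields quoted above.

```python
# CCMP expansion per MPDU, as quoted from section 11.4 of the 802.11ac
# amendment: CCMP Header field + MIC field, in octets.
CCMP_EXPANSION = {
    "CCMP-128": {"header": 8, "mic": 8},
    "CCMP-256": {"header": 8, "mic": 16},
}


def protected_mpdu_size(mpdu_octets: int, suite: str) -> int:
    """Return the MPDU size in octets after CCMP processing."""
    fields = CCMP_EXPANSION[suite]
    return mpdu_octets + fields["header"] + fields["mic"]


def expansion_percent(mpdu_octets: int, suite: str) -> float:
    """Return the CCMP expansion as a percentage of the original MPDU size."""
    added = protected_mpdu_size(mpdu_octets, suite) - mpdu_octets
    return 100.0 * added / mpdu_octets


if __name__ == "__main__":
    for size in (64, 1500):  # illustrative MPDU sizes only
        for suite in CCMP_EXPANSION:
            print(f"{suite}: {size} -> {protected_mpdu_size(size, suite)} octets "
                  f"(+{expansion_percent(size, suite):.1f}%)")
```

The extra 8 MIC octets in CCMP-256 matter most for small frames: a 64-octet MPDU grows by 37.5%, versus about 1.6% for a 1500-octet MPDU.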
By the way, you can download the 802.11ac amendment, or the entire 802.11-2012 standard, from the IEEE for free. For more on these security changes, read sections 184.108.40.206 and 11.4 of the 802.11ac amendment.
It seems odd that these changes were included in the 802.11ac amendment rather than in a separate security-focused amendment like 802.11i or 802.11w. Nothing wrong with it, just unexpected. I’m curious to see whether the 802.11ax amendment includes security changes as well.
Why add 256-bit AES keys? It could have something to do with a few chinks in the armor of 128-bit AES keys. The current attacks appear to be impractical, but future attacks that take advantage of quantum computing may put 128-bit AES keys at risk. NIST thinks larger key sizes are needed to defend symmetric AES keys, like those used in WPA2, against quantum computer attacks, which it says will be operational within the next 20 years. I’ll take their word for it.
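The argument usually given for that concern (general cryptography background, not something stated in the amendment) is Grover’s algorithm, which lets a quantum computer search a keyspace of size 2^n in roughly the square root of the classical number of steps. As a rough worked example that ignores constant factors and the practical cost of running Grover at scale:

```latex
\begin{aligned}
\text{Classical exhaustive search over } n\text{-bit keys:}\quad & O\!\left(2^{n}\right) \\
\text{Grover's algorithm:}\quad & O\!\left(\sqrt{2^{n}}\right) = O\!\left(2^{n/2}\right) \\
n = 128:\quad & 2^{128/2} = 2^{64} \\
n = 256:\quad & 2^{256/2} = 2^{128}
\end{aligned}
```

On this simplified view, a 256-bit key facing a quantum attacker keeps about the same work factor that a 128-bit key has against classical attacks today.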
Because the amendment specifies only CCMP-128 as mandatory for RSN compliance, it’s very unlikely that we’ll see CCMP-256/GCMP-256 in use anytime soon. Further, enabling 256-bit cipher suites effectively disables support for all non-802.11ac clients, as well as 802.11ac clients that support only the mandatory cipher suites (most of them?). That’s because CCMP-256 and GCMP-256 pairwise keys are compatible only with 256-bit group keys, which breaks backward compatibility with legacy clients. There are also a lot of 802.11n clients out there that aren’t going away anytime soon, so actually deploying CCMP-256/GCMP-256 will require a separate CCMP-256/GCMP-256-only SSID. Excited yet?
Further, I can’t find any documentation suggesting that infrastructure vendors have implemented CCMP-256/GCMP-256 at all, just a few slide decks here and there with an overview of the changes. These cipher suites appear to be optional, so I wonder whether any VHT clients or APs actually support them today, and when they will in the future. The Linux Wi-Fi configuration API cfg80211 and the driver framework mac80211 have added software support for them. That’s about all the implementation I have found. Perhaps PCI compliance or Wi-Fi Alliance certification will eventually force the issue, or perhaps it will go the way of 802.11n Tx beamforming and never be implemented. There are a lot of obstacles to overcome before 256-bit keys become practical.
However, a VHT client can negotiate a GCMP-128 RSNA within a BSS that uses a backward-compatible CCMP-128 group key, and the 802.11 standard does support multiple pairwise cipher suites within a BSS (remember TSNs?). That allows the GCMP-128 pairwise cipher suite to be used alongside everyday CCMP-128 pairwise and group keys on real production networks.
To tell whether a BSS is using one of the new cipher suites in a packet capture, look at a beacon frame’s RSN information element. The OUI portion of the cipher suite selector is always 00-0F-AC for the CCMP/GCMP encryption protocols; it’s the suite type that distinguishes between the specific cipher suites. For example, 00-0F-AC:4 is the default CCMP-128, 00-0F-AC:9 indicates GCMP-256, and 00-0F-AC:10 indicates CCMP-256. Group management cipher suites, used in a BSS with protected management frames, have their own suite type numbers. Look for multiple pairwise cipher suites to find support for the new stuff. Here’s the table of the new cipher suites. I’m on the lookout for 00-0F-AC:8 (GCMP-128), but I’ve yet to find a beacon frame that advertises it.
Table 8-99—Cipher suite selectors
CCMP-128 – default pairwise cipher suite and default group cipher suite for data frames in an RSNA
BIP-CMAC-128 – default group management cipher suite in an RSNA with management frame protection enabled
GCMP-128 – default for a DMG STA
Interestingly, GCMP-128 is the default for a DMG STA, a directional multi-gigabit station defined in the 802.11ad amendment for operation in the 60 GHz band.
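Decoding selectors by hand from a beacon gets tedious, so here’s a Python sketch that pulls the version, group cipher, and pairwise cipher list out of an RSN information element body (the bytes after the Element ID and Length octets). The suite type table comes from my reading of the standard’s cipher suite selector table rather than from this post, so double-check it against your copy. The sample bytes are made up to show a BSS advertising GCMP-128 alongside CCMP-128, and the optional fields that follow the pairwise list are ignored.

```python
# Suite types defined under the IEEE 00-0F-AC OUI (check against the standard).
IEEE_OUI = bytes.fromhex("000FAC")

SUITE_TYPES = {
    0: "Use group cipher suite",
    1: "WEP-40",
    2: "TKIP",
    4: "CCMP-128",
    5: "WEP-104",
    6: "BIP-CMAC-128",
    7: "Group addressed traffic not allowed",
    8: "GCMP-128",
    9: "GCMP-256",
    10: "CCMP-256",
    11: "BIP-GMAC-128",
    12: "BIP-GMAC-256",
    13: "BIP-CMAC-256",
}


def decode_suite(selector: bytes) -> str:
    """Decode a 4-octet cipher suite selector (3-octet OUI + 1-octet suite type)."""
    if len(selector) != 4:
        raise ValueError("cipher suite selectors are 4 octets")
    oui, suite_type = selector[:3], selector[3]
    if oui != IEEE_OUI:
        return f"vendor-specific ({oui.hex('-').upper()}:{suite_type})"
    return SUITE_TYPES.get(suite_type, f"reserved ({suite_type})")


def parse_rsn_ciphers(body: bytes) -> dict:
    """Parse version, group cipher, and pairwise ciphers from an RSN IE body."""
    version = int.from_bytes(body[0:2], "little")
    group = decode_suite(body[2:6])
    count = int.from_bytes(body[6:8], "little")
    pairwise = [decode_suite(body[8 + 4 * i:12 + 4 * i]) for i in range(count)]
    return {"version": version, "group": group, "pairwise": pairwise}


if __name__ == "__main__":
    # Hypothetical RSN IE body: version 1, group CCMP-128, pairwise CCMP-128 + GCMP-128.
    sample = bytes.fromhex("0100" "000fac04" "0200" "000fac04" "000fac08")
    print(parse_rsn_ciphers(sample))
```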
The standard limits the mixing of cipher suites so that the key sizes of the pairwise and group keys must match, and GCMP group keys can only be used with GCMP pairwise keys.
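Those two mixing rules, together with the GCMP-128-over-CCMP-128 case described earlier, fit in a few lines. This Python sketch encodes only the rules as stated in this post (the standard’s full requirements are more detailed), and the function names are mine, purely for illustration.

```python
from typing import List, NamedTuple


class Cipher(NamedTuple):
    protocol: str  # "CCMP" or "GCMP"
    key_bits: int  # 128 or 256


def parse_cipher(name: str) -> Cipher:
    """Turn a name like "GCMP-256" into Cipher("GCMP", 256)."""
    protocol, bits = name.split("-")
    return Cipher(protocol, int(bits))


def combination_allowed(pairwise: str, group: str) -> bool:
    """Check one pairwise/group pairing against the two rules stated above."""
    p, g = parse_cipher(pairwise), parse_cipher(group)
    if p.key_bits != g.key_bits:
        return False  # pairwise and group key sizes must match
    if g.protocol == "GCMP" and p.protocol != "GCMP":
        return False  # a GCMP group key needs a GCMP pairwise key
    return True


def usable_pairwise(group: str, candidates: List[str]) -> List[str]:
    """Pairwise suites a client could negotiate in a BSS with the given group cipher."""
    return [c for c in candidates if combination_allowed(c, group)]


if __name__ == "__main__":
    suites = ["CCMP-128", "GCMP-128", "CCMP-256", "GCMP-256"]
    print(usable_pairwise("CCMP-128", suites))  # ['CCMP-128', 'GCMP-128']
```

With an everyday CCMP-128 group key, only the 128-bit pairwise suites survive, which is why GCMP-128 is the one new suite that can realistically coexist with legacy clients.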
This post outlines some configuration changes that can enhance the security of the 802.1X EAP methods PEAP and EAP-TTLS, which use a temporary layer 2 TLS tunnel to protect a less secure inner authentication method. While EAP-TLS doesn’t create a full TLS tunnel, it does use a TLS handshake to derive the keying material for the four-way handshake. It needs strong TLS too.
Standard 802.1X security best practices should also be implemented, such as using strong passwords, disabling insecure EAP methods, disabling TKIP, configuring supplicants properly, deploying SHA-2 certificates, and using anonymous outer usernames. The focus here is exclusively on the TLS tunnel.
Not all RADIUS servers can implement all of these suggestions, but some can certainly do more than others. My experience has been with Microsoft NPS and FreeRADIUS servers so that is what I’ll refer to when discussing specific implementations. I welcome input from Aruba ClearPass and Cisco ISE administrators on configuring those servers as well.
Why go through all the trouble? It turns out the same encryption techniques that web clients and servers use to protect data in HTTPS sessions are also used when EAP methods rely on a TLS-encrypted session. Ask any web server admin, and they’ll tell you that not all HTTPS is created equal. The same vulnerabilities that web server admins deal with exist in the TLS-assisted EAP methods used on the WLAN as well. There is a lot to be learned from the TLS best practices recommended for web server admins.
At the end of the day, the TLS session is all that stands between user credentials and would-be hackers. It deserves careful review to verify that it meets current security standards.
Here’s what to do.
First, disable SSL. We’re talking specifically about SSLv2 and SSLv3 here, not TLS, even though the whole family of protocols is often referred to simply as “SSL.” SSLv2 and SSLv3 were cracked long ago.
Consider TLS Methods
TLS 1.2 is the most secure TLS method available, so why not disable TLS 1.0 and TLS 1.1? Right now supplicant support for TLS 1.1 and TLS 1.2 is far from universal, and TLS 1.0 with strong ciphers is still considered secure. Keep TLS 1.0 enabled for now.
Disable Weak Cipher Suites
Cipher suites are the specific combinations of key exchange, authentication, encryption, and message authentication algorithms used in a TLS session. Supplicants and servers support a broad range of them, and some are better than others. Many RADIUS servers have older, insecure cipher suites enabled by default so that old supplicants that don’t support newer cipher suites can still connect. Unless you have older supplicants, you can disable many of these cipher suites to enhance 802.1X security.
A current listing of strong cipher suites can be found at Cipherli.st. While the website focuses on web server configuration, TLS is TLS.
Be aware that EAP-TLS requires TLS_RSA_WITH_3DES_EDE_CBC_SHA.
Microsoft NPS relies on Schannel to provide encryption for TLS-tunneled EAP methods. In order to control the protocols Schannel uses, an administrator must alter these registry keys. Note that changing these keys affects all TLS functionality on the server, so if you run IIS or RDS with TLS, these changes will affect those applications as well. Proceed with caution. The registry keys can be found in:
A full listing of cipher suites supported by Schannel can be found here.
If the prospect of manually editing dozens of registry keys on a Windows Server doesn’t appeal to you, the good people at Nartac Software have developed an application that lets you manage these changes in a user-friendly GUI. IIS Crypto makes all of the necessary registry settings for you and includes some handy templates, such as Best Practices, PCI, FIPS 140-2, and Defaults.
Here is IIS Crypto displaying the default Schannel configuration of a Windows Server 2012 R2 server. There is a lot not to like here…
And here is the Best Practices template. Note the obsolete protocols and cipher suites that are disabled, and that the cipher suite preference order is updated as well.
Be aware that manually taking control of the Schannel TLS configuration means you’re in charge of it going forward. If Microsoft updates the default configuration, your manual settings may remain in place and take precedence over those updates. Stay up to date on new TLS vulnerabilities and periodically review your configuration for needed changes.
FreeRADIUS 3 is the current supported stable release, and you should be thinking about upgrading to it if you have not already. FreeRADIUS 3 does not support SSLv2 or SSLv3, only TLS 1.0, TLS 1.1, and TLS 1.2.
To make FreeRADIUS require stronger cipher suites, add this to the EAP-TLS configuration in the “eap” configuration file. Alternatively, specify a colon-separated list of specific cipher suites.
cipher_list = "HIGH"
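To see what an OpenSSL cipher string such as “HIGH” actually expands to, here’s a Python sketch using the standard library’s ssl module, which is built on OpenSSL. It assumes FreeRADIUS hands cipher_list to OpenSSL unchanged (my understanding of how it works), and the result depends on the OpenSSL version linked into your Python, which may differ from the one on your RADIUS server. Newer OpenSSL builds may also list TLS 1.3 suites, which are configured separately and aren’t affected by the cipher string.

```python
import ssl
from typing import List


def expand_cipher_string(cipher_string: str) -> List[str]:
    """Return the cipher suite names the local OpenSSL enables for a cipher string."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers(cipher_string)  # raises ssl.SSLError if nothing matches
    return [suite["name"] for suite in ctx.get_ciphers()]


if __name__ == "__main__":
    high = expand_cipher_string("HIGH")
    print(f"HIGH enables {len(high)} suites on this OpenSSL build:")
    for name in high:
        print("  ", name)
    # The EAP-TLS note above concerns a 3DES suite (OpenSSL name DES-CBC3-SHA),
    # so check whether any 3DES suites survive this cipher string.
    print("3DES suites present:", [n for n in high if "DES-CBC3" in n])
```

Running the same function against your own candidate string (for example, a colon-separated list) is a cheap way to confirm it parses before you put it in the “eap” file.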
Also be aware that FreeRADIUS 2.2.6 and 3.0.7 contain a critical bug that prevents TLS 1.2 sessions from starting successfully. You should update these servers as soon as possible.
Harden Supplicants Too
Few 802.1X supplicants allow you to alter their TLS configuration. The best thing to do with supplicants is to routinely install system updates and retire clients that are EOL.
Documentation for the TLS capabilities of client supplicants is hard to come by. Microsoft has published an update for Windows 7 and later that allows its 802.1X supplicant to use TLS 1.1 and TLS 1.2, though for now it must be configured manually. wpa_supplicant for Linux supports TLS 1.2 as of version 2.0, and version 2.6 enables it by default. TLS 1.2 is the default TLS version used in the supplicants for Windows 10, Mac OS 10.11, iOS 9, and Android 6.0. (Update: It appears that Apple has deferred its decision to default to TLS 1.2 in iOS 9/Mac OS 10.11 until a later release.)
Lab it Up
To know definitively what a client supplicant is capable of, run a packet capture of a TLS-tunneled EAP authentication and observe the TLS negotiation frames (the TLS handshake) that occur right after 802.11 association and the EAP identity request/response frames.
The client sends a “Client Hello” frame, which Wireshark marks as a TLS protocol frame. This frame includes the TLS version requested by the client along with its supported cipher suites. The TLS version is the highest version the client supports.
Next, the RADIUS server will respond with a “Server Hello” frame which specifies the TLS version and cipher suite to be used during the TLS session, and includes the server certificate as well. The server will choose the best cipher suite that both client and server support and the highest TLS version that both support as well.
A few more frames are exchanged to set up the TLS session, and then EAP authentication takes place within the encrypted TLS session. It’s these first two frames that are of most concern when documenting client TLS capabilities.
This is also a useful way to verify that highly secure TLS encryption is being used in production.
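If you’d rather pull the Client Hello details out programmatically, here’s a Python sketch that parses a single, unfragmented TLS record containing a ClientHello and reports its version field and offered cipher suites. It assumes you’ve already extracted the raw TLS record bytes from the EAP payload (for example, by copying them out of Wireshark); EAP-TLS fragmentation and the EAP framing itself are not handled. The suite name table is a small, hand-picked subset of the IANA registry, and build_client_hello is a hypothetical helper that exists only to produce a test record. A TLS 1.3 client also reports 0x0303 in this field and signals its real version in an extension, which this sketch ignores.

```python
import struct
from typing import List

TLS_VERSIONS = {0x0300: "SSLv3", 0x0301: "TLS 1.0", 0x0302: "TLS 1.1", 0x0303: "TLS 1.2"}

# Small subset of IANA TLS cipher suite IDs; anything else is shown as hex.
SUITE_NAMES = {
    0x000A: "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x0035: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0x009C: "TLS_RSA_WITH_AES_128_GCM_SHA256",
    0xC02F: "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    0xC030: "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
}


def parse_client_hello(record: bytes) -> dict:
    """Return the client version and offered cipher suites from a TLS record."""
    content_type, _record_version, _record_len = struct.unpack("!BHH", record[:5])
    if content_type != 0x16:
        raise ValueError("not a TLS handshake record")
    hs = record[5:]
    if hs[0] != 0x01:
        raise ValueError("handshake message is not a ClientHello")
    pos = 4                                   # msg_type (1) + length (3)
    (client_version,) = struct.unpack("!H", hs[pos:pos + 2])
    pos += 2 + 32                             # client_version + 32-byte random
    pos += 1 + hs[pos]                        # session_id length + session_id
    (suites_len,) = struct.unpack("!H", hs[pos:pos + 2])
    pos += 2
    suite_ids = struct.unpack(f"!{suites_len // 2}H", hs[pos:pos + suites_len])
    return {
        "version": TLS_VERSIONS.get(client_version, f"0x{client_version:04X}"),
        "cipher_suites": [SUITE_NAMES.get(s, f"0x{s:04X}") for s in suite_ids],
    }


def build_client_hello(version: int, suite_ids: List[int]) -> bytes:
    """Hypothetical test helper: build a minimal ClientHello record (no extensions)."""
    body = struct.pack("!H", version) + bytes(32) + b"\x00"  # version, random, empty session_id
    body += struct.pack(f"!H{len(suite_ids)}H", 2 * len(suite_ids), *suite_ids)
    body += b"\x01\x00"                                      # one compression method: null
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return struct.pack("!BHH", 0x16, 0x0301, len(handshake)) + handshake


if __name__ == "__main__":
    hello = build_client_hello(0x0303, [0xC02F, 0x002F, 0x000A])
    print(parse_client_hello(hello))
```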
In the K-12 market, Chromebooks are the most common devices used in 1:1 programs. If you are designing high-density Wi-Fi networks for Chromebook 1:1 programs, it helps to know how to access their Wi-Fi statistics, logs, and networking tools. This knowledge is valuable for troubleshooting day-to-day Chromebook Wi-Fi issues as well.
Despite its simplicity, Chrome OS, the Linux variant that Chromebooks run, does have some useful diagnostics tools that can help troubleshoot Wi-Fi problems. Most of these tools are included in the crosh shell, which you can open by typing Control-Alt-T. Here are some of my go-to crosh networking commands that don’t require an explanation.
This command provides some good Wi-Fi stats like retries, MCS index, and also RoamThreshold, which is the SNR at which this Chromebook will attempt to roam to a new BSS. Hopefully, one day we’ll be able to modify this value on enterprise-managed Chromebooks through the Google Apps admin console.
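RoamThreshold is expressed as an SNR, so comparing it with what the Chromebook currently sees is simple subtraction. Here’s a minimal Python sketch. The -58 dBm signal and -94 dBm noise floor come from the network_diag output later in this post, the 18 dB threshold is a hypothetical value (read the real one from your own device), and treating “below the threshold” as a strict less-than is my assumption.

```python
def snr_db(signal_dbm: float, noise_dbm: float) -> float:
    """SNR in dB is the difference between the signal and noise levels in dBm."""
    return signal_dbm - noise_dbm


def wants_to_roam(signal_dbm: float, noise_dbm: float, roam_threshold_db: float) -> bool:
    """Assumption: a roam attempt is triggered once SNR drops below the threshold."""
    return snr_db(signal_dbm, noise_dbm) < roam_threshold_db


if __name__ == "__main__":
    HYPOTHETICAL_ROAM_THRESHOLD_DB = 18  # illustrative only
    print(snr_db(-58, -94))                                         # 36
    print(wants_to_roam(-58, -94, HYPOTHETICAL_ROAM_THRESHOLD_DB))  # False
```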
This command is very useful in troubleshooting 802.1X issues. It shows more layer 2 details on all the BSSs that have been discovered. In this case, /service/12 is an 802.1X network that the Chromebook is associated with, and /service/15 is an open network also in range.
This command brings up a lot of valuable information including a dump of the latest full channel scan and the Wi-Fi chipset’s capabilities, among other useful data.
crosh> network_diag --wifi
iw dev wlan0 survey dump:
Survey data from wlan0
frequency: 2412 MHz
noise: -92 dBm
channel active time: 63 ms
channel busy time: 49 ms
channel receive time: 45 ms
channel transmit time: 0 ms
Survey data from wlan0
frequency: 2417 MHz
noise: -93 dBm
channel active time: 62 ms
channel busy time: 47 ms
channel receive time: 41 ms
channel transmit time: 0 ms
Survey data from wlan0
frequency: 2422 MHz
noise: -92 dBm
channel active time: 63 ms
channel busy time: 4 ms
channel receive time: 0 ms
channel transmit time: 0 ms
Survey data from wlan0
frequency: 5220 MHz
noise: -94 dBm
channel active time: 124 ms
channel busy time: 0 ms
channel receive time: 0 ms
channel transmit time: 0 ms
Survey data from wlan0
frequency: 5240 MHz [in use]
noise: -94 dBm
channel active time: 15723 ms
channel busy time: 513 ms
channel receive time: 185 ms
channel transmit time: 3 ms
Survey data from wlan0
frequency: 5260 MHz
noise: -94 dBm
channel active time: 85031 ms
channel busy time: 84907 ms
channel receive time: 84907 ms
channel transmit time: 84907 ms
iw dev wlan0 station dump:
Station 00:11:74:##:##:## (on wlan0)
inactive time: 5444 ms
rx bytes: 11797197
rx packets: 38419
tx bytes: 1703260
tx packets: 9779
tx retries: 14295
tx failed: 43
signal: -58 dBm
signal avg: -60 dBm
tx bitrate: 24.0 MBit/s
rx bitrate: 300.0 MBit/s MCS 15 40MHz short GI
TDLS peer: no
iw dev wlan0 scan dump:
BSS 00:11:74:##:##:##(on wlan0) -- associated
TSF: 61418055#### usec (7d, 02:36:20)
beacon interval: 100 TUs
capability: ESS Privacy SpectrumMgmt ShortSlotTime (0x0511)
signal: -60.00 dBm
last seen: 847370 ms ago
Information elements from Probe Response frame:
Supported rates: 24.0* 36.0 48.0 54.0
DS Parameter set: channel 48
Country: US Environment: Indoor/Outdoor
Channels [36 - 36] @ 24 dBm
Channels [40 - 40] @ 24 dBm
Channels [44 - 44] @ 24 dBm
Channels [48 - 48] @ 24 dBm
Channels [52 - 52] @ 23 dBm
Channels [56 - 56] @ 23 dBm
Channels [60 - 60] @ 23 dBm
Channels [64 - 64] @ 23 dBm
Channels [100 - 100] @ 24 dBm
Channels [104 - 104] @ 24 dBm
Channels [108 - 108] @ 24 dBm
Channels [112 - 112] @ 24 dBm
Channels [116 - 116] @ 24 dBm
Channels [120 - 120] @ 24 dBm
Channels [124 - 124] @ 24 dBm
Channels [128 - 128] @ 24 dBm
Channels [132 - 132] @ 24 dBm
Channels [136 - 136] @ 24 dBm
Channels [140 - 140] @ 24 dBm
Channels [144 - 144] @ 24 dBm
Channels [149 - 149] @ 30 dBm
Channels [153 - 153] @ 30 dBm
Channels [157 - 157] @ 30 dBm
Channels [161 - 161] @ 30 dBm
Channels [165 - 165] @ 30 dBm
Power constraint: 3 dB
* station count: 2
* channel utilisation: 4/255
* available admission capacity: 31250 [*32us]
SM Power Save disabled
RX HT20 SGI
RX HT40 SGI
RX STBC 1-stream
Max AMSDU length: 7935 bytes
No DSSS/CCK HT40
Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
Minimum RX AMPDU time spacing: 8 usec (0x06)
HT TX/RX MCS rate indexes supported: 0-15
* primary channel: 48
* secondary channel offset: below
* STA channel width: any
* RIFS: 1
* HT protection: no
* non-GF present: 1
* OBSS non-GF present: 0
* dual beacon: 0
* dual CTS protection: 0
* STBC beacon: 0
* L-SIG TXOP Prot: 0
* PCO active: 0
* PCO phase: 0
VHT Capabilities (0x338001b2):
Max MPDU length: 11454
Supported Channel Width: neither 160 nor 80+80
short GI (80 MHz)
RX antenna pattern consistency
TX antenna pattern consistency
VHT RX MCS set:
1 streams: MCS 0-9
2 streams: MCS 0-9
3 streams: not supported
4 streams: not supported
5 streams: not supported
6 streams: not supported
7 streams: not supported
8 streams: not supported
VHT RX highest supported: 0 Mbps
VHT TX MCS set:
1 streams: MCS 0-9
2 streams: MCS 0-9
3 streams: not supported
4 streams: not supported
5 streams: not supported
6 streams: not supported
7 streams: not supported
8 streams: not supported
VHT TX highest supported: 0 Mbps
* channel width: 1 (80 MHz)
* center freq segment 1: 42
* center freq segment 2: 0
* VHT basic MCS set: 0xfffc
WMM: * Parameter version 1
* BE: CW 15-1023, AIFSN 3
* BK: CW 15-1023, AIFSN 7
* VI: CW 7-15, AIFSN 2, TXOP 3008 usec
* VO: CW 3-7, AIFSN 2, TXOP 1504 usec
RSN: * Version: 1
* Group cipher: CCMP
* Pairwise ciphers: CCMP
* Authentication suites: IEEE 802.1X FT/IEEE 802.1X
* Capabilities: PreAuth 1-PTKSA-RC 1-GTKSA-RC MFP-capable (0x0081)
* 0 PMKIDs
* Group mgmt cipher suite: AES-128-CMAC
iw dev wlan0 link:
Connected to 00:11:74:##:##:## (on wlan0)
RX: 11797197 bytes (38419 packets)
TX: 1703260 bytes (9779 packets)
signal: -58 dBm
tx bitrate: 24.0 MBit/s
bss flags: short-slot-time
dtim period: 1
beacon int: 100
That’s a lot more Wi-Fi data than most other platforms make natively accessible.
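The survey dump section is the quickest way to gauge how busy each scanned channel was while the Chromebook was listening on it. Here’s a Python sketch that parses that text format and computes busy time as a percentage of active time. The parsing is based only on the sample output above, so other iw versions or drivers may format things differently, and the short dwell times on most scanned channels (a few dozen milliseconds here) make those percentages rough.

```python
import re
from typing import Dict, List


def parse_survey(text: str) -> List[Dict]:
    """Split `iw dev wlan0 survey dump` text into one dict per frequency."""
    entries = []
    for block in text.split("Survey data from")[1:]:
        freq = re.search(r"frequency:\s*(\d+) MHz", block)
        active = re.search(r"channel active time:\s*(\d+) ms", block)
        busy = re.search(r"channel busy time:\s*(\d+) ms", block)
        if not (freq and active and busy):
            continue  # skip entries missing the fields we need
        entries.append({
            "freq_mhz": int(freq.group(1)),
            "in_use": "[in use]" in block,
            "active_ms": int(active.group(1)),
            "busy_ms": int(busy.group(1)),
        })
    return entries


def busy_percent(entry: Dict) -> float:
    """Channel busy time as a share of the time the radio spent on that channel."""
    if entry["active_ms"] == 0:
        return 0.0
    return 100.0 * entry["busy_ms"] / entry["active_ms"]


# Two entries copied from the sample output above.
SAMPLE = """Survey data from wlan0
frequency: 2412 MHz
noise: -92 dBm
channel active time: 63 ms
channel busy time: 49 ms
Survey data from wlan0
frequency: 5240 MHz [in use]
noise: -94 dBm
channel active time: 15723 ms
channel busy time: 513 ms
"""

if __name__ == "__main__":
    for e in parse_survey(SAMPLE):
        flag = " (in use)" if e["in_use"] else ""
        print(f"{e['freq_mhz']} MHz{flag}: {busy_percent(e):.1f}% busy")
```

With the sample values, 2412 MHz comes out around 78% busy, while the in-use 5240 MHz channel is only about 3% busy.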
Additionally, to view most of this data without crosh, use this internal Chrome URL. Just enter it into the address bar and hit enter.
Areas of interest for Wi-Fi data:
network-devices – same output as the “connectivity show devices” crosh command
network-services – same output as the “connectivity show services” crosh command
wifi_status – same output as the “network_diag --wifi” crosh command
lspci – you can see the Wi-Fi chipset hardware here (more on that later)
You can start logging Wi-Fi events using this crosh command.
crosh> network_logging wifi
Old flimflam tags: 
Current flimflam tags: [device+inet+manager+service+wifi]
method return sender=:1.1 -> dest=:1.146 reply_serial=2
Old wpa level: info
Current wpa level: msgdump
View the resulting device event logs at this internal Chrome URL: chrome://device-log/
Run this command to view the kernel log, which includes a lot of Wi-Fi events. I wish there were a --follow option, but currently there isn’t.
A restart will return the Chromebook to normal logging levels.
And if you really want to bury yourself in logs, go to chrome://net-internals/#chromeos, click Wi-Fi to enable debugging on that interface, let the “capturing events” count creep up while you perform a task, then click “Store debug logs” to save a debug-logs_<date>.tgz archive in your Downloads folder. Be warned, the signal to noise ratio is very low with this approach. Google provides a log analyzer that you can upload these files to, but I’ve never had the need to go that far down the road. This is best used if you need to submit logs to the Google Apps Enterprise Support Team or a hardware manufacturer.
Advanced Wi-Fi Analysis with Developer Mode
But wait, there’s more! If you can put a Chromebook into Developer Mode, you can run packet captures and break into the Linux bash shell. Most enterprise-managed Chromebooks will have this mode disabled for obvious reasons, but it’s easy enough to move your test Chromebook into a test OU and disable this and other restrictions for testing purposes. (That’s IT testing, not high-stakes student testing! Make sure your OUs clearly differentiate the two.)
First, determine the frequency of the channel on which you’d like to run the capture, and whether channel bonding is in use. The internal URL from above will work for this, as will the “network_diag --wifi” crosh command. The frequency of the currently associated BSS is displayed at the end of that output.
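Converting between the frequency that packet_capture expects and the channel numbers shown elsewhere is simple arithmetic. Here’s a Python sketch covering the common 2.4 GHz and 5 GHz channel numbering (other bands and edge cases are ignored), plus where the secondary 20 MHz channel sits for a 40 MHz capture. The sample output earlier in this post shows an AP on 5240 MHz (channel 48) with its secondary channel offset below.

```python
def channel_from_freq(freq_mhz: int) -> int:
    """Map a 2.4 GHz or 5 GHz center frequency (MHz) to its 802.11 channel number."""
    if freq_mhz == 2484:
        return 14                       # Japan-only 2.4 GHz channel
    if 2412 <= freq_mhz <= 2472:
        return (freq_mhz - 2407) // 5   # channels 1-13
    if 5000 < freq_mhz < 5900:
        return (freq_mhz - 5000) // 5   # e.g. 5180 MHz -> channel 36
    raise ValueError(f"unsupported frequency: {freq_mhz} MHz")


def secondary_20mhz_freq(primary_mhz: int, ht_location: str) -> int:
    """Secondary 20 MHz channel for a 40 MHz HT capture ("above" or "below")."""
    offsets = {"above": 20, "below": -20}
    if ht_location not in offsets:
        raise ValueError('ht_location must be "above" or "below"')
    return primary_mhz + offsets[ht_location]


if __name__ == "__main__":
    primary = 5240
    secondary = secondary_20mhz_freq(primary, "below")
    print(channel_from_freq(primary), channel_from_freq(secondary))  # 48 44
```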
Now turn off the Wi-Fi NIC in the GUI so it can be put into monitor mode.
You can now run the packet capture using the crosh command below.
Optionally, if you are doing a 40 MHz 802.11n capture, specify a secondary channel above or below the primary by appending the “--ht-location <above|below>” flag.
crosh> packet_capture --frequency <frequency in MHz>
Capturing from phy0_mon. Press Ctrl-C to stop.
^CCapture stored in /home/chronos/user/Downloads/packet_capture_7K08.pcap
You’ll get a pcap file, complete with Radiotap headers if the hardware supports them, saved in the Downloads folder. You can send it to another machine for analysis, or, if the Chromebook is all you have available, upload it to CloudShark.
Wi-Fi Troubleshooting in Bash
Once you’ve got Developer Mode enabled, you can use the bash shell and follow the network log (or any other log) as things happen. This is my preferred way to troubleshoot Chromebook Wi-Fi issues in real time.
Now go do something to the Wi-Fi connection and watch the log scroll by.
A few familiar Linux networking commands, such as ifconfig, arp, and netstat, are available here as well.
Wi-Fi Chipset and Driver Information
While you’re in the bash shell, you can also determine the Wi-Fi chipset hardware in use. The output of this lspci command will only show the Wi-Fi adapter and the driver it is using. The basic output of lspci is included in chrome://system, but this method allows you to get more data. Add a -v flag or two to see even more.
This Acer C720 Chromebook has a Qualcomm Atheros AR9462 and uses the ath9k driver.
Run this command to discover the Wi-Fi chipset driver version. This is helpful if you want to know if the Wi-Fi chipset drivers were updated during a system update.
chronos@localhost / $ sudo ethtool -i wlan0
In this case no version number is reported, perhaps because the OS is using a generic Atheros driver that is packaged with the Linux kernel.
Below is the output of the same commands on an HP Chromebook 11 G4 running Chrome OS 41. This machine has an Intel Wireless-AC 7260 chipset and the driver and firmware-version are listed.
An inspection of the iwlwifi version history shows that this driver is actually newer than the previous version. Before version 16 it was the third number in the version that indicated what major branch it came from, so version 220.127.116.11 was actually from the version 10 branch. Thankfully, that’s cleared up in newer versions of the driver so that the first number is the version branch.
It’s good to see that Google includes Wi-Fi chipset driver updates with Chrome OS updates. This is especially nice because system updates are downloaded and installed automatically on Chromebooks. Personally, I’ve seen system updates resolve odd Chromebook Wi-Fi problems, and it’s possible that newer drivers were the fix.
There’s been a lot of good discussion within the Wi-Fi community recently about the viability of radio resource management (RRM), or the automatic selection of channels and Tx power settings by proprietary vendor algorithms. At Mobility Field Day 1 there was this excellent roundtable.
Personally, I usually fall into the static design camp, for many of the same reasons as others. I don’t want RRM to change the carefully tuned design I put in place and create an unpredictable RF environment. I’ve seen RRM do some very peculiar things, like putting adjacent APs on the same channel or cranking up the Tx power of 2.4 GHz radios in a high-density environment. RRM doesn’t disable 2.4 GHz radios when co-channel contention (CCC) is present, and it doesn’t plan DFS channels properly. Still, I’ve tried to keep an open mind.
Static designs have their limitations too. Statically designed WLANs can’t react to new neighboring networks contending for the same airtime, or to new sources of RF interference that weren’t there when the static design was developed. Automatically correcting for these problems is a real benefit of RRM.
Let me propose a hybrid approach that uses static design to handle the things that RRM does poorly, while still allowing RRM to react to the changing RF environment.
Static Design Elements
Tx power levels should be statically assigned. Once finely tuned as part of the design process, why would they ever need to change?
Excess 2.4 GHz radios in high density environments should be manually disabled because RRM simply won’t do this.
DFS channels should be statically planned. RRM can clump DFS channels near one another, resulting in a 5 GHz dead zone for clients without DFS support. Also, because of these clients, DFS channels should only be used once all non-DFS channels are already deployed. Therefore, statically plan DFS channels where needed, in areas where non-DFS channels provide secondary coverage, and let RRM dynamically plan the other bands. A neighbor or transient hotspot is less likely to appear in the DFS bands anyway.
Set channel bandwidth statically. The design process includes considering the capacity requirements of the WLAN to determine the appropriate 5 GHz channel bandwidth. RRM algorithms don’t know what your capacity requirements are. 2.4 GHz should always be 20 MHz.
Things Left to RRM
2.4 GHz channel planning, once excess radios are disabled. Channels 1, 6, and 11 only, of course.
5 GHz channel planning, once DFS channels are statically assigned.
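To make that split concrete, here’s a toy Python sketch of the hybrid idea: radios you’ve pinned statically (such as planned DFS channels) are passed in as fixed, and a simple greedy pass, standing in for an RRM algorithm, picks channels only for the rest while avoiding channels used by neighbors it can hear. This is not any vendor’s RRM algorithm, and the AP names and neighbor map are hypothetical.

```python
from typing import Dict, List, Optional


def assign_channels(neighbors: Dict[str, List[str]],
                    channels: List[int],
                    pinned: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Greedy channel plan that never changes statically pinned radios.

    neighbors: AP name -> APs it can hear (hypothetical survey data).
    channels:  channels the algorithm may choose from, e.g. [1, 6, 11].
    pinned:    radios whose channel was set by hand.
    """
    plan = dict(pinned or {})
    # Handle the most-connected radios first; they are the hardest to place.
    for ap in sorted(neighbors, key=lambda a: len(neighbors[a]), reverse=True):
        if ap in plan:
            continue  # the static assignment wins
        # Pick the channel shared with the fewest already-planned neighbors;
        # ties go to the earlier channel in the list.
        plan[ap] = min(
            channels,
            key=lambda ch: sum(1 for n in neighbors[ap] if plan.get(n) == ch),
        )
    return plan


if __name__ == "__main__":
    # Four radios in a row, each hearing only its immediate neighbors.
    hallway = {"AP1": ["AP2"], "AP2": ["AP1", "AP3"],
               "AP3": ["AP2", "AP4"], "AP4": ["AP3"]}
    print(assign_channels(hallway, [1, 6, 11]))
    # 5 GHz: AP3 is pinned to a DFS channel; the rest come from non-DFS channels.
    print(assign_channels(hallway, [36, 40, 44, 48], pinned={"AP3": 100}))
```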
The benefit of this approach is that it addresses many of the shortcomings of RRM while still retaining its main benefit: the WLAN can dynamically react to RF interference and transient neighbors by moving affected AP radios to clear spectrum. The things that RRM can’t do or does poorly are simply removed from its control.
Even within these constraints, there are still some vendors’ RRM algorithms I trust more than others. And even with those I trust enough to try this, I’d still want to monitor regularly to make sure the WLAN hasn’t turned into the RRM trainwreck that I’ve seen all too often when RRM is given free rein.
Enterprise Wi-Fi is expensive, very expensive. For schools with limited budgets and a responsibility to be good stewards of tax dollars, it is important to get it right, without spending more than necessary on the initial deployment, ongoing support, or fixing costly mistakes. Any savings can be used in other ways to improve education, so unnecessary spending on Wi-Fi can have an impact on the quality of education in schools.
That’s why it is critical for schools to work with Wi-Fi professionals to develop a sound design for the network before it is purchased and deployed. Fixing mistakes after the fact costs a lot of money. The usual “fix” of installing extra access points in areas where performance is poor can often make the situation worse, when the real solution might be to remove an AP or correct a bad channel plan.
What often happens is this: A vendor talks the school into purchasing one AP per classroom and then the channel planning is left up to auto-channel algorithms (known as RRM, or radio resource management). This is a very simple and seemingly easy way to get Wi-Fi in schools that doesn’t involve the headaches of procuring CAD drawings, performing multiple site surveys, collecting client device data, and other things that delay the installation of the Wi-Fi network and increase the up-front costs.
Don’t do it!
The big problem here is that this is extremely inefficient. Do schools need one AP per classroom? Some do, some don’t. You’ll only find out by doing a proper network design. Maybe the design process reveals that a school only needs one AP per two classrooms. A school like this that doesn’t bother with a design and just does one AP per classroom has spent 100% more money than it needed to.
Capacity issues aside, what about channel planning and radio transmit power control? Nearby APs on the same channel interfere with each other. Vendors love to tout their RRM as an effective means to set these controls optimally and automatically. Just turn it on and let the magic happen.
The truth is, RRM just can’t be trusted. It may work for a while, and then it changes something and it doesn’t. My experience has shown that RRM is fine for simple networks with few neighbors, but in the high-density, busy RF environment of K-12 schools it often fails miserably. Neighboring APs end up on the same channel, interfering with one another. Transmit power goes up and down unpredictably. Your Wi-Fi network becomes an unpredictable moving target: what you measured and validated at one location one day is different the next day, and so on. The ongoing cost of supporting a network in this state is much higher than one that began with a proper design.
While some vendors’ RRM is better than others, no vendor is immune to this. A better solution is a proper design where channels and transmit power are determined by a Wi-Fi professional who is informed by years of experience and site survey data that RRM algorithms can’t factor into their decision making.
It is critical that schools include a proper Wi-Fi design in their Wi-Fi deployments to save tax dollars that would better be spent on other educational needs, and prevent many future headaches that result from over/under capacity networks and bumbling RRM algorithms. The Wi-Fi design process avoids these issues, and leaves schools with efficient, stable networks and the confidence in knowing that the network was validated against their needs, with the data to prove it.
Beyond the tax dollars, in a 21st century classroom, what is the true cost of poor Wi-Fi?
I decided to write this blog post because there appears to be a very common misunderstanding about how Wi-Fi works among end users and even many network administrators. Instead of repeating myself, I can share this link with folks who need a little lesson in 802.11 operation.
Wi-Fi does not work like AM/FM broadcast radio.
Well, in some ways it does. Wi-Fi radios transmit and receive radio frequency (RF) energy just like AM/FM stations do, but their operation is much more complex. If you are stuck in the AM/FM radio analogy, you’ll make several mistakes with Wi-Fi, such as:
Coverage is considered, but capacity is not. If Wi-Fi were a one-way radio broadcast like AM/FM radio, you’d only need to provide a strong “Wi-Fi signal” for everything to work well. This leads you down the next path.
The “Wi-Fi signal” (using this term might be a tell that the person speaking is stuck in the AM/FM radio analogy) is too low, so crank up the AP’s transmit power to make it louder.
Every problem is treated as an infrastructure problem; client radios are not considered when troubleshooting.
Getting hung up on the vendor name on the access point, without considering what is much more crucial: the overall design that went into the network.
How Wi-Fi Actually Works
Wi-Fi is not a one-way broadcast from AP to clients like AM/FM radio. This is not how Wi-Fi works:
It’s a network. The AP and clients connected to it must all be able to transmit and receive to and from each other, more like this:
Because they are all operating on the same channel, each client or AP must wait for the others to stop transmitting before it can transmit. It works just like walkie-talkies: only one radio can transmit at a time, and everyone else must listen and wait. Additionally, they all need to be close enough to hear each other so that they do not transmit over each other, causing interference that corrupts the communications. The channel they are using is what’s called a shared medium.
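To make the walkie-talkie analogy a little more concrete, here is a toy Python model of stations contending for one shared channel, loosely in the spirit of 802.11’s listen-before-talk (CSMA/CA) behavior. It assumes every station hears every other one, that each draws a random backoff from 0 to 15 slots, and that the lowest draw wins; stations that tie for the lowest draw transmit at the same time and corrupt each other’s frames. Real 802.11 adds inter-frame spacing, exponential backoff, and acknowledgements, none of which are modeled here.

```python
import random


def contend(stations, cw=15, rng=None):
    """One contention round on a shared channel (toy model, not real 802.11).

    Each station draws a backoff in [0, cw] slots. The lowest draw
    transmits; if two or more stations tie for the lowest draw, their
    frames overlap and are corrupted.
    Returns (winners, collided).
    """
    rng = rng or random.Random()
    draws = {name: rng.randint(0, cw) for name in stations}
    lowest = min(draws.values())
    winners = [name for name, slots in draws.items() if slots == lowest]
    return winners, len(winners) > 1


def collision_rate(n_stations, rounds=10_000, cw=15, seed=1):
    """Fraction of rounds that end in a collision with n contending stations."""
    rng = random.Random(seed)
    stations = [f"sta{i}" for i in range(n_stations)]
    collisions = sum(contend(stations, cw, rng)[1] for _ in range(rounds))
    return collisions / rounds


if __name__ == "__main__":
    for n in (2, 10, 30):
        print(n, "stations:", round(collision_rate(n), 3))
```

In this model, the more stations share the channel, the more often two of them pick the same slot, which is one reason a crowded channel degrades even when signal strength is excellent.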
If they can’t all hear each other, they will transmit over each other, which results in corrupted frames (not packets; Wi-Fi operates at layer 2) that must be retransmitted. The bigger the cell, the worse this problem becomes (this is known as the hidden node problem). So when you crank up the transmit power of an AP to increase its coverage, you exacerbate this problem, because the AP is now serving clients that are farther apart from one another.
In many networks, the majority of Wi-Fi clients are smartphones with low-power radios and meager antennas. They already have difficulty hearing other clients further away in the cell. For networks like this, performance can be greatly improved by lowering the transmit power of the AP rather than increasing it.
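The hidden node effect can be sketched with simple geometry. The Python below is a toy model under loud assumptions: clients are scattered uniformly within the AP’s cell, the cell radius stands in for the AP’s transmit power, and any two clients can only hear each other within a fixed client_range (a hypothetical figure representing weak smartphone radios). It counts the fraction of client pairs that cannot hear each other.

```python
import math
import random


def random_client(cell_radius, rng):
    """Uniform random point inside a circle of the given radius around the AP."""
    r = cell_radius * math.sqrt(rng.random())
    theta = rng.uniform(0, 2 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta))


def hidden_pair_fraction(cell_radius, client_range, n_clients=40, seed=7):
    """Fraction of client pairs that cannot hear each other.

    Assumes every client within cell_radius can hear the AP, but clients only
    hear each other within client_range (hypothetical; arbitrary distance units).
    """
    rng = random.Random(seed)
    clients = [random_client(cell_radius, rng) for _ in range(n_clients)]
    pairs = hidden = 0
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            pairs += 1
            if math.dist(clients[i], clients[j]) > client_range:
                hidden += 1
    return hidden / pairs


if __name__ == "__main__":
    for radius in (10, 20, 40):
        print("cell radius", radius, "->", round(hidden_pair_fraction(radius, 20), 2))
```

With the same random layout, growing cell_radius (turning the AP up) can only increase the fraction of hidden pairs, while shrinking the cell until it fits within client_range removes them entirely in this model.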
Further, because the channel is a shared medium, it has limited capacity. There is only so much available capacity to transmit in a single channel. Faster clients can transmit, well, faster, and therefore use less of that capacity, known as airtime. Older or cheaper clients that are slower use more airtime to transmit the same amount of data. It doesn’t matter what vendor’s name is on the access point; airtime is airtime. Once a channel is saturated, that’s it. You can’t add more clients to it without degrading performance. You can’t alter the laws of physics. At this point you need to add another AP to use the capacity of a different channel, or replace slow clients with faster ones.
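A back-of-the-envelope calculation shows why slow clients are so costly. This Python sketch counts only the time needed to send the payload bits of a 1500-byte frame at a given PHY rate; it ignores preambles, headers, inter-frame spaces, backoff, and acknowledgements, so real airtime is higher for every client. The 6 Mbps and 300 Mbps rates are illustrative examples.

```python
def airtime_us(payload_bytes, phy_rate_mbps):
    """Time on air for the payload bits alone, in microseconds.

    Deliberately ignores preamble, headers, inter-frame spaces, backoff,
    and ACKs, all of which add overhead to every frame.
    """
    return payload_bytes * 8 / phy_rate_mbps  # bits / (Mbit/s) = microseconds


def channel_share(clients):
    """Fraction of airtime each client uses when all send one 1500-byte frame.

    clients maps a client name to its PHY rate in Mbps.
    """
    times = {name: airtime_us(1500, rate) for name, rate in clients.items()}
    total = sum(times.values())
    return {name: t / total for name, t in times.items()}


if __name__ == "__main__":
    print(airtime_us(1500, 6))    # 2000.0 us for a slow client
    print(airtime_us(1500, 300))  # 40.0 us for a fast client
    print(channel_share({"fast": 300, "slow": 6}))
```

Moving the same 1500 bytes takes 50 times longer at 6 Mbps than at 300 Mbps, so in this example the slow client consumes roughly 98% of the airtime the two clients use.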
Regardless, it’s worthwhile to intuitively understand the nature of Wi-Fi networks, so that these common pitfalls can be avoided. Many other Wi-Fi best practices that I haven’t outlined here stem from this foundational knowledge. Based on this, can you think of other things that might affect Wi-Fi performance?
This is a simplification of 802.11 operation meant to give those new to the subject a casual understanding of how it works. Sometimes 802.11 frames are broadcast, one-way only, from the AP to all clients in the network. Some management frames and broadcast frames from the wired network are sent this way. The important point to remember is that this is the exception, not the rule, and if not all clients can hear each other, this broadcast traffic can still be corrupted by another client transmitting over it.
Many WLAN administrators purchase commercial SSL certificates for their RADIUS server to use for PEAP 802.1X authentication. The advantage of this approach is that the root CA cert of a common commercial CA is likely to be already installed on all the clients accessing the network. Although many clients will still prompt the user to trust the server’s cert, they won’t warn that the certificate is invalid.
While many WLANs are configured this way, it’s become increasingly easy to deploy EAP-TLS, which offers greater security than PEAP. Windows clients, Macs, iOS clients, and now Chromebooks can all automatically request and install a client cert from Windows Server Active Directory Certificate Services (ADCS), making its deployment much simpler than in the past.
Some organizations might want to enable EAP-TLS for company-owned clients while preserving PEAP for BYOD clients that don’t benefit from the automatic certificate deployment that a managed, company-owned client does. They’d like to keep using their commercial cert as the server certificate for PEAP, but also deploy a private CA to issue client certs for EAP-TLS authentication.
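With Windows Server NPS as a RADIUS server, this is simple to set up. The same was not true for FreeRADIUS until version 3 was released. With FreeRADIUS 3.0.x, one can specify a unique TLS configuration for each TLS-based EAP method. This snippet from the EAP module configuration (mods-available/eap in FreeRADIUS 3, which replaces eap.conf) shows how that can be done; certificate settings and the matching tls-config tls-peap block, which points at the commercial cert, are omitted.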
tls-config tls-common {
    ocsp {
        enable = no
        override_cert_url = yes
        url = "http://127.0.0.1/ocsp/"
    }
}
tls {
    tls = tls-common
}
peap {
    tls = tls-peap
    default_eap_type = mschapv2
    copy_request_to_tunnel = no
    use_tunneled_reply = no
    virtual_server = "inner-tunnel"
}
mschapv2 {
    send_error = no
}
With FreeRADIUS 2, it was not possible to configure multiple TLS configurations. Some admins made it work by creating a combined CA file containing both the private CA cert and the commercial CA cert and using that as the EAP-TLS ca_file. That’s a very bad idea, as it then allows anyone with a cert from the commercial CA to authenticate to your network!
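Why the combined CA file is dangerous comes down to which CAs the server trusts when validating client certificates for EAP-TLS. The Python below is a deliberately simplified model of that trust-anchor check only (no real certificates or TLS library are involved, and the CA names are hypothetical), comparing a dedicated private-CA trust list with the combined file.

```python
# Hypothetical CA names; this models only the "is the issuer trusted?" step.
PRIVATE_CA = "Example Corp Private Root CA"
COMMERCIAL_CA = "Example Commercial Root CA"


def accepts_client_cert(issuer, trusted_cas):
    """Toy EAP-TLS check: accept a client cert if its issuer is a trusted CA.

    Real validation also checks signatures, validity dates, revocation,
    and often other certificate fields.
    """
    return issuer in trusted_cas


# Per-method TLS settings: EAP-TLS trusts only the private CA, while PEAP
# keeps presenting the commercial server certificate from its own settings.
eap_tls_trust_v3 = {PRIVATE_CA}

# The FreeRADIUS 2 workaround: one combined ca_file holding both CAs.
eap_tls_trust_combined = {PRIVATE_CA, COMMERCIAL_CA}

# A stranger who bought a client cert from the same commercial CA:
print(accepts_client_cert(COMMERCIAL_CA, eap_tls_trust_v3))        # False
print(accepts_client_cert(COMMERCIAL_CA, eap_tls_trust_combined))  # True
```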
With FreeRADIUS 3, you can specify unique TLS parameters for each EAP method.
To follow up on my last post, where I expressed concern about marking cellular carrier Wi-Fi calls with the proper QoS class, I’m pleased to see that Cisco will include application signatures for Wi-Fi Calling in its upcoming AVC Protocol Pack 15 update. Other vendors should follow suit.
Keep in mind that changing the classification of VoWiFi packets on the WLAN only affects downstream packets from an AP. Upstream is up to the client.
Do Wi-Fi Calling smartphones mark upstream VoWiFi packets for the WMM AC_VO queue? If so, that could pose a problem in high-density networks, as a large group of these clients will demand immediate airtime and limit other clients’ access to the medium. Imagine a future where Apple, AT&T, and Verizon all support Wi-Fi Calling and enable it by default to offload data from their LTE networks. This could happen as early as 2016. High-density networks that were designed for best-effort data would suddenly have to deal with these demanding clients, which can dominate the 802.11 contention window. Wireless engineers who haven’t handled voice on the WLAN in the past will now be forced to deal with it.
The first thing to consider is making WMM Admission Control (WMM-AC) mandatory for voice to prevent voice clients from dominating a channel’s contention window. I suggest doing this before all the major cellular carriers enable Wi-Fi calling and these clients show up en masse on your WLAN. To date, the Wi-Fi Alliance has certified 77 smartphones for WMM-AC. The elephant in the enterprise room is the iPhone, which lacks WFA certification for WMM-AC, although it may still support it. I suspect that most newer clients that support WMM are WMM-AC capable as well, but that is just a hunch. A client that doesn’t support WMM-AC just won’t gain access to the AC_VO queue, but it can still pass voice traffic at a lower priority.
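In WMM-AC, a client asks the AP for permission, via a traffic specification (TSPEC), before sending traffic in AC_VO, and the AP admits or refuses the request. How an AP makes that decision is vendor-specific. The Python sketch below assumes one simple policy, a fixed budget of voice airtime per second; the 30% budget and the 30,000 µs per call are hypothetical numbers chosen for illustration, and the class is not a real WMM-AC implementation.

```python
class VoiceAdmission:
    """Toy admission controller for AC_VO (hypothetical policy).

    budget_us_per_s is the voice airtime the AP is willing to hand out each
    second; 300_000 us (30% of the channel) is an illustrative default.
    """

    def __init__(self, budget_us_per_s=300_000):
        self.budget = budget_us_per_s
        self.admitted = {}  # client name -> medium time (us per second)

    def request(self, client, medium_time_us_per_s):
        """Admit the stream if it fits in the remaining budget."""
        used = sum(t for c, t in self.admitted.items() if c != client)
        if used + medium_time_us_per_s > self.budget:
            return False  # refused: the client can still send at lower priority
        self.admitted[client] = medium_time_us_per_s
        return True

    def release(self, client):
        """Free a client's allocation, e.g. when its call ends."""
        self.admitted.pop(client, None)


# Hypothetical calls, each needing 30,000 us of voice airtime per second.
ac = VoiceAdmission()
results = [ac.request(f"phone{i}", 30_000) for i in range(12)]
print(results.count(True), "calls admitted")  # 10 calls admitted
```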
Wireless engineers may also choose to tweak the default WMM AC_VO AIFSN and contention window min/max settings to give these packets less airtime priority. Given today’s PHY rates, that may not cause a significant impact on the performance of these applications when channel utilization is low to moderate.
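To see what tweaking AIFSN and the contention window does, it helps to compute how long a station waits, on average, before its first attempt to transmit after the channel goes idle following another station’s transmission. The Python below uses the default EDCA parameters for non-AP stations from the 802.11 standard (aCWmin = 15, aCWmax = 1023) with 5 GHz OFDM timing (9 µs slot, 16 µs SIFS). It ignores retries and the window doubling that follows them, and the “tuned” AC_VO values at the end are a hypothetical example, not a recommendation.

```python
# Default EDCA parameters for a non-AP station, 5 GHz OFDM timing.
SLOT_US = 9
SIFS_US = 16

EDCA_DEFAULTS = {
    # AC: (AIFSN, CWmin, CWmax)
    "AC_BK": (7, 15, 1023),
    "AC_BE": (3, 15, 1023),
    "AC_VI": (2, 7, 15),
    "AC_VO": (2, 3, 7),
}


def aifs_us(aifsn):
    """Arbitration inter-frame space: SIFS plus AIFSN slot times."""
    return SIFS_US + aifsn * SLOT_US


def mean_first_access_us(aifsn, cw_min):
    """Average wait before a first attempt once the busy channel goes idle.

    The backoff counter is drawn uniformly from [0, CWmin], so its mean is
    CWmin / 2 slots. Retries (which grow the window toward CWmax) are ignored.
    """
    return aifs_us(aifsn) + (cw_min / 2) * SLOT_US


for ac, (aifsn, cw_min, _cw_max) in EDCA_DEFAULTS.items():
    print(ac, mean_first_access_us(aifsn, cw_min), "us")

# A hypothetical, less aggressive AC_VO setting: AIFSN 3, CWmin 7.
print("AC_VO tuned", mean_first_access_us(3, 7), "us")
```

With defaults, a voice frame waits about 47.5 µs on average versus 110.5 µs for best effort; the hypothetical tuning raises voice to 74.5 µs, keeping it ahead while giving best-effort data a better chance at the medium.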
The goal will be to deliver good voice performance without significantly degrading the performance of best-effort data clients. WLANs that were designed with voice in mind will have an advantage, as they provide higher minimum SNR and therefore higher minimum PHY rates, as well as better roaming characteristics. If your WLAN doesn’t provide fast roaming now, expect it to be a requirement in the future. (Cue the rant about the lack of client support for 802.11r, with a hat tip to Apple.)
What other approaches are out there for dealing with a sudden increase in voice clients?
Yesterday, AT&T enabled Wi-Fi calling for iPhones on its network. AT&T is by far the largest carrier in the US to enable this feature, so expect to see an increase in Wi-Fi calling on your WLAN soon. Twitter user @wirelessguru posted this packet capture, which shows an iPhone with service from AT&T sending Wi-Fi voice packets with WMM AC_VO QoS markings (and some odd layer 3 markings as well).
Several WLAN vendors offer layer 7, or application layer, firewalls and quality of service tools. The feature has different names depending on the vendor (Application Visibility and Control, Layer 7 Visibility, AppRF, etc.), but they all try to do the same thing. These tools work at the application layer to identify packets for processing through firewall or QoS rules, which is very useful in today’s world, where so many applications are served over the Internet on ports 80 and 443. Traditional stateful firewalls aren’t much use when you want to, say, rate-limit Netflix traffic.
At first, you may be tempted to identify all the applications commonly used on your WLAN and assign each of them to a QoS queue. Mission-critical applications get higher priority while social networks and video streaming services are deprioritized. Mark everything!
However, as with other features of enterprise gear, while it’s tempting to turn it on and go nuts, you should use restraint, and here’s why: layer 7 traffic analysis can be very CPU-intensive, so the more layer 7 rules in your ACLs, the more work the AP or controller must do to enforce them. That can result in a performance penalty during high-traffic periods. At least one WLAN vendor tacitly acknowledges this by providing an undocumented “Turbo Mode” that will “disable QoS policies and improve Wi-Fi performance.”
Also keep in mind that layer 7 traffic analysis is a bit more of an art than the hard science of stateful packet inspection. Traffic flows are compared to vendor proprietary signatures for proper identification, and that’s not always 100% reliable. An application update or backend infrastructure change may require the development of a new signature for proper identification. WLAN vendors need to provide customers with regular updates to their application signature databases to ensure proper identification is occurring.
That aside, what are some good uses of layer 7 firewalls and QoS?
Background Data Hogs
RF is a shared medium and as such it is often a bottleneck in busy networks. Software update utilities that run in the background on client machines can be problematic when there are a lot of stations sharing a channel. These applications like to all run at the same time, triggered by events like shifting from a 4G connection to Wi-Fi or right after a machine boots up.
In a school environment, this could happen during first period when everyone pulls out their Chromebook and they all automatically check for updates in the background, while at the same time students’ iPhones notice the Wi-Fi connection and decide now is the time to download that massive iOS update. The WLAN can slow to a crawl without any end-user interaction other than walking in the door.
I think this is where layer 7 QoS shines. By marking Apple Software Update and Chrome OS update packets for the background queue (AC_BK), for example, other applications that users are interacting with in the foreground of their clients take priority on the network. Of course, you will customize these rules to your IT environment. A Microsoft shop will want to do this with traffic to their WSUS server, etc. If you have a lot of iOS clients, iCloud traffic is one to look out for. Dropbox might be a big one too. You may want to consider deprioritizing antivirus updates as well, as these applications sometimes update quite frequently in the background.
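Conceptually, a layer 7 QoS rule maps an identified application to a WMM access category. Real products identify applications with vendor-maintained signatures rather than a hostname list, so the Python below is only a sketch of the idea, and the hostnames in it are placeholders, not actual update servers.

```python
# Hypothetical rules; the hostnames are placeholders, not real update servers.
BACKGROUND_APPS = {
    "os-updates.example.com": "OS updates",
    "antivirus-defs.example.net": "AV definitions",
}


def classify(hostname, requested_ac="AC_BE"):
    """Return the WMM access category to use for a flow to this hostname.

    Flows matching a background rule (exact host or any subdomain) are
    demoted to AC_BK; everything else keeps the category it arrived with.
    """
    host = hostname.lower().rstrip(".")
    for suffix in BACKGROUND_APPS:
        if host == suffix or host.endswith("." + suffix):
            return "AC_BK"
    return requested_ac


print(classify("cdn1.os-updates.example.com"))  # AC_BK
print(classify("video.example.org", "AC_VI"))   # AC_VI
```

Matching on a dot-prefixed suffix keeps a look-alike such as notos-updates.example.com from being demoted by accident.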
Chrome and Chrome OS Updates
Incidentally, despite the overwhelming popularity of Chrome OS in K-12, I am unaware of any vendor that provides application signatures for Chrome OS updates. If you can define custom applications within your WLAN (I know that Aerohive and Meraki can do this), use these URLs to identify Chrome OS updates (these also cover Chrome web browser auto-updates for Windows/Mac/Linux):
Or, if you are really strapped for throughput, use firewall rules to block these applications altogether, on the guest network, for example. If WAN throughput is really limited, you may need to consider end-to-end QoS all the way to your WAN circuit. Most enterprise WLAN gear can translate WMM QoS markings to 802.1p or DiffServ markings on the Ethernet network, but remember to configure QoS on every networking device between the AP and the WAN. Take packet captures to confirm your configuration is working.
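When WMM markings are translated to and from wired markings, it is worth checking the mapping end to end. The Python below encodes the standard mapping from 802.1D user priority (UP) to WMM access category, plus a small subset of the DSCP-to-UP mapping recommended in RFC 8325. It also shows a common shortcut, using the top three bits of the DSCP value as the UP, which sends EF-marked voice to the video queue. Check your own equipment’s mapping tables rather than assuming either behavior.

```python
# 802.1D user priority (UP) to WMM access category.
UP_TO_AC = {1: "AC_BK", 2: "AC_BK", 0: "AC_BE", 3: "AC_BE",
            4: "AC_VI", 5: "AC_VI", 6: "AC_VO", 7: "AC_VO"}

# A small subset of the DSCP-to-UP mapping recommended by RFC 8325.
DSCP_TO_UP = {
    46: 6,  # EF (voice)
    34: 4,  # AF41 (multimedia conferencing)
    8: 1,   # CS1 (low-priority data)
    0: 0,   # default forwarding (best effort)
}


def ac_for_dscp(dscp):
    """Map a DSCP value to a WMM access category; unknown values go best effort."""
    return UP_TO_AC[DSCP_TO_UP.get(dscp, 0)]


def naive_ac_for_dscp(dscp):
    """Common shortcut: use the top three DSCP bits as the UP."""
    return UP_TO_AC[dscp >> 3]


print(ac_for_dscp(46))        # AC_VO
print(naive_ac_for_dscp(46))  # AC_VI: EF lands in the video queue
```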
Is it standardized testing season and you are worried that students’ use of Pandora and Netflix is affecting your WLAN performance? No need to go to superiors or committees and ask to have them blocked. That’s a bit draconian anyway. Deprioritize those applications with layer 7 QoS rules.
Malicious and Illegal Applications
Stop bad traffic at the AP or controller before it gets to your content filter. This provides an extra layer of filtering and reduces the traffic the content filter must process. If you don’t enforce station isolation, it can also block some LAN attacks that would otherwise not reach your content filter. At school, peer-to-peer file-sharing applications like BitTorrent, proxy applications, Tor, and shady VPN services are all good candidates for layer 7 firewall blocking. Just make sure your firewall rules comply with organizational policy.
RF design is the most important factor in meeting the needs of voice-over-Wi-Fi applications, and properly configuring QoS for the enterprise VoIP system has always been important as well. But now we’re seeing users making VoWiFi calls via their cellular carrier. Layer 7 traffic analysis can be used to identify this new traffic and push it to the proper WMM queue (AC_VO).
Going forward, these tools might prove less effective as more and more network traffic is encrypted by default. In fact, major browsers only support HTTP/2 over TLS, so browser HTTP/2 traffic will effectively always be encrypted. The companies that develop the application signatures used by WLAN vendors face the challenge of doing more with less. Our dependence on these products is increasing, while at the same time it will become more difficult to identify application traffic on the network.
When deploying a WLAN, it’s easy to fall into the trap of enabling features you might not need just because, well, you paid for them and they are cool. Oftentimes a KISS approach results in better performance, but hey, look at this cool new thing it can do!
Load balancing is one of those features. While it seems harmless enough, there are some scenarios that can get you into trouble.
For the uninitiated, WLAN load balancing is a feature that encourages clients to associate with the least-loaded nearby AP. Typically, a client will attempt to associate with the loudest nearby AP, without regard to how many clients are already associated (most APs don’t share that information anyway, but some do by using the BSS Load element within management frames). Most load balancing algorithms work by suppressing probe and association responses from heavily loaded APs so that a client either won’t know the AP is there or will fail to associate with it. Hopefully the client will then attempt to associate with a different AP that has more capacity available.
There are a couple of problems with this to keep in mind. The most important one affects all clients: the AP has a very different view of the RF environment than the client does. What a highly sensitive, enterprise-grade AP is capable of hearing is quite different from what a low-cost, consumer-grade Wi-Fi chipset can hear, and they are of course not listening from the same location either. It gets worse if that client radio is part of a smartphone tucked into a pocket or purse. In that case, the AP may think it’s safe to ignore probe and association requests from the client because it’s aware of three other nearby, less-loaded APs, but the reality is that the smartphone can’t hear any AP but the one that is ignoring it.
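The mismatch between the AP’s view and the client’s view can be sketched in a few lines. This Python is a toy model of a hypothetical count-based rule (actual vendor algorithms vary): an AP stops answering once it reaches max_clients, provided it can hear a less-loaded neighbor. The client, however, can only use the APs it can hear itself.

```python
def ap_responds(ap, neighbors, max_clients):
    """Hypothetical AP-side rule: stay silent if over the client limit AND
    at least one neighbor that *this AP* hears has fewer clients."""
    overloaded = ap["clients"] >= max_clients
    better_neighbor = any(n["clients"] < ap["clients"] for n in neighbors)
    return not (overloaded and better_neighbor)


def client_can_join(client_scan, ap_views, max_clients):
    """True if any AP the *client* can hear will answer it."""
    return any(
        ap_responds(ap, ap_views[ap["name"]], max_clients) for ap in client_scan
    )


ap1 = {"name": "ap1", "clients": 30}
ap2 = {"name": "ap2", "clients": 5}
ap3 = {"name": "ap3", "clients": 4}

# What each AP hears: its sensitive radio picks up both neighbors.
ap_views = {"ap1": [ap2, ap3], "ap2": [ap1, ap3], "ap3": [ap1, ap2]}

# A smartphone in a pocket only hears ap1.
print(client_can_join([ap1], ap_views, max_clients=25))       # False
print(client_can_join([ap1, ap2], ap_views, max_clients=25))  # True
```

A laptop that can hear both ap1 and ap2 still finds ap2, but the pocketed phone, which only hears ap1, gets no answer at all.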
And not all load balancing algorithms do what you think they do. Some operate by simply limiting the total number of associated clients an AP radio will accept, even some that, in my experience, are described as “airtime-based.” The problem here is that this doesn’t take into account actual airtime utilization, the truest measure of the load on an AP radio. Often, airtime utilization is quite low when an algorithm decides the AP is too loaded and should push clients elsewhere. Say 30 clients are associated with one AP radio. If they are all idle, very little airtime is being used, and there is still plenty of capacity for others to associate as well. Make sure you know exactly how your WLAN’s load balancing works. Test it to make sure it does what it claims to do, and set your limits high.
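The difference between a client-count limit and a measurement of actual airtime shows up clearly with idle clients. In the Python below, both thresholds and the 0.2% per-phone airtime figure are hypothetical; a real airtime-based algorithm would measure channel utilization directly, including traffic from neighboring networks.

```python
def count_based_accepts(n_clients, max_clients=25):
    """Admit a new client only while the radio has fewer than max_clients."""
    return n_clients < max_clients


def airtime_based_accepts(client_airtime, max_utilization=0.7):
    """Admit a new client while the summed airtime stays below a limit.

    client_airtime: fraction of airtime each associated client is using.
    max_utilization is a hypothetical threshold, not a vendor default.
    """
    return sum(client_airtime) < max_utilization


# 30 idle smartphones, each using about 0.2% of airtime (hypothetical figure).
idle_phones = [0.002] * 30
print(count_based_accepts(len(idle_phones)))  # False: "too many" clients
print(airtime_based_accepts(idle_phones))     # True: about 6% utilization
```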
Here are some examples where load balancing can cause problems:
A high school classroom fills with students. As they enter the room, their smartphones, which were already configured to join the WLAN, automatically roam to the loudest nearby AP. The teacher asks the students to get out their laptops as part of her lesson. The laptops now try to connect, but the nearby AP already has 30 smartphones associated to it, so it ignores the probe and association requests from the laptops. The best-case scenario is that the laptops are able to associate with another nearby AP, albeit at a lower data rate than the louder AP would offer. The worst-case scenario is that the laptops’ Wi-Fi drivers won’t budge and continually fail to associate with the loudest AP (which is ignoring them), or that the neighboring AP the loaded AP is trying to push new clients to is actually too distant for them to hear. But the smartphones are all nearly idle, so it would have been better for the laptops to associate with the louder AP.
The school media center is used to store several carts of iPads. The iPads are not powered down before being stored, so they all associate with the media center AP. Visitors have difficulty connecting to the network in the media center, because the algorithm believes the media center AP is heavily loaded and ignores their association requests. The media center AP can hear another AP well, but most visiting clients in the media center cannot. The visiting clients cannot connect to the WLAN, yet in this case as well, the AP is actually not loaded at all: the iPads are completely idle and using almost no airtime.
Here is where load balancing makes sense:
In areas where you can reasonably anticipate that a single AP radio may become overloaded, such as a cafeteria, gym, or performance space.
In areas where multiple APs are very close to one another and create tightly overlapping coverage cells. This helps mitigate the problem of clients and APs having a differing view of the RF.
Nowhere else. Only use load balancing when both of the above criteria are met.