Chrome OS Wi-Fi Diagnostics


In the K-12 market, Chromebooks are the most common devices used in 1:1 programs. If you are designing high-density Wi-Fi networks for Chromebook 1:1 programs, it helps to know how to access their Wi-Fi statistics, logs, and networking tools. This knowledge is valuable for troubleshooting day-to-day Chromebook Wi-Fi issues as well.

The Basics

Despite its simplicity, Chrome OS, the Linux variant that Chromebooks run, does have some useful diagnostic tools that can help troubleshoot Wi-Fi problems. Most of these tools are included in the crosh shell, which you can open by pressing Ctrl+Alt+T. Here are some of my go-to crosh networking commands that don’t require an explanation.

ping
route
tracepath

 

The “connectivity show devices” command provides some good Wi-Fi stats such as retries, MCS index, and RoamThreshold, which is the SNR at which this Chromebook will attempt to roam to a new BSS. Hopefully, one day we’ll be able to modify this value on enterprise-managed Chromebooks through the Google Apps admin console.

crosh> connectivity show devices

/device/wlan0
  Address: 485ab6######
  BgscanMethod: simple
  BgscanShortInterval: 30
  BgscanSignalThreshold: -50
  ForceWakeToScanTimer: false
  IPConfigs/0: /ipconfig/wlan0_0_dhcp
  Interface: wlan0
  LinkMonitorResponseTime: 3
  LinkStatistics/0/AverageReceiveSignalDbm: -61
  LinkStatistics/1/InactiveTimeMilliseconds: 8002
  LinkStatistics/2/LastReceiveSignalDbm: -62
  LinkStatistics/3/PacketReceiveSuccesses: 63919
  LinkStatistics/4/PacketTransmitFailures: 25
  LinkStatistics/5/PacketTrasmitSuccesses: 34432
  LinkStatistics/6/TransmitBitrate: 52.0 MBit/s MCS 11
  LinkStatistics/7/TransmitRetries: 60969
  Name: wlan0
  NetDetectScanPeriodSeconds: 120
  Powered: true
  ReceiveByteCount: 1610461765
  RoamThreshold: 18
  ScanInterval: 60
  Scanning: false
  SelectedService: /service/5
  TransmitByteCount: 133127986
  Type: wifi
  WakeOnWiFiFeaturesEnabled: not_supported
  WakeToScanPeriodSeconds: 900
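
If you save that output to a text file, a few lines of Python can turn the raw counters into a number that is easier to compare between Chromebooks. This is only a sketch: the function names are my own, and it assumes the “Key: value” layout shown above, including the misspelled PacketTrasmitSuccesses key, which is copied exactly as it appears in the output.

```python
def parse_properties(text):
    """Parse indented 'Key: value' lines into a dict.

    Object paths such as /device/wlan0 are skipped.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("/") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        props[key.strip()] = value.strip()
    return props


def link_stat(props, name):
    """Look up a LinkStatistics entry by name, ignoring its numeric index."""
    for key, value in props.items():
        if key.startswith("LinkStatistics/") and key.endswith("/" + name):
            return value
    raise KeyError(name)


def retries_per_success(props):
    """Transmit retries divided by successfully transmitted packets."""
    retries = int(link_stat(props, "TransmitRetries"))
    # "Trasmit" is not a typo here; it matches the key in the output above.
    successes = int(link_stat(props, "PacketTrasmitSuccesses"))
    return retries / successes


# A few lines trimmed from the output above.
SAMPLE = """\
/device/wlan0
  LinkStatistics/4/PacketTransmitFailures: 25
  LinkStatistics/5/PacketTrasmitSuccesses: 34432
  LinkStatistics/7/TransmitRetries: 60969
  RoamThreshold: 18
"""

props = parse_properties(SAMPLE)
print(f"retries per transmitted packet: {retries_per_success(props):.2f}")
```

With the counters above, this works out to roughly 1.77 retries for every successfully transmitted packet.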

 

The “connectivity show services” command is very useful in troubleshooting 802.1X issues. It shows more layer 2 details on all the BSSs that have been discovered. In this case, /service/12 is an 802.1X network that the Chromebook is associated with, and /service/15 is an open network that is also in range.

crosh> connectivity show services

/service/12
  AutoConnect: true
  CheckPortal: auto
  Connectable: true
  ConnectionId: 2069398120
  Country: US
  DNSAutoFallback: false
  Device: /device/wlan0
  EAP.AnonymousIdentity: anonymous
  EAP.CACert: 
  EAP.CACertID: 
  EAP.CACertNSS: 
  EAP.CertID: 
  EAP.ClientCert: 
  EAP.EAP: PEAP
  EAP.Identity: <username>
  EAP.InnerEAP: auth=MSCHAPV2
  EAP.KeyID: 
  EAP.KeyMgmt: WPA-EAP
  EAP.PIN: 
  EAP.PrivateKey: 
  EAP.RemoteCertification/0: /OU=Domain Control Validated/CN=<cn>
  EAP.RemoteCertification/1: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
  EAP.RemoteCertification/2: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./CN=Go Daddy Root Certificate Authority - G2
  EAP.RemoteCertification/3: /C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
  EAP.SubjectMatch: 
  EAP.UseProactiveKeyCaching: false
  EAP.UseSystemCAs: true
  Error: Unknown
  ErrorDetails: 
  GUID: 5137BA48-0424-41B0-B5DE-29A427084925
  HTTPProxyPort: 34599
  IPConfig: /ipconfig/wlan0_1_dhcp
  IsActive: true
  LinkMonitorDisable: false
  ManagedCredentials: false
  Mode: managed
  Name: <SSID name>
  PassphraseRequired: false
  PortalDetectionFailedPhase: 
  PortalDetectionFailedStatus: 
  PreviousError: 
  PreviousErrorSerialNumber: 0
  Priority: 0
  PriorityWithinTechnology: 0
  Profile: /profile/chronos/shill
  ProxyConfig: 
  SaveCredentials: true
  SavedIP.Address: 192.168.1.20
  SavedIP.Gateway: 192.168.1.1
  SavedIP.Mtu: 0
  SavedIP.NameServers: 192.168.1.1
  SavedIP.PeerAddress: 
  SavedIP.Prefixlen: 26
  SavedIPConfig/0/Address: 192.168.1.20
  SavedIPConfig/1/Gateway: 192.168.1.1
  SavedIPConfig/2/Mtu: 0
  SavedIPConfig/3/NameServers/0: 192.168.1.1
  SavedIPConfig/4/PeerAddress: 
  SavedIPConfig/5/Prefixlen: 26
  Security: 802_1x
  SecurityClass: 802_1x
  State: online
  Strength: 35
  Tethering: NotDetected
  Type: wifi
  UIData: 
  Visible: true
  WiFi.AuthMode: 
  WiFi.BSSID: 00:11:74:##:##:##
  WiFi.Frequency: 5240
  WiFi.FrequencyList/0: 2412
  WiFi.FrequencyList/1: 2462
  WiFi.FrequencyList/2: 5240
  WiFi.FrequencyList/3: 5320
  WiFi.HexSSID: ########
  WiFi.HiddenSSID: false
  WiFi.PhyMode: 7
  WiFi.PreferredDevice: 
  WiFi.ProtectedManagementFrameRequired: false
  WiFi.RoamThreshold: 0
  WiFi.VendorInformation/0/OUIList: 00-03-7f

/service/15
  AutoConnect: false
  CheckPortal: auto
  Connectable: true
  ConnectionId: 0
  Country: US
  DNSAutoFallback: false
  Device: /device/wlan0
  EAP.AnonymousIdentity: 
  EAP.CACert: 
  EAP.CACertID: 
  EAP.CACertNSS: 
  EAP.CertID: 
  EAP.ClientCert: 
  EAP.EAP: 
  EAP.Identity: 
  EAP.InnerEAP: 
  EAP.KeyID: 
  EAP.KeyMgmt: NONE
  EAP.PIN: 
  EAP.PrivateKey: 
  EAP.SubjectMatch: 
  EAP.UseProactiveKeyCaching: false
  EAP.UseSystemCAs: true
  Error: Unknown
  ErrorDetails: 
  GUID: 
  HTTPProxyPort: 0
  IsActive: false
  LinkMonitorDisable: false
  ManagedCredentials: false
  Mode: managed
  Name: <SSID name>
  PassphraseRequired: false
  PortalDetectionFailedPhase: 
  PortalDetectionFailedStatus: 
  PreviousError: 
  PreviousErrorSerialNumber: 0
  Priority: 0
  PriorityWithinTechnology: 0
  Profile: 
  ProxyConfig: 
  SaveCredentials: true
  Security: none
  SecurityClass: none
  State: idle
  Strength: 44
  Tethering: NotDetected
  Type: wifi
  UIData: 
  Visible: true
  WiFi.AuthMode: 
  WiFi.BSSID: 7c:69:f6:##:##:##
  WiFi.Frequency: 5320
  WiFi.FrequencyList/0: 5240
  WiFi.FrequencyList/1: 5320
  WiFi.HexSSID: ##########
  WiFi.HiddenSSID: false
  WiFi.PhyMode: 7
  WiFi.PreferredDevice: 
  WiFi.ProtectedManagementFrameRequired: false
  WiFi.RoamThreshold: 0
  WiFi.VendorInformation/0/OUIList: 00-10-18
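
When there are dozens of services in range, it can help to pull out just the 802.1X ones along with their EAP settings and the certificate chain the RADIUS server presented. The sketch below assumes the output format shown above. The function names and the shape of the summary dict are hypothetical, chosen only for illustration.

```python
def parse_services(text):
    """Split the output into {service_path: {property: value}}."""
    services = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/service/"):
            current = services.setdefault(stripped, {})
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return services


def summarize_8021x(services):
    """Return EAP settings and the server certificate chain for 802.1X services."""
    summary = {}
    for path, props in services.items():
        if props.get("Security") != "802_1x":
            continue
        # Sort certificates by the numeric index at the end of the key.
        certs = [
            (int(key.rsplit("/", 1)[1]), value)
            for key, value in props.items()
            if key.startswith("EAP.RemoteCertification/")
        ]
        summary[path] = {
            "outer": props.get("EAP.EAP", ""),
            "inner": props.get("EAP.InnerEAP", ""),
            "state": props.get("State", ""),
            "cert_chain": [value for _, value in sorted(certs)],
        }
    return summary


# Trimmed from the output above.
SAMPLE = """\
/service/12
  EAP.EAP: PEAP
  EAP.InnerEAP: auth=MSCHAPV2
  EAP.RemoteCertification/0: /OU=Domain Control Validated/CN=<cn>
  EAP.RemoteCertification/2: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./CN=Go Daddy Root Certificate Authority - G2
  Security: 802_1x
  State: online

/service/15
  EAP.EAP:
  Security: none
  State: idle
"""

for path, info in summarize_8021x(parse_services(SAMPLE)).items():
    print(path, info["outer"], info["inner"], info["state"])
    for cert in info["cert_chain"]:
        print("   ", cert)
```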

 

The “network_diag --wifi” command brings up a lot of valuable information, including a dump of the latest full channel scan and the Wi-Fi chipset’s capabilities, among other useful data.

crosh> network_diag --wifi

iw dev wlan0 survey dump:
Survey data from wlan0
 frequency: 2412 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 49 ms
 channel receive time: 45 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 2417 MHz
 noise: -93 dBm
 channel active time: 62 ms
 channel busy time: 47 ms
 channel receive time: 41 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 2422 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 4 ms
 channel receive time: 0 ms
 channel transmit time: 0 ms

[truncated]

Survey data from wlan0
 frequency: 5220 MHz
 noise: -94 dBm
 channel active time: 124 ms
 channel busy time: 0 ms
 channel receive time: 0 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 5240 MHz [in use]
 noise: -94 dBm
 channel active time: 15723 ms
 channel busy time: 513 ms
 channel receive time: 185 ms
 channel transmit time: 3 ms
Survey data from wlan0
 frequency: 5260 MHz
 noise: -94 dBm
 channel active time: 85031 ms
 channel busy time: 84907 ms
 channel receive time: 84907 ms
 channel transmit time: 84907 ms

[truncated]

iw dev wlan0 station dump:
Station 00:11:74:##:##:## (on wlan0)
 inactive time: 5444 ms
 rx bytes: 11797197
 rx packets: 38419
 tx bytes: 1703260
 tx packets: 9779
 tx retries: 14295
 tx failed: 43
 signal: -58 dBm
 signal avg: -60 dBm
 tx bitrate: 24.0 MBit/s
 rx bitrate: 300.0 MBit/s MCS 15 40MHz short GI
 authorized: yes
 authenticated: yes
 preamble: long
 WMM/WME: yes
 MFP: no
 TDLS peer: no
iw dev wlan0 scan dump:
BSS 00:11:74:##:##:##(on wlan0) -- associated
 TSF: 61418055#### usec (7d, 02:36:20)
 freq: 5240
 beacon interval: 100 TUs
 capability: ESS Privacy SpectrumMgmt ShortSlotTime (0x0511)
 signal: -60.00 dBm
 last seen: 847370 ms ago
 Information elements from Probe Response frame:
 Supported rates: 24.0* 36.0 48.0 54.0 
 DS Parameter set: channel 48
 Country: US Environment: Indoor/Outdoor
 Channels [36 - 36] @ 24 dBm
 Channels [40 - 40] @ 24 dBm
 Channels [44 - 44] @ 24 dBm
 Channels [48 - 48] @ 24 dBm
 Channels [52 - 52] @ 23 dBm
 Channels [56 - 56] @ 23 dBm
 Channels [60 - 60] @ 23 dBm
 Channels [64 - 64] @ 23 dBm
 Channels [100 - 100] @ 24 dBm
 Channels [104 - 104] @ 24 dBm
 Channels [108 - 108] @ 24 dBm
 Channels [112 - 112] @ 24 dBm
 Channels [116 - 116] @ 24 dBm
 Channels [120 - 120] @ 24 dBm
 Channels [124 - 124] @ 24 dBm
 Channels [128 - 128] @ 24 dBm
 Channels [132 - 132] @ 24 dBm
 Channels [136 - 136] @ 24 dBm
 Channels [140 - 140] @ 24 dBm
 Channels [144 - 144] @ 24 dBm
 Channels [149 - 149] @ 30 dBm
 Channels [153 - 153] @ 30 dBm
 Channels [157 - 157] @ 30 dBm
 Channels [161 - 161] @ 30 dBm
 Channels [165 - 165] @ 30 dBm
 Power constraint: 3 dB
 BSS Load:
 * station count: 2
 * channel utilisation: 4/255
 * available admission capacity: 31250 [*32us]
 HT capabilities:
 Capabilities: 0x9ef
 RX LDPC
 HT20/HT40
 SM Power Save disabled
 RX HT20 SGI
 RX HT40 SGI
 TX STBC
 RX STBC 1-stream
 Max AMSDU length: 7935 bytes
 No DSSS/CCK HT40
 Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
 Minimum RX AMPDU time spacing: 8 usec (0x06)
 HT TX/RX MCS rate indexes supported: 0-15
 HT operation:
 * primary channel: 48
 * secondary channel offset: below
 * STA channel width: any
 * RIFS: 1
 * HT protection: no
 * non-GF present: 1
 * OBSS non-GF present: 0
 * dual beacon: 0
 * dual CTS protection: 0
 * STBC beacon: 0
 * L-SIG TXOP Prot: 0
 * PCO active: 0
 * PCO phase: 0
 VHT capabilities:
 VHT Capabilities (0x338001b2):
 Max MPDU length: 11454
 Supported Channel Width: neither 160 nor 80+80
 RX LDPC
 short GI (80 MHz)
 TX STBC
 RX antenna pattern consistency
 TX antenna pattern consistency
 VHT RX MCS set:
 1 streams: MCS 0-9
 2 streams: MCS 0-9
 3 streams: not supported
 4 streams: not supported
 5 streams: not supported
 6 streams: not supported
 7 streams: not supported
 8 streams: not supported
 VHT RX highest supported: 0 Mbps
 VHT TX MCS set:
 1 streams: MCS 0-9
 2 streams: MCS 0-9
 3 streams: not supported
 4 streams: not supported
 5 streams: not supported
 6 streams: not supported
 7 streams: not supported
 8 streams: not supported
 VHT TX highest supported: 0 Mbps
 VHT operation:
 * channel width: 1 (80 MHz)
 * center freq segment 1: 42
 * center freq segment 2: 0
 * VHT basic MCS set: 0xfffc
 WMM: * Parameter version 1
 * u-APSD
 * BE: CW 15-1023, AIFSN 3
 * BK: CW 15-1023, AIFSN 7
 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
 RSN: * Version: 1
 * Group cipher: CCMP
 * Pairwise ciphers: CCMP
 * Authentication suites: IEEE 802.1X FT/IEEE 802.1X
 * Capabilities: PreAuth 1-PTKSA-RC 1-GTKSA-RC MFP-capable (0x0081)
 * 0 PMKIDs
 * Group mgmt cipher suite: AES-128-CMAC
iw dev wlan0 link:
Connected to 00:11:74:##:##:## (on wlan0)
 freq: 5240
 RX: 11797197 bytes (38419 packets)
 TX: 1703260 bytes (9779 packets)
 signal: -58 dBm
 tx bitrate: 24.0 MBit/s

 bss flags: short-slot-time
 dtim period: 1
 beacon int: 100

That’s a lot more Wi-Fi data than most other platforms make natively accessible.
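
One thing I like to pull out of the survey dump is a rough channel utilization figure, which is simply the channel busy time divided by the channel active time. Here is a minimal Python sketch that does this, assuming the survey text has been saved from the output above. The function name is my own, and the result is only as trustworthy as the counters the driver reports.

```python
import re


def channel_utilization(survey_text):
    """Return {frequency_mhz: busy_time / active_time} for each survey entry."""
    results = {}
    freq = active = None
    for line in survey_text.splitlines():
        line = line.strip()
        match = re.match(r"frequency:\s+(\d+) MHz", line)
        if match:
            freq, active = int(match.group(1)), None
            continue
        match = re.match(r"channel active time:\s+(\d+) ms", line)
        if match:
            active = int(match.group(1))
            continue
        match = re.match(r"channel busy time:\s+(\d+) ms", line)
        # Skip entries with no active time to avoid dividing by zero.
        if match and freq is not None and active:
            results[freq] = int(match.group(1)) / active
    return results


# Three entries trimmed from the survey dump above.
SAMPLE = """\
Survey data from wlan0
 frequency: 2412 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 49 ms
Survey data from wlan0
 frequency: 2422 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 4 ms
Survey data from wlan0
 frequency: 5240 MHz [in use]
 noise: -94 dBm
 channel active time: 15723 ms
 channel busy time: 513 ms
"""

for freq, busy in sorted(channel_utilization(SAMPLE).items()):
    print(f"{freq} MHz: {busy:.0%} busy")
```

Keep in mind that channels the radio only visits briefly during a scan are sampled for a few dozen milliseconds, so treat those figures as a snapshot rather than an average.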

Additionally, to view most of this data without crosh, use this internal Chrome URL. Just enter it into the address bar and press Enter.

chrome://system/

Areas of interest for Wi-Fi data:

  • network-devices – same output as the “connectivity show devices” crosh command
  • network-services – same output as the “connectivity show services” crosh command
  • wifi_status – same output as the “network_diag --wifi” crosh command
  • lspci – you can see the Wi-Fi chipset hardware here (more on that later)
  • network_event_log
  • netlog

Viewing Logs

You can start logging Wi-Fi events using this crosh command.

crosh> network_logging wifi

Old flimflam tags: []
Current flimflam tags: [device+inet+manager+service+wifi]

method return sender=:1.1 -> dest=:1.146 reply_serial=2
Old wpa level: info
Current wpa level: msgdump

View the resulting device event logs at this internal Chrome URL: chrome://device-log/

Run this command to view the kernel log, which includes a lot of Wi-Fi events. I wish there were a --follow option, but currently there is not.

crosh> dmesg

A restart will return the Chromebook to normal logging levels.

And if you really want to bury yourself in logs, go to chrome://net-internals/#chromeos, click Wi-Fi to enable debugging on that interface, let the “capturing events” count creep up while you perform a task, then click “Store debug logs” to save a debug-logs_<date>.tgz archive in your Downloads folder. Be warned, the signal-to-noise ratio is very low with this approach. Google provides a log analyzer that you can upload these files to, but I’ve never had the need to go that far down the road. This is best used if you need to submit logs to the Google Apps Enterprise Support Team or a hardware manufacturer.

Advanced Wi-Fi Analysis with Developer Mode

But wait, there’s more! If you can put a Chromebook into Developer Mode, you can run packet captures and break into the Linux bash shell. Most enterprise-managed Chromebooks will have this mode disabled for obvious reasons, but it’s easy enough to move your test Chromebook into a test OU and disable this and other restrictions for testing purposes. (That’s IT testing, not high-stakes student testing! Make sure your OUs clearly differentiate the two.)

Packet Capture

First, determine the frequency of the channel on which you’d like to run the capture, and whether channel bonding is in use. The internal URL from above will work for this, as will the “network_diag --wifi” crosh command. The frequency of the currently associated BSS is displayed at the end of that output, as shown here.

…
iw dev wlan0 link:
Connected to 00:11:74:##:##:## (on wlan0)
 freq: 5240
 RX: 11797197 bytes (38419 packets)
 TX: 1703260 bytes (9779 packets)
 signal: -58 dBm
 tx bitrate: 24.0 MBit/s

 bss flags: short-slot-time
 dtim period: 1
 beacon int: 100
[Screenshot: Disable the Wi-Fi NIC here.]

Now turn off the Wi-Fi NIC in the GUI so it can be put into monitor mode.

You can now run the packet capture using the crosh command below.

Optionally, if you are doing a 40 MHz 802.11n capture, specify whether the secondary channel is above or below the primary by appending the “--ht-location <above|below>” flag.

 

crosh> packet_capture --frequency <frequency in MHz>

Capturing from phy0_mon.  Press Ctrl-C to stop.
^CCapture stored in /home/chronos/user/Downloads/packet_capture_7K08.pcap

You’ll get a pcap file, complete with Radiotap headers if the hardware supports them, saved in the Downloads folder. You can send it to another machine for analysis. If the Chromebook is all you have available, you can upload the pcap to CloudShark for analysis.
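
If you think in channel numbers rather than frequencies, the conversion is simple arithmetic: 2.4 GHz channels 1 through 13 are centered at 2407 + 5 × channel MHz, channel 14 is 2484 MHz, and 5 GHz channels are centered at 5000 + 5 × channel MHz. The Python sketch below builds the crosh command line from a channel number. The helper names are my own, and it uses only the two flags described above.

```python
def channel_to_frequency(channel):
    """Return the center frequency in MHz of a 20 MHz Wi-Fi channel."""
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if channel == 14:
        return 2484
    if 36 <= channel <= 165:
        return 5000 + 5 * channel
    raise ValueError(f"unsupported channel: {channel}")


def packet_capture_command(channel, ht_location=None):
    """Build the crosh command line for a capture on the given channel."""
    if ht_location not in (None, "above", "below"):
        raise ValueError("ht_location must be 'above' or 'below'")
    command = f"packet_capture --frequency {channel_to_frequency(channel)}"
    if ht_location:
        command += f" --ht-location {ht_location}"
    return command


# The associated BSS above uses primary channel 48 with the secondary below.
print(packet_capture_command(48, "below"))
```

This prints packet_capture --frequency 5240 --ht-location below, which matches the freq: 5240 line in the link output above.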

Wi-Fi Troubleshooting in Bash

Once you’ve got Developer Mode enabled, you can use the bash shell and follow the network log (or any other log) as things happen. This is my preferred way to troubleshoot Chromebook Wi-Fi issues in real time.

crosh> shell
chronos@localhost / $ tail -f /var/log/net.log

Now go do something to the Wi-Fi connection and watch the log scroll by.

A few Linux networking commands you may already know are available here as well, such as ifconfig, arp, and netstat.

Wi-Fi Chipset and Driver Information

While you’re in the bash shell, you can also determine the Wi-Fi chipset hardware in use. The output of this lspci command will only show the Wi-Fi adapter and the driver it is using. The basic output of lspci is included in chrome://system, but this method allows you to get more data. Add a -v flag or two to see even more.

crosh> shell
chronos@localhost /sys $ sudo lspci -nnk | grep -A2 0280

01:00.0 Network controller [0280]: Qualcomm Atheros AR9462 Wireless Network Adapter [168c:0034] (rev 01)
        Subsystem: Foxconn International, Inc. Device [105b:e058]
        Kernel driver in use: ath9k

This Acer C720 Chromebook has a Qualcomm Atheros AR9462 and uses the ath9k driver.

Run this command to discover the Wi-Fi chipset driver version. This is helpful if you want to know if the Wi-Fi chipset drivers were updated during a system update.

crosh> shell
chronos@localhost / $ sudo ethtool -i wlan0

driver: ath9k
version: 
firmware-version: 
bus-info: 0000:01:00.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

In this case no version number is reported, perhaps because the OS is using a generic Atheros driver that is packaged with the Linux kernel.

Below is the output of the same commands on an HP Chromebook 11 G4 running Chrome OS 41. This machine has an Intel Wireless-AC 7260 chipset and the driver and firmware-version are listed.

crosh> shell
chronos@localhost / $ sudo lspci -nnk | grep -A2 0280

01:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev c3)
        Subsystem: Intel Corporation Dual Band Wireless-AC 7260 [8086:c070]
        Kernel driver in use: iwlwifi
crosh> shell
chronos@localhost / $ sudo ethtool -i wlan0

driver: iwlwifi
version: 3.10.18
firmware-version: 23.14.10.0
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

The driver version appears to just be the Linux kernel version. The firmware-version is the chipset driver version.

Interestingly, after updating this HP Chromebook to Chrome OS 50, the Wi-Fi chipset firmware-version changed… but went down.

crosh> shell
chronos@localhost / $ sudo ethtool -i wlan0

driver: iwlwifi
version: 3.10.18
firmware-version: 16.229726.0
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

An inspection of the iwlwifi version history shows that this driver is actually newer than the previous version. Before version 16 it was the third number in the version that indicated what major branch it came from, so version 23.14.10.0 was actually from the version 10 branch. Thankfully, that’s cleared up in newer versions of the driver so that the first number is the version branch.
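
If you track firmware versions across a fleet, that numbering quirk is easy to trip over when sorting. Here is a small Python sketch of the rule as I’ve described it. Note the assumption baked in: it treats any four-field version string as the old format (branch in the third field) and anything else as the new format (branch in the first field). That heuristic fits the two versions seen here, but I haven’t verified it against every iwlwifi release.

```python
def firmware_branch(version):
    """Return the major branch number of an iwlwifi firmware-version string.

    Assumption: four dot-separated fields (e.g. 23.14.10.0) means the old
    scheme, where the third field is the branch. Otherwise the first field
    is the branch (e.g. 16.229726.0).
    """
    fields = version.split(".")
    if len(fields) == 4:
        return int(fields[2])
    return int(fields[0])


before = "23.14.10.0"   # reported on Chrome OS 41
after = "16.229726.0"   # reported on Chrome OS 50

print(firmware_branch(before), "->", firmware_branch(after))
```

Comparing the branch numbers (10, then 16) rather than the raw strings shows the update moving in the right direction.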

It’s good to see that Google includes Wi-Fi chipset driver updates with Chrome OS updates. This is especially nice as system updates are downloaded and installed automatically on Chromebooks. Personally, I’ve seen system updates resolve odd Chromebook Wi-Fi problems, and it’s possible the newer drivers are the solution.


Making RRM Work

There’s been a lot of good discussion within the Wi-Fi community recently about the viability of radio resource management (RRM), or the automatic selection of channels and Tx power settings by proprietary vendor algorithms. At Mobility Field Day 1 there was this excellent roundtable.

Personally, I usually fall into the static design camp, for many of the same reasons as others. I don’t want RRM to change the carefully tuned design I put in place and create an unpredictable RF environment. I’ve seen RRM do some very peculiar things, like put adjacent APs on the same channel or crank up the Tx power of 2.4 GHz radios in a high-density environment. RRM doesn’t disable 2.4 GHz radios when CCC is present, and it doesn’t plan DFS channels properly. Still, I’ve tried to keep an open mind.

Static designs have their limitations too. Statically designed WLANs can’t react to new neighboring networks contending for the same airtime, or new sources of RF interference that weren’t there when the static design was developed. It’s a real benefit of RRM that it does automatically correct for these problems.

Let me propose a hybrid approach that uses static design to handle the things that RRM does poorly, while still allowing RRM to react to the changing RF environment.

Static Design Elements

  • Tx power levels should be statically assigned. Once finely tuned as part of the design process, why would they ever need to change?
  • Excess 2.4 GHz radios in high density environments should be manually disabled because RRM simply won’t do this.
  • DFS channels should be statically planned. RRM can clump DFS channels near one another, resulting in a 5 GHz dead zone for clients without DFS support. Also, because of these clients, DFS channels should only be used when non-DFS channels are all already deployed. Therefore, statically plan DFS channels when needed in areas where non-DFS channels create secondary coverage, and let RRM dynamically plan the other bands. It’s less likely to have a neighbor or transient hotspot appear in the DFS bands anyway.
  • Set channel bandwidth statically. The design process includes considering the capacity requirements of the WLAN to determine the appropriate 5 GHz channel bandwidth. RRM algorithms don’t know what your capacity requirements are. 2.4 GHz should always be 20 MHz.

Things Left to RRM

  • 2.4 GHz channel planning, once excess radios are disabled. Channels 1, 6, and 11 only, of course.
  • 5 GHz channel planning, once DFS channels are statically assigned.
  • That’s all.
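
To make the split concrete, here is a rough Python sketch that checks an exported radio configuration against the rules above. Everything about the data format is hypothetical: the dict keys and the "rrm_controls" set are names I made up for illustration, not any vendor’s API. The DFS list assumes the US 5 GHz channel plan (channels 52 to 64 and 100 to 144). It does not try to decide which 2.4 GHz radios are excess, since that is a design judgment.

```python
# DFS channels in the US 5 GHz band: 52-64 and 100-144.
DFS_CHANNELS = set(range(52, 65, 4)) | set(range(100, 145, 4))


def check_radio(radio):
    """Return a list of ways a radio's config departs from the hybrid approach.

    `radio` is a hypothetical dict with the keys:
      band          "2.4" or "5"
      channel       primary channel number
      width_mhz     channel bandwidth in MHz
      rrm_controls  set of settings left to RRM: "channel", "tx_power", "width"
    """
    problems = []
    rrm = radio["rrm_controls"]
    if "tx_power" in rrm:
        problems.append("Tx power should be statically assigned")
    if "width" in rrm:
        problems.append("channel bandwidth should be set statically")
    if radio["band"] == "2.4":
        if radio["width_mhz"] != 20:
            problems.append("2.4 GHz should always be 20 MHz")
        if radio["channel"] not in (1, 6, 11):
            problems.append("2.4 GHz should use channels 1, 6, and 11 only")
    elif radio["channel"] in DFS_CHANNELS and "channel" in rrm:
        problems.append("DFS channels should be statically planned")
    return problems


# Made-up example radios.
radios = {
    "ap1-5ghz": {"band": "5", "channel": 36, "width_mhz": 40, "rrm_controls": {"channel"}},
    "ap2-5ghz": {"band": "5", "channel": 100, "width_mhz": 40, "rrm_controls": {"channel", "tx_power"}},
    "ap2-2.4ghz": {"band": "2.4", "channel": 3, "width_mhz": 40, "rrm_controls": set()},
}

for name, radio in radios.items():
    for problem in check_radio(radio):
        print(f"{name}: {problem}")
```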

The benefit of this approach is that it addresses many of the shortcomings of RRM while still retaining its main benefit: the WLAN can dynamically react to RF interference and transient neighbors by moving affected AP radios to clear spectrum. The things that RRM can’t do or does poorly are simply removed from its control.

Even within these constraints, there are still some vendors’ RRM algorithms I trust more than others. And even those I trust enough to try this with, I’d still want to monitor regularly to make sure the WLAN hasn’t turned into the RRM trainwreck that I’ve seen all too often when RRM is given free rein.

Why K12 Schools Need Wi-Fi Design


Enterprise Wi-Fi is expensive, very expensive. For schools with limited budgets and a responsibility to be good stewards of tax dollars, it is important to get it right, without spending more than necessary on the initial deployment, ongoing support, or fixing costly mistakes. Any savings can be used in other ways to improve education, so unnecessary spending on Wi-Fi can have an impact on the quality of education in schools.

That’s why it is critical for schools to work with Wi-Fi professionals to develop a sound design for the network before it is purchased and deployed. Fixing mistakes after the fact costs a lot of money. The usual “fix” of installing extra access points in areas where performance is poor can often make the situation worse, when the real solution might be to remove an AP or correct a bad channel plan.

What often happens is this: A vendor talks the school into purchasing one AP per classroom and then the channel planning is left up to auto-channel algorithms (known as RRM, or radio resource management). This is a very simple and seemingly easy way to get Wi-Fi in schools that doesn’t involve the headaches of procuring CAD drawings, performing multiple site surveys, collecting client device data, and other things that delay the installation of the Wi-Fi network and increase the up-front costs.

Don’t do it!

The big problem here is that this is extremely inefficient. Do schools need one AP per classroom? Some do, some don’t. You’ll only find out by doing a proper network design. Maybe the design process reveals that a school only needs one AP per two classrooms. A school like this that doesn’t bother with a design and just does one AP per classroom has spent 100% more money than it needed to.

Capacity issues aside, what about channel planning and radio transmit power control? Nearby APs on the same channel interfere with each other. Vendors love to tout their RRM as an effective means to automatically set these controls optimally. Just turn it on and let the magic happen.

The truth is, RRM just can’t be trusted. It may work for a while, and then it changes something and it doesn’t. My experience has shown that RRM is fine for simple networks with few neighbors, but in the high-density, busy RF environment of K12 schools it often fails miserably. Neighboring APs end up on the same channel, resulting in interference with one another. Transmit power goes up and down unpredictably. Your Wi-Fi network is an unpredictable moving target. What you measured and validated at one location one day is different the next day, and so on. The ongoing cost of supporting a network in this state is much higher than one that began with a proper design.

While some vendors’ RRM is better than others, no vendor is immune to this. A better solution is a proper design where channels and transmit power are determined by a Wi-Fi professional who is informed by years of experience and site survey data that RRM algorithms can’t factor into their decision making.

It is critical that schools include a proper Wi-Fi design in their Wi-Fi deployments to save tax dollars that would better be spent on other educational needs, and prevent many future headaches that result from over/under capacity networks and bumbling RRM algorithms. The Wi-Fi design process avoids these issues, and leaves schools with efficient, stable networks and the confidence in knowing that the network was validated against their needs, with the data to prove it.

Beyond the tax dollars, in a 21st century classroom, what is the true cost of poor Wi-Fi?

 

This is How Wi-Fi Actually Works

I decided to write this blog because there appears to be a very common misunderstanding about how Wi-Fi works among end-users and even many network administrators as well. Instead of repeating myself, I can share this link with folks that need a little lesson in 802.11 operation.

Wi-Fi does not work like AM/FM broadcast radio.

Well, in some ways it does. Wi-Fi radios transmit and receive radio frequency (RF) energy just like AM/FM stations do, but its operation is much more complex. If you are stuck in the AM/FM radio analogy, you’ll make several mistakes with Wi-Fi, such as:

  • Coverage is considered, not capacity. Again, if Wi-Fi were a one-way radio broadcast like AM/FM radio, you’d only need to provide a strong “Wi-Fi signal” for everything to work well. This leads you down this next path.
  • The “Wi-Fi signal” (using this term might be a tell that the person speaking is stuck in the AM/FM radio analogy) is too low, so crank up the AP’s transmit power to make it louder.
  • Every problem is thought of as an infrastructure problem; client radios are not considered when troubleshooting.
  • Getting hung up on the vendor’s name that is on the access point, without considering what is much more crucial, the overall design that went into the network.

How Wi-Fi Actually Works

Wi-Fi is not a one-way broadcast from AP to clients like AM/FM radio. This is not how Wi-Fi works:

[Image: badfi]
Nope. Not like this.

 

It’s a network. The AP and clients connected to it must all be able to transmit and receive to and from each other, more like this:

[Image: goodfi]
Note that while the intended destination of a transmitted frame is usually just one other radio, real RF transmissions radiate in all directions, and are heard by all clients.

 

Because they are all operating on the same channel, each client or AP must wait for the others to stop transmitting before it can transmit. It works just like walkie-talkie radios. Only one radio can transmit at a time; everyone else must listen and wait. Additionally, they all need to be close enough to hear each other so that they do not transmit over the top of each other, causing interference that corrupts the communications. The channel they are using is what’s called a shared medium.

If they can’t all hear each other, they will transmit over the top of each other, which results in corrupted frames (not packets, Wi-Fi operates at layer 2) that must be retransmitted. The bigger the cell, the worse this problem becomes (the hidden node problem). So when you crank up the transmit power of an AP to increase its coverage, you exacerbate this problem, because the AP is now serving clients that are farther apart from one another.

In many networks, the majority of Wi-Fi clients are smartphones with low-power radios and meager antennas. They already have difficulty hearing other clients further away in the cell. For networks like this, performance can be greatly improved by lowering the transmit power of the AP rather than increasing it.
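
To see why a bigger cell makes the hidden node problem worse, here is a toy Python model. It is deliberately crude and entirely my own simplification: clients sit on a straight line with the AP at position 0, the AP serves every client within its coverage radius, and two clients can hear each other only if they are within a fixed client range. Real RF propagation is far messier, and the distances are made up.

```python
from itertools import combinations


def hidden_pairs(client_positions, ap_radius, client_range):
    """Return pairs of served clients that are too far apart to hear each other.

    Positions are distances in meters along a line, with the AP at 0.
    """
    served = [pos for pos in client_positions if abs(pos) <= ap_radius]
    return [
        (a, b)
        for a, b in combinations(served, 2)
        if abs(a - b) > client_range
    ]


# Made-up layout: four clients, each able to hear others up to 50 m away.
clients = [-40, -15, 10, 35]

print("small cell:", hidden_pairs(clients, ap_radius=20, client_range=50))
print("large cell:", hidden_pairs(clients, ap_radius=40, client_range=50))
```

With the small cell, the AP serves only the two nearby clients, which can hear each other. With the large cell, it also serves the clients at -40 and 35, which are 75 m apart and hidden from each other.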

Further, because the channel is a shared medium, it has limited capacity. There is only so much available capacity to transmit in a single channel. Faster clients can transmit, well, faster, and therefore use less of that capacity, known as airtime. Older or cheaper clients that are slower use more airtime to transmit the same amount of data. It doesn’t matter what vendor’s name is on the access point, airtime is airtime. Once a channel is saturated, that’s it. You can’t add more clients to it without leading to degraded performance. You can’t alter the laws of physics. At this point you need to add another AP to utilize the capacity of a different channel, or replace slow clients with faster ones.
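
A little arithmetic makes the airtime point concrete. The Python sketch below is a back-of-the-envelope estimate only: it divides the payload size by the data rate and ignores preambles, acknowledgements, contention, and frame aggregation, all of which add real overhead. The 1500-byte payload and the two data rates are example values I picked.

```python
def airtime_us(payload_bytes, rate_mbps):
    """Microseconds needed to send the payload at the given data rate.

    Bits divided by Mbit/s gives microseconds. Protocol overhead is ignored.
    """
    return payload_bytes * 8 / rate_mbps


slow = airtime_us(1500, 24)    # an older or cheaper client
fast = airtime_us(1500, 300)   # a faster client

print(f"24 Mbit/s:  {slow:.0f} us")
print(f"300 Mbit/s: {fast:.0f} us")
print(f"the slow client uses {slow / fast:.1f}x the airtime for the same data")
```

Even in this simplified form, the slower client occupies the channel 12.5 times longer to move the same 1500 bytes.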

Regardless, it’s worthwhile to intuitively understand the nature of Wi-Fi networks, so that these common pitfalls can be avoided. Many other Wi-Fi best practices that I haven’t outlined here stem from this foundational knowledge. Based on this, can you think of other things that might affect Wi-Fi performance?

Footnote

This is a simplification of 802.11 operation meant to give those new to the subject a casual understanding of how it works. Sometimes 802.11 frames are broadcast, one-way-only, from the AP to all clients in the network. Some management frames and broadcast frames from the wired network are broadcast this way. The important point to remember is that this is the exception, not the rule, and if all clients cannot hear each other, there is still the possibility that this broadcast traffic could be corrupted by another client transmitting over it.

Wi-Fi Load Balancing Considerations

When deploying a WLAN it’s easy to fall into the trap of enabling features you might not need, just because, well, you paid for them and they are cool. Oftentimes a KISS approach results in better performance, but hey, look at this cool new thing it can do!

Load balancing is one of those features. While it seems harmless enough, there are some scenarios that can get you into trouble.

For the uninitiated, WLAN load balancing is a feature that encourages clients to associate with the least-loaded nearby AP. Typically, a client will attempt to associate with the loudest nearby AP, without regard to how many clients are already associated (most APs don’t share that information anyway, but some do by using the BSS Load element within management frames). Most load balancing algorithms work by suppressing probe and association responses from heavily loaded APs so that a client either won’t know that the AP is there, or will fail to associate with it. Hopefully the client will then attempt to associate with a different AP that has more capacity available to clients.

There are a couple of problems with this to keep in mind. The most important problem is one that affects all clients. The AP has a very different view of the RF environment than the client does. What a highly sensitive, enterprise-grade AP is capable of hearing is quite different from what a low-cost, consumer-grade Wi-Fi chipset can hear, and they are of course not listening from the same location either. It gets worse if that client radio is part of a smartphone tucked into a pocket or purse. In this example, the AP may think it’s safe to ignore probe and association requests from that client because it’s aware of three other nearby APs that are less loaded, but the reality is that the smartphone can’t hear any AP but the one that is ignoring it.

And not all load balancing algorithms do what you think they do. Some operate by simply limiting the total number of associated clients an AP radio will accept, even some that are described as “airtime-based,” in my experience. The problem here is that this doesn’t take into account the actual airtime utilization, the truest measure of the load on an AP radio. Often, the airtime utilization is quite low when an algorithm decides the AP is too loaded and should push clients elsewhere. Say 30 clients are associated to one of the AP’s radios. If they are all idle, there is still plenty of capacity for others to associate as well, as very little airtime is being used. Make sure you know exactly how your WLAN’s load balancing works. Test it to make sure it does what it claims it does, and set your limits high.
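
The difference between the two approaches is easy to show in a few lines of Python. This is a simplified illustration, not any vendor’s actual algorithm, which will be proprietary and more involved. The function names, the 30-client cap, and the 70% airtime limit are all made-up values.

```python
def count_based_accepts(associated_clients, max_clients):
    """Client-count limit: refuse new clients once the cap is reached."""
    return associated_clients < max_clients


def airtime_based_accepts(busy_fraction, max_busy_fraction):
    """Airtime limit: refuse new clients only when the channel is actually busy."""
    return busy_fraction < max_busy_fraction


# 30 idle clients on one AP radio, using about 5% of the airtime.
associated_clients = 30
busy_fraction = 0.05

print("count-based accepts a new client:",
      count_based_accepts(associated_clients, max_clients=30))
print("airtime-based accepts a new client:",
      airtime_based_accepts(busy_fraction, max_busy_fraction=0.70))
```

The count-based check turns the new client away even though the radio is nearly idle, while the airtime-based check lets it in.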

Here are some examples where load balancing can cause problems:

  • A high school classroom fills with students. As they enter the room, their smartphones, which were already configured to join the WLAN, automatically roam to the loudest nearby AP. The teacher asks the students to get out their laptops as part of her lesson. The laptops now try to connect, but the nearby AP already has 30 smartphones associated to it, so it ignores the probe and association requests from the laptops. The best case scenario is that the laptops are able to associate with another nearby AP, albeit at a lower data rate than the louder AP. The worst case scenario is that the clients’ Wi-Fi radio drivers won’t budge and continually fail to associate with the loudest AP (which is ignoring them), or that the neighboring AP the loaded AP is trying to push new clients to is actually too distant for the new clients to hear. But the smartphones are all nearly idle, so it would have been better for the laptops to associate with the louder AP.
  • The school media center is used to store several carts of iPads. The iPads are not powered-down before being stored, so they all associate with the media center AP. Visitors to the media center have difficulty connecting to the network in the media center, because the algorithm believes it is heavily loaded and ignores requests to associate to the media center AP. The media center AP can hear another AP well, but most visiting clients in the media center cannot. The visiting clients cannot connect to the WLAN, yet in this case as well, the AP is actually not loaded at all. The iPads are completely idle and using almost no airtime.

Here is where load balancing makes sense:

  • In areas where you can reasonably anticipate that a single AP radio may become overloaded, such as a cafeteria, gym, or performance space.
  • In areas where multiple AP’s are very close to one another and create tightly overlapping coverage cells. This helps mitigate the problem of clients and AP’s having a differing view of the RF.
  • Nowhere else. Only use load-balancing when both of the above criteria are met.

Yes, Hotspot 2.0 is the Future of Secure Guest Wi-Fi

Since first blogging about Hotspot 2.0 and its application to typical enterprise WLAN guest networks I’ve learned quite a bit more thanks to several helpful tweets from Dave Wright of Ruckus Wireless. Although the large majority of the focus of Hotspot 2.0 still seems to be on integration with cellular carriers for AAA services and all the complexity and exclusivity that entails, there are provisions for simpler, anonymous, and secure Hotspot 2.0 guest networks that are much closer to what the typical enterprise WLAN operator will actually deploy. As I’ve said before, authentication is not a priority for most WLAN operators on their guest network, but encryption certainly is.

Is it the holy grail of guest Wi-Fi? Maybe, but more on that after we look at the Wi-Fi Alliance Passpoint (Release 2) Deployment Guidelines. In all 61 pages of the document, there are these few paragraphs devoted to what I predict to be the most common use of Hotspot 2.0.

12. Free Public Hotspot 2.0-Based Hotspots 

Hotspot Operators may provide Hotspot 2.0-based free, public, hotspot service. In this particular service, Hotspot Operators have the need to ensure hotspot users have accepted the terms and conditions governing their hotspot’s use, but are not interested in knowing (or do not wish to know/track) any particular user’s identity. This functionality is provided by Hotspot 2.0 Release 2 infrastructure. The Hotspot Operator configures their infrastructure as follows:

  1. The user in a Free Public Hotspot initiates the online sign-up registration process with the Free Public Hotspot’s OSU server.
  2. During the registration exchange, the OSU server presents the terms and conditions to the user.
  3. If the user accepts the terms and conditions, the OSU server issues a credential; if the user refuses, no credential is provisioned. Note that the same credential is issued to all users which have accepted the terms and conditions; therefore, the Hotspot Operator cannot track the identity of an individual user during the Hotspot 2.0 Access state (see section 6).
  4. When the user/mobile device returns to the same Free Public Hotspot, the previously provisioned credentials are used to provide secure, automatic access. The mobile device authenticates using EAP-TTLS, which provides for the generation of unique cryptographic keying material even though users share a common password.

If the terms and conditions change, then the user is taken through a subscription remediation process during which the new terms and conditions are presented. If the user accepts the changed terms and conditions, then a new credential is provisioned.

There you have it. Hotspot 2.0 does provide for anonymous and secure guest networks. In short, 802.1X/EAP authentication is accomplished with EAP-TTLS through a common credential that is issued after the signup process. In fact, this has already been deployed by the cities of San Jose and San Francisco. To get an idea of how it works from a user’s perspective, check out the directions here.
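The sign-up logic quoted above can be modeled in a few lines. This is only a toy sketch of the decision flow, not the actual OSU protocol exchange; the `FreeHotspotOsu` and `Credential` names, the "anonymous" username, and the password format are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Credential:
    username: str
    password: str
    terms_version: int  # which version of the terms this credential was issued under


class FreeHotspotOsu:
    """Toy model of a free public hotspot's online sign-up (OSU) server."""

    def __init__(self, terms_version: int = 1) -> None:
        self.terms_version = terms_version

    def sign_up(self, accepts_terms: bool) -> Optional[Credential]:
        """Issue the common credential only if the user accepts the terms."""
        if not accepts_terms:
            return None
        # Every accepting user receives the same credential, so the operator
        # cannot tell individual users apart at authentication time.
        return Credential("anonymous", f"shared-secret-v{self.terms_version}",
                          self.terms_version)

    def needs_remediation(self, credential: Credential) -> bool:
        """True when the terms have changed since the credential was issued."""
        return credential.terms_version != self.terms_version


osu = FreeHotspotOsu()
first_user = osu.sign_up(accepts_terms=True)
second_user = osu.sign_up(accepts_terms=True)
print(first_user == second_user)         # True: one credential shared by all
print(osu.sign_up(accepts_terms=False))  # None: no credential without acceptance

osu.terms_version = 2                    # the operator updates the terms
print(osu.needs_remediation(first_user)) # True: user must accept again
```

Note that the per-session encryption keys come from the EAP-TTLS exchange itself, not from this shared credential, which is why a common password does not mean common keys.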

Yes, you can do all of this without Hotspot 2.0 in a less elegant way: Add a notice to your guest network captive portal that users can log in to the secure network with a specific generic credential, and even a link to download a .mobileconfig profile for iOS and Mac OS users. However, the user experience won’t be standardized like it is with an OSU server, and non-Apple users will have to manually configure a connection to the 802.1X network, including adding a cert to their trusted roots. Not good UX. And definitely not fast, free, and easy.

The bad news: With Hotspot 2.0, the guest network captive portal is here to stay.

The good news: Users only have to wrestle with the captive portal once (unless the client credential is changed). And perhaps the technology behind the portal is friendlier to mobile clients than today’s captive portals. Hopefully an HS2 client sees the OSU server being advertised by ANQP and immediately presents a notification to the user. If the user doesn’t play ball, the client should disconnect and the SSID should not be saved as a preferred SSID.

The great news: This is a lower-friction way to get secure Wi-Fi to guests.

Is this the holy grail? That depends on what you think that is. To me, the barrier to entry is low enough that I think this is a win for guest Wi-Fi.

Another wrinkle: The Hotspot 2.0 802.1X network can still be configured to automatically connect guests from known realms. That means that you could add eduroam and the coming anyroam realms to the SSID to onboard users from those participating organizations securely and automatically. And yes, no captive web portal either. So if the opportunities to integrate with AAA clearinghouses grow (exist at all?), the number of users subjected to the captive portal shrinks.

I’m sure there are concerns regarding the possibility of new SSID’s. Luckily, a legacy open guest network can serve Hotspot 2.0 incompatible clients while also delivering the Online Sign Up portal to compatible clients. That means no new SSID’s.
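Putting the last few paragraphs together, the onboarding path for a guest device might be sketched like this. It is a simplified illustration of the decision, not how any client or controller implements it; the function name, the returned labels, and the realm strings are placeholders.

```python
def choose_onboarding(client_realms, advertised_realms, supports_hs2,
                      has_guest_credential=False):
    """Pick how a guest device gets onto the network."""
    if not supports_hs2:
        # Incompatible clients keep using the legacy open guest network.
        return "legacy open guest network"
    if set(client_realms) & set(advertised_realms):
        # The device holds a credential for a realm the network accepts.
        return "automatic connection via known realm"
    if has_guest_credential:
        # A returning guest who already accepted the terms.
        return "automatic connection with guest credential"
    # A first-time guest goes through Online Sign Up once.
    return "online sign-up"


# Placeholder realms standing in for roaming partners such as eduroam members.
network_realms = ["university.example", "district.example"]

print(choose_onboarding(["university.example"], network_realms, True))
print(choose_onboarding([], network_realms, True))
print(choose_onboarding([], network_realms, True, has_guest_credential=True))
print(choose_onboarding([], network_realms, False))
```

Only the second case ever sees a sign-up portal, and only once.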

For the visual learners among us, your typical enterprise WLAN might look like this now:

A typical enterprise WLAN

To support secure Hotspot 2.0 guest clients, it might look like this in the future:

A Hotspot 2.0-enabled enterprise WLAN

I’m looking forward to seeing gear get updated to support Hotspot 2.0 Release 2 so we can see this in the wild. Ruckus is doing a great job banging the drum for Hotspot 2.0, but other vendors seem to be further behind. Client support is not great (come on, Android), but Apple has supported it since iOS 7, so here’s hoping that will drive others to follow suit.

K-12 Needs eduroam Too

As eduroam sweeps across higher education in the United States, I think it’s worth considering its place in K-12 as well. After all, every university and college that joins eduroam is within the boundaries of a K-12 school district, and a longstanding relationship is likely to exist between those institutions.

Where I work, we have Miami University within our district boundaries. Miami has a highly regarded education program, and dozens of Miami students student-teach at our schools every day. Miami faculty and staff send their kids to our schools, they volunteer here, and they regularly attend school functions. We maintain a formal partnership with the university.

Our teachers and staff take classes at Miami, teach classes, send their kids there, and attend events at Miami. Some of our high school students attend post-secondary classes there as well. In other words, there is a regular flow of visitors back and forth between the institutions.

Because Miami is an eduroam member, it made perfect sense for us to join too. Visitors from our district could gain automatic and secure Wi-Fi access at Miami, and visitors from Miami could have the same at our district. That’s why I pushed to deploy eduroam at my district, making us the first K-12 institution to join eduroam in the United States.

But what about K-12 schools that don’t have higher education institutions nearby? I think there is still a case to be made for eduroam for these districts too.

Quoting the eduroam US FAQ:

Our institution already has great guest Wi-Fi, why do I need eduroam?

eduroam is not a replacement to your guest network, it is a complement to make your guest network and your community compatible with other eduroam participants.

Enabling eduroam on your campus provides four main features:

  1. it allows your campus to welcome eduroam enabled visitors in a strongly authenticated way (the strong authentication also provides a way to authorize users to different resources)

  2. it allows your own users to travel to eduroam enabled locations around the world (some places only have eduroam as a guest Wi-Fi)

  3. it saves provisioning time to your institution and to the visitors since authentication is automatic and access is immediate

  4. it improves security since your visitors use a standard protocol (WPA2-enterprise, 802.1X) that encrypts traffic between their devices and the Wi-Fi infrastructure

Doesn’t all of this apply to K-12 as well?

I think it does. K-12 school districts aren’t isolated islands. Teachers attend professional development at neighboring districts, students travel for field trips and athletic events, and teachers and leadership attend meetings at local education service centers. Some districts even share employees who split their time between them. There is a lot of educational roaming occurring within today’s K-12 community.