Splunking Wi-Fi DFS Events

One aspect of wireless networking that I’ve always struggled with is visibility into DFS events. Usually I catch them by chance, by noticing two nearby APs on a site map using the same non-DFS channel, or maybe by casually looking through logs, but I’ve never felt like I had the reporting and alerting that should be in place for DFS events, because they can be very disruptive. An AP will abruptly change the channel it is operating on, and if it switches back, it may observe a “quiet period” of 60 seconds in which it does not transmit any data. Not good.

Enter Splunk.

Splunk is a powerful log analysis tool that you can think of as “Google for the data center.” It takes log data from almost any source and makes it as searchable as Google has made the web. Wireless network engineers can quickly and easily search syslog and SNMP data, build reports, and create alerts. Splunk Light is free and will process up to 5 GB of data a day, which should be plenty for most WLANs. It also runs easily on macOS if you just want to demo it locally.

Using Splunk, I very quickly created this dashboard of real DFS data from SNMP traps coming from a Cisco WLC. It’s still a little rough around the edges (I need to figure out how to clean up those AP names and channels), but it shows me a lot of the valuable data.

splunk-dfs-report
Yes, DFS is a problem at this site.

I can easily create email alerts too, so that if a DFS event occurs an email is triggered, or if say 10 DFS events occur within 30 minutes an email is triggered.
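
The “10 DFS events within 30 minutes” rule is really just a sliding-window count. Here’s a minimal Python sketch of that threshold logic, only to illustrate the idea. It is not how Splunk implements alerts, and the function name and default values are my own assumptions.

```python
from collections import deque
from datetime import timedelta


def should_alert(event_times, threshold=10, window=timedelta(minutes=30)):
    """Return True if any span of length `window` holds at least `threshold` events."""
    recent = deque()
    for ts in sorted(event_times):
        recent.append(ts)
        # Drop events that fall outside the window ending at the current event.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False
```

Setting `threshold=1` gives the simpler “email me on every DFS event” behavior.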

How To

I installed Splunk on a Mac, then set up the built-in snmptrapd to listen for incoming traps and log them to a file. For snmptrapd to interpret the SNMP traps from a Cisco WLC, download the Cisco MIBs and copy them to /usr/share/snmp/mibs/. Then you can start snmptrapd.

Here’s the CLI one-liner to do that:

sudo snmptrapd -Lf /var/log/snmp-traps --disableAuthorization=yes -m +ALL

Next, configure the WLC to send SNMP traps to the Splunk box by adding its IP address under Management -> SNMP -> Trap Receivers. While you’re there, go to Trap Controls and turn on everything you want to analyze.

wlc-snmp

Even though DFS events only generate SNMP traps, it’s still a good idea to send syslog messages to Splunk too, so do that under Management -> Logs -> Config. Set the Syslog Level to “Informational” to get a lot of good data. “Debugging” is probably way too much. The Syslog Facility isn’t important.

wlc-syslog

Monitor the file that snmptrapd is writing traps to and make sure it is working. Run this command on the Mac and you should see traps streaming in. If not, you have some troubleshooting to do.

tail -f /var/log/snmp-traps

Now add the file to Splunk under Data -> Data inputs -> Files & directories, and you should be able to see the traps in searches.

Have a look at Splunk’s documentation on SNMP data for more setup help. Setting up syslog is easier. Under Data -> Data inputs -> UDP add UDP port 514 with the Source type “syslog.”

Once the data is coming into Splunk you can start searching it and creating fields. Search “RADIO_RC_DFS” (with quotes) to see all the DFS traps. From that search click “Extract new fields” and select the tab delimiter to parse the data. Give the AP name field a label, and then you can create visualizations of DFS events by AP name. Any search can also be used to trigger an alert, such as an email.
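
If you want to sanity-check the trap file outside of Splunk, the same “split on tabs, group by AP name” idea can be sketched in a few lines of Python. The position of the AP name field (`ap_field`) is an assumption you have to supply after looking at your own trap lines; I’m not claiming any particular snmptrapd layout here.

```python
from collections import Counter


def count_dfs_by_ap(lines, ap_field, marker="RADIO_RC_DFS"):
    """Count lines containing `marker`, grouped by one tab-separated field.

    `ap_field` is the zero-based index of the AP name column. It is an
    assumption about your log layout, so check it against real trap lines.
    """
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        fields = line.rstrip("\n").split("\t")
        if ap_field < len(fields):
            counts[fields[ap_field]] += 1
    return counts
```

For example, `count_dfs_by_ap(open("/var/log/snmp-traps"), ap_field=3).most_common(5)` would list the five noisiest APs, assuming the AP name happens to be the fourth tab-separated field in your file.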

Cisco has published a WLC SNMP Trap Guide as well as a WLC syslog Guide that are helpful when working with this data. Find the messages you are looking for in those guides, then search for them in Splunk.

From there it’s all up to your own creativity. DFS events are just scratching the surface of Splunk’s potential. You can look at authentication events, monitor RRM, and there might be some interesting roaming analysis that can be done with this data as well. I’m sure there are some bright engineers out there that have taken this a lot further. Please share your work!

Use Let’s Encrypt Certificates with FreeRADIUS

Let’s Encrypt is a certificate authority that generates TLS certificates automatically, and for free. It’s been great for web server administrators because it allows them to automate the process of requesting, receiving, installing, and renewing TLS certificates, taking the administrative overhead out of setting up a secure website. And did I mention it’s free and supported by all the major web browsers now?

Getting all of that to work with a RADIUS server is challenging however, mostly because of the way Let’s Encrypt works. The Let’s Encrypt client runs on a web server with a public domain name. The client requests a TLS cert from Let’s Encrypt and before Let’s Encrypt issues the cert, it verifies that the client is connecting from the same domain name that it is requesting a cert for, and that the client can put some hidden files on the server’s website. Do you see the problem? Unless you run a public-facing web server on your RADIUS server (unlikely), Let’s Encrypt will not issue certs to your server. It needs a web server it can interact with in order to validate the domain name of the client’s request.

Why use a certificate from a public CA like Let’s Encrypt for 802.1X/PEAP authentication? While a private CA offers more security, a public CA has the advantage of having a pre-installed root certificate on virtually all RADIUS supplicants, including BYOD clients that are unmanaged. If you don’t have an MDM or BYOD onboarding solution, you can’t get your private root cert onto BYOD clients very easily.

Unmanaged clients are a security risk, however, because the end-user can easily override security warnings that occur when connecting to an evil twin network with a bogus cert. A good MDM solution will allow network admins to configure BYOD clients properly so that TLS failures cannot be bypassed.

A few considerations before you get too excited:

  • Again, a better, more secure solution is to use a private CA and distribute the RADIUS server cert to clients using an MDM solution and/or BYOD onboarding solution.
  • Let’s Encrypt certs are only good for three months at a time, and some supplicants will prompt users to accept the new certificate when it is renewed.
  • Build in some error handling, logging, and notification. E.g. an email from the web server when the cert renewal routine runs, including its output, and an email from the RADIUS server when it copies the new certs and reloads FreeRADIUS.
  • It works as root, but there’s probably a way to accomplish this without using root. Do it that way.
  • You can accomplish the same thing with Windows servers and Powershell.
  • You broke it, not me.

To get this working, we need a public web server with the same domain name as you’d use in your RADIUS server’s cert common name. This means internal domain names with a .local TLD won’t work.

I set up two Ubuntu servers, one running the nginx web server with a public IP, and another on my local network running FreeRADIUS. The web server will run the Let’s Encrypt client and create and renew the certs. The RADIUS server will copy those certs from the web server and use them for PEAP authentication. Once set up, the process of renewing and installing the certs on the RADIUS server happens automatically, just like it would on a web server.

First, a public DNS A record needs to be set up with the domain name which will be used as the TLS cert common name. We’ll use radius1.example.com, and point it to the IP address of the web server.

Once that is done, you can install and run the Let’s Encrypt client on the web server. It works with Apache too, but if you prefer nginx like me, follow these directions to get it set up with Ubuntu 14.04 or Ubuntu 16.04. Don’t skip over the part about using cron to run the renewal routine.

Now that we have the certs on the web server, we’ll turn our attention to the RADIUS server. The first thing we need to do is set up SSH public key authentication between the two servers. I used the root account on both servers to do this, so that I would have permissions everywhere I needed them. With public key authentication in place, securely copying the certs in the future can happen automatically, without getting stopped by a password request. Here are instructions to get that working.

Now we’ll start configuring FreeRADIUS on the RADIUS server. I’m assuming you already have a working FreeRADIUS server. I’m using FreeRADIUS 3, and you should be too. I like to use a separate directory for the Let’s Encrypt certs.

root@freeradius:~# mkdir /etc/freeradius/certs/letsencrypt/

Now let’s try copying the certs from the web server to this directory on the RADIUS server. If public key authentication is working, you should not be prompted for a password.

root@freeradius:~# scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/fullchain.pem /etc/freeradius/certs/letsencrypt/
root@freeradius:~# scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/privkey.pem /etc/freeradius/certs/letsencrypt/

Did it work? If so, you should see the certs in the new folder we created.

root@freeradius:~# ls /etc/freeradius/certs/letsencrypt/
fullchain.pem  privkey.pem

Now we need to configure FreeRADIUS to use the Let’s Encrypt certs for PEAP authentication. I have a previous blog post about using different CAs for PEAP and EAP-TLS on FreeRADIUS that should come in handy here. If you are using EAP-TLS too, be sure not to change that CA from your private CA! All we need to do now is modify /etc/freeradius/mods-enabled/eap with our new certs in the TLS section used for PEAP.

root@freeradius:~# nano /etc/freeradius/mods-enabled/eap

tls-config tls-peap should be changed to:

…
tls-config tls-peap {
 private_key_file = ${certdir}/letsencrypt/privkey.pem
 certificate_file = ${certdir}/letsencrypt/fullchain.pem
…

If you aren’t using multiple TLS configurations, this section is named tls-config tls-common. You can leave it like that.

Reload FreeRADIUS for the change to take effect.

root@freeradius:~# service freeradius reload
 * Checking FreeRADIUS daemon configuration...               [ OK ] 
 * FreeRADIUS daemon is running
 * Reloading FreeRADIUS daemon freeradius                    [ OK ]

Now when connecting to the WLAN that is configured to use this RADIUS server for 802.1X/PEAP authentication, the client is presented with a valid Let’s Encrypt server certificate.

mac_cert_challenge

OK, we have a working FreeRADIUS server using Let’s Encrypt certs for 802.1X/PEAP authentication. Now let’s automate the process of getting renewed certs from the web server to the RADIUS server. We’ll use scp and cron to get this done.

On the RADIUS server, add these commands to root’s crontab, with the appropriate domain names.

root@freeradius:~# crontab -e
# m h dom mon dow command
0 3 * * 1 scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/fullchain.pem /etc/freeradius/certs/letsencrypt/
0 3 * * 1 scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/privkey.pem /etc/freeradius/certs/letsencrypt/
5 3 * * 1 service freeradius reload

At 3:00 AM every Monday, cron will copy the TLS certs from the web server, then reload FreeRADIUS at 3:05 AM to put them into production. Now the Let’s Encrypt certs are automatically installed on the RADIUS server within a week of being renewed on the web server. The certs are good for three months at a time and renewable one month in advance, so you’ll get renewed certs automatically installed every two months.
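
The crontab above reloads FreeRADIUS every week whether or not the cert actually changed. One way to add the error handling mentioned earlier is to scp into a staging directory first and only install and reload when the file contents differ. Here is a rough Python sketch of that comparison step; the function names and the idea of a staging directory are my own assumptions, not part of the setup described above.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def install_if_changed(fetched, installed):
    """Copy `fetched` over `installed` only when the contents differ.

    Returns True when a copy happened, so the caller knows a reload is needed.
    """
    fetched, installed = Path(fetched), Path(installed)
    if installed.exists() and sha256_of(fetched) == sha256_of(installed):
        return False
    shutil.copy2(fetched, installed)
    return True
```

A wrapper script could call this for fullchain.pem and privkey.pem, run `service freeradius reload` only if either call returns True, and email the result either way.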

Presto! You now have Let’s Encrypt certs automatically renewed and installed on your RADIUS server. While a private CA is a better solution for 802.1X authentication, this isn’t bad for a $0 software stack.

Clear To Send Podcast Episode 62: K12 Wi-Fi Deployments

I recently had the pleasure of joining Rowell Dionicio on the Clear to Send Podcast to talk about Wi-Fi in K12 schools. Clear To Send is a great podcast about enterprise wireless networking and a great way to stay current with the Wi-Fi community.

We talked about K12 requirements, challenges, funding, my design process, security, and everyone’s favorite K12 subject, 1 AP per classroom!

After listening to the podcast, I thought about some other K12 Wi-Fi considerations that I didn’t bring up on the air.

  • K12 often has requirements for mDNS applications like Apple AirPlay for AppleTV or Google Cast for Chromecast. This is a challenge in an enterprise network because mDNS does not cross layer 2 boundaries. It’s important to consider that when designing a new WLAN and selecting the vendor. Many WLAN vendors do have features that can assist with relaying mDNS traffic between VLANs. Be careful to limit this traffic to only the VLANs where it is required.
  • Excessive multicast traffic can be a burden on channel utilization when it is not controlled. Many WLAN vendors have features that intelligently filter broadcast/multicast traffic, instead of always forwarding it out the AP radio interfaces at the lowest data rate. If you are dealing with mDNS or large subnets (common in K12) it’s worthwhile to understand how the WLAN can manage broadcast/multicast traffic.
  • MSPs are a great way to get well-designed enterprise Wi-Fi into small to medium size schools that don’t have the internal resources to handle it themselves. MSPs can be hired to support and operate the WLAN after installing it, which gives them an incentive to design the WLAN properly that VARs who just sell the hardware might not have. E-Rate funding is now available to reimburse schools for managed services contracts with MSPs.
  • eduroam is available for K12 schools, not just higher education. Check it out!
  • It’s hard to listen to the sound of your own voice.

I really enjoyed talking Wi-Fi with Rowell and I’d love to return to the podcast in the future. Maybe we can talk about healthcare Wi-Fi next? Thanks Rowell!

Have a listen here: CTS 062: K12 Wi-Fi Deployments – Clear To Send

802.11ac Encryption Upgrade

The security features provided by the IEEE 802.11 standard haven’t changed much since the 802.11i amendment, more commonly known by its Wi-Fi Alliance certification name WPA2, was ratified in 2004. 802.11w protected management frames were introduced in 2009, but it is only recently that Wi-Fi chipsets for client devices have included support for them. WPA2 introduced the robust CCMP encryption protocol as a replacement for the compromised WEP-based encryption schemes of the past. CCMP utilizes stronger 128-bit AES encryption keys. As a general rule of thumb, if you aren’t using CCMP on a Wi-Fi network designed for security, you’re doing it wrong. It’s been out for a long time and older protocols have well-established weaknesses.

However, there are some new encryption changes in the 802.11ac amendment which have mostly flown under the radar. Besides 256-QAM, wider channels, and MU-MIMO, 802.11ac now includes support for 256-bit AES keys and the GCMP encryption protocol. Galois/Counter Mode Protocol is a more efficient and performance-friendly encryption protocol than CCMP.

A few interesting nuggets from section 11.4 of the 802.11ac amendment:

The AES algorithm is defined in FIPS PUB 197-2001. All AES processing used within CCMP uses AES with either a 128-bit key (CCMP-128) or a 256-bit key (CCMP-256).

And…

CCMP-128 processing expands the original MPDU size by 16 octets, 8 octets for the CCMP Header field and 8 octets for the MIC field. CCMP-256 processing expands the original MPDU size by 24 octets, 8 octets for the CCMP Header field, and 16 octets for the MIC field.

By the way, you can download the 802.11ac amendment or the entire 802.11-2012 standard from the IEEE here for free. For more on these security changes read sections 8.4.2.27 and 11.4 of the 802.11ac amendment.

It seems odd that these changes were included in the 802.11ac amendment, and not in a separate security-focussed amendment like 802.11w and 802.11i. Nothing wrong with it, just unexpected. I’m curious to see if the 802.11ax amendment includes security changes as well.

Why the addition of 256 bit AES keys? It could have something to do with a few chinks in the armor of 128 bit AES keys. The current attacks appear to be impractical, but future attacks that take advantage of quantum computing may put 128 bit AES keys at risk. NIST thinks that larger key sizes are needed to defend symmetric AES keys like those used in WPA2 against quantum computer attacks, which they say will be operational within the next 20 years. I’ll take their word for it.

Because the amendment only specifies CCMP-128 as mandatory for RSN compliance, it’s very unlikely that we’ll see CCMP-256/GCMP-256 in use anytime soon. Further, enabling 256 bit cipher suites effectively disables support for all non-802.11ac clients as well as 802.11ac clients that only support the mandatory cipher suites (most of them?). That’s because CCMP-256 and GCMP-256 pairwise keys are only compatible with 256 bit group keys, breaking backwards compatibility with legacy clients. There are also a lot of 802.11n clients out there that aren’t going away anytime soon, so actually deploying CCMP-256/GCMP-256 will require a separate CCMP-256/GCMP-256-only SSID. Excited yet?

Further, I can’t find any documentation that suggests that infrastructure vendors have implemented CCMP-256/GCMP-256 at all, just a few slide decks here and there with an overview of the changes. These cipher suites appear to be optional, so I wonder if any VHT clients or APs actually support them today, and when they will in the future. The Linux Wi-Fi configuration API cfg80211 and driver framework mac80211 have added software support for them. That’s about all the implementation I have found. Perhaps PCI compliance or Wi-Fi Alliance certification will eventually force the issue, or perhaps it will go the way of 802.11n Tx beamforming and never be implemented. There are a lot of obstacles to overcome before 256-bit keys become practical.

However, a VHT client can negotiate a GCMP-128 RSNA within a BSS that uses a backwards-compatible CCMP-128 group key, and the 802.11 standard does support multiple pairwise cipher suites within a BSS (remember TSNs?). That allows the GCMP-128 pairwise cipher suite to be used alongside everyday CCMP-128 pairwise and group keys on real, production networks.

To tell if a BSS is using one of the new cipher suites in a packet capture, look at a beacon frame’s RSN information element. The OUI in the cipher suite selector is always 00-0F-AC for the CCMP/GCMP encryption protocols. It’s the suite type that distinguishes between the specific cipher suites. For example, 00-0F-AC:4 is the default CCMP-128, 00-0F-AC:9 indicates GCMP-256, and 00-0F-AC:10 indicates CCMP-256. Group keys for a BSS with protected management frames have their own suite type numbers. Look for multiple pairwise cipher suites to find support for the new stuff. Here’s the table of the new cipher suites. I’m on the lookout for 00-0F-AC:8 (GCMP-128), but I’ve yet to find a beacon frame with it advertised.

Table 8-99—Cipher suite selectors

OUI       Suite type  Meaning
00-0F-AC  4           CCMP-128: default pairwise cipher suite and default group cipher suite for data frames in an RSNA
00-0F-AC  6           BIP-CMAC-128: default group management cipher suite in an RSNA with management frame protection enabled
00-0F-AC  8           GCMP-128: default for a DMG STA
00-0F-AC  9           GCMP-256
00-0F-AC  10          CCMP-256
00-0F-AC  11          BIP-GMAC-128
00-0F-AC  12          BIP-GMAC-256
00-0F-AC  13          BIP-CMAC-256
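
If you are scripting against packet captures, the table above translates directly into a lookup. This is a small Python sketch under my own naming (`CIPHER_SUITES`, `describe_selector`, and `describe_selector_bytes` are made up for illustration). It only knows the suite types listed in the table, and it relies on the selector being four octets on the wire: a three-octet OUI followed by a one-octet suite type.

```python
RSN_OUI = "00-0F-AC"

# Suite types copied from the table above; anything else is reported as unknown.
CIPHER_SUITES = {
    4: "CCMP-128",
    6: "BIP-CMAC-128",
    8: "GCMP-128",
    9: "GCMP-256",
    10: "CCMP-256",
    11: "BIP-GMAC-128",
    12: "BIP-GMAC-256",
    13: "BIP-CMAC-256",
}


def describe_selector(selector):
    """Map a selector written like '00-0F-AC:9' to a cipher suite name."""
    oui, _, suite_type = selector.partition(":")
    if oui.upper() != RSN_OUI or not suite_type.isdigit():
        return "unknown"
    return CIPHER_SUITES.get(int(suite_type), "unknown")


def describe_selector_bytes(raw):
    """Same lookup for the 4 raw octets (3-octet OUI + 1-octet suite type)."""
    if len(raw) != 4:
        raise ValueError("a cipher suite selector is exactly 4 octets")
    oui = "-".join(f"{b:02X}" for b in raw[:3])
    return describe_selector(f"{oui}:{raw[3]}")
```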

Interesting note that GCMP-128 is the default for a DMG STA, which is a directional multi-gigabit station defined in the 802.11ad amendment for operation in the 60 GHz band.

The standard limits the mixing of cipher suites so that the key sizes of the pairwise and group keys must match, and GCMP group keys can only be used with GCMP pairwise keys.
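
Those two mixing rules are easy to express as a check. The sketch below encodes only the rules as I’ve stated them in this post (matching key sizes, and a GCMP group key requiring a GCMP pairwise key). It is not a full reading of the standard, and the function name is hypothetical.

```python
def suites_compatible(pairwise, group):
    """Check a pairwise/group cipher suite pair like ('GCMP-128', 'CCMP-128')."""
    p_proto, p_bits = pairwise.split("-")
    g_proto, g_bits = group.split("-")
    # Rule 1: pairwise and group key sizes must match.
    if p_bits != g_bits:
        return False
    # Rule 2: a GCMP group key can only be paired with a GCMP pairwise key.
    if g_proto == "GCMP" and p_proto != "GCMP":
        return False
    return True
```

So `suites_compatible("GCMP-128", "CCMP-128")` passes, which is the backwards-compatible combination described earlier, while `suites_compatible("CCMP-256", "CCMP-128")` does not.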

 

 

Hardening TLS for WLAN 802.1X Authentication

This post outlines some configuration changes which can enhance the security of 802.1X EAP methods PEAP and EAP-TTLS, which use a temporary layer 2 TLS tunnel to protect a less secure inner authentication method. While EAP-TLS doesn’t create a full TLS tunnel, it does use a TLS handshake to provide keying material for the four-way handshake. It needs strong TLS too.

Standard 802.1X security best practices should also be implemented, such as using strong passwords, disabling insecure EAP methods, disabling TKIP, proper supplicant configuration, deploying SHA-2 certificates, and anonymous outer usernames. The focus here is the TLS tunnel exclusively.

Not all RADIUS servers can implement all of these suggestions, but some can certainly do more than others. My experience has been with Microsoft NPS and FreeRADIUS servers so that is what I’ll refer to when discussing specific implementations. I welcome input from Aruba ClearPass and Cisco ISE administrators on configuring those servers as well.

Why go through all the trouble? It turns out the same encryption techniques that are used by web clients and servers to protect data in HTTPS sessions are also used when EAP methods rely on a TLS encrypted session. Ask any web server admin, and they’ll tell you that not all HTTPS is created equally. The same vulnerabilities that web server admins deal with exist in TLS-assisted EAP methods used on the WLAN as well. There is a lot to be learned from the TLS best practices that are recommended for web server admins.

At the end of the day, the TLS session is all that stands between user credentials and would-be hackers. It needs careful consideration to verify that it is meeting current security standards.

Here’s what to do.

Disable SSL

We’re talking specifically about SSLv2 and SSLv3 here, not TLS, the collection of which is often referred to simply as “SSL.” SSLv2 and SSLv3 were cracked long ago.

Consider TLS Methods

TLS 1.2 is the most secure TLS method available, so why not disable TLS 1.0 and TLS 1.1? Right now supplicant support for TLS 1.1 and TLS 1.2 is far from universal, and TLS 1.0 with strong ciphers is still considered secure. Keep TLS 1.0 enabled for now.

Disable Weak Cipher Suites

Cipher suites are the specific encryption algorithms that are used in a TLS session. Supplicants and servers support a broad range of them, and some of them are better than others. Many RADIUS servers have older insecure cipher suites enabled by default. This allows old supplicants that do not support newer cipher suites to still function. Unless you have older supplicants, you can disable many of these cipher suites to enhance 802.1X security.

A current listing of strong cipher suites can be found at Cipherli.st. While the website focuses on web server configuration, TLS is TLS.

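Be aware that the EAP-TLS specification (RFC 5216) lists TLS_RSA_WITH_3DES_EDE_CBC_SHA as its mandatory-to-implement cipher suite, so check your EAP-TLS clients before disabling it.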

Microsoft NPS

Microsoft NPS relies on Schannel to provide encryption for TLS-tunneled EAP methods. In order to control the protocols Schannel uses, an administrator must alter these registry keys. Note that changing these keys affects all TLS functionality on the server, so if you run IIS or RDS with TLS, these changes will affect those applications as well. Proceed with caution. The registry keys can be found in:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\Schannel\]

A full listing of cipher suites supported by Schannel can be found here.

If the prospect of manually editing dozens of registry keys on a Windows Server doesn’t appeal to you, the good people at Nartac Software have developed an application that allows these changes to be managed in a user-friendly GUI. IIS Crypto allows you to make all of the registry settings necessary for this, and it includes some handy templates: Best Practices, PCI, FIPS 140-2, and Defaults.

Here is IIS Crypto displaying the default Schannel configuration of a Windows Server 2012 R2 server. There is a lot not to like here…

iis_crypto_defaults

And here is the Best Practices template. Note that the obsolete protocols and cipher suites are disabled, and the order in which cipher suites are preferred is updated as well.

iis_crypto_bp

Be aware that manually taking control of the Schannel TLS configuration means you’re in charge of it going forward. If Microsoft updates the default configuration, your manual config may still be in place. Stay up-to-date on new TLS vulnerabilities and periodically review your configuration for needed changes.

FreeRADIUS

FreeRADIUS 3 is the current supported stable release and you should be thinking about upgrading to it if you have not already. SSLv2 and SSLv3 are not supported by FreeRADIUS 3, only TLS 1.0, TLS 1.1, and TLS 1.2.

For FreeRADIUS to require stronger cipher suites, add this to the EAP-TLS configuration in the “eap” configuration file. Alternatively, specify a colon-separated list of specific cipher suites.

cipher_list = "HIGH"
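
FreeRADIUS is built on OpenSSL, so a string like "HIGH" is an OpenSSL cipher string. If you want to see roughly what such a string expands to before committing it, Python’s ssl module can do the expansion for you. Treat this as a sketch: the helper name is mine, it uses whatever OpenSSL build your Python is linked against (which may not match the one on your RADIUS server), and TLS 1.3 suites show up in the list regardless of the string.

```python
import ssl


def expand_cipher_list(spec):
    """Return the cipher suite names the local OpenSSL enables for `spec`."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.set_ciphers(spec)  # accepts an OpenSSL cipher string such as "HIGH"
    return [cipher["name"] for cipher in ctx.get_ciphers()]


if __name__ == "__main__":
    for name in expand_cipher_list("HIGH"):
        print(name)
```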

Also be aware that FreeRADIUS 2.2.6 and 3.0.7 contain a critical bug that prevents successful TLS 1.2 sessions from starting. You should update these servers as soon as possible.

Harden Supplicants Too

Few 802.1X supplicants allow you to alter their TLS configuration. The best thing to do with supplicants is to routinely install system updates and retire clients that are EOL.

Documentation for the TLS capabilities of client supplicants is hard to come by. Microsoft published an update for Windows 7 and above that allows the use of TLS 1.1 and TLS 1.2 in its 802.1X supplicant, if configured manually for now. wpa_supplicant for Linux has supported TLS 1.2 since version 2.0, and version 2.6 enabled it by default. TLS 1.2 is the default TLS version used in the supplicants for Windows 10, Mac OS 10.11, iOS 9, and Android 6.0 (Update: It appears that Apple has deferred their decision to default to TLS 1.2 in iOS 9/Mac OS 10.11 until a later release).

Lab it Up

To know definitively what a client supplicant is capable of, run a packet capture on TLS-tunneled EAP authentication and observe the TLS negotiation frames, or TLS handshake, that occur right after 802.11 association and EAP identity request/response frames.

The client will send a “Client Hello” frame, which Wireshark will mark as a TLS protocol frame. This frame includes the TLS version requested by the client along with its supported cipher suites. The TLS version is the highest version the client supports.

tls_client_hello

Next, the RADIUS server will respond with a “Server Hello” frame which specifies the TLS version and cipher suite to be used during the TLS session, and includes the server certificate as well. The server will choose the best cipher suite that both client and server support and the highest TLS version that both support as well.

tls_server_hello

A few more frames are exchanged to set up the TLS session, and then EAP authentication takes place within the encrypted TLS session. It’s these first two frames that are of most concern when documenting client TLS capabilities.

This is also a useful technique to use to verify that highly secure TLS encryption is occurring in production.
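
To make the Client Hello / Server Hello logic concrete, here is a toy Python model of the selection described above. It assumes the server applies its own cipher preference order, which is a common configuration but not universal, and it ignores everything else in a real handshake. The function name and the tuple representation of versions are my own choices.

```python
def negotiate(client_max_version, client_ciphers, server_versions, server_ciphers):
    """Pick (version, cipher) the way the handshake above is described.

    Versions are tuples such as (1, 0) or (1, 2). `server_ciphers` is assumed
    to be in the server's preference order. Returns None when nothing matches.
    """
    # Highest version the server supports that does not exceed the client's maximum.
    usable = [v for v in server_versions if v <= client_max_version]
    if not usable:
        return None
    version = max(usable)
    # First cipher in the server's preference order that the client also offers.
    for cipher in server_ciphers:
        if cipher in client_ciphers:
            return version, cipher
    return None
```

Comparing what a function like this predicts against what you actually see in the Server Hello is a quick way to spot a server that is not configured the way you think it is.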

Chrome OS Wi-Fi Diagnostics

In the K-12 market Chromebooks are the most common devices used in 1:1 programs. If you are designing high density Wi-Fi networks for Chromebook 1:1 programs, it helps to know how to access their Wi-Fi statistics, logs, and networking tools. This knowledge is valuable for troubleshooting day-to-day Chromebook Wi-Fi issues as well.

The Basics

Despite its simplicity, Chrome OS, the Linux variant that Chromebooks run, does have some useful diagnostics tools that can help troubleshoot Wi-Fi problems. Most of these tools are included in the crosh shell, which you can open by typing Control-Alt-T. Here are some of my go-to crosh networking commands that don’t require an explanation.

ping
route
tracepath

 

This command provides some good Wi-Fi stats like retries, MCS index, and also RoamThreshold, which is the SNR at which this Chromebook will attempt to roam to a new BSS. Hopefully, one day we’ll be able to modify this value on enterprise-managed Chromebooks through the Google Apps admin console.

crosh> connectivity show devices

/device/wlan0
  Address: 485ab6######
  BgscanMethod: simple
  BgscanShortInterval: 30
  BgscanSignalThreshold: -50
  ForceWakeToScanTimer: false
  IPConfigs/0: /ipconfig/wlan0_0_dhcp
  Interface: wlan0
  LinkMonitorResponseTime: 3
  LinkStatistics/0/AverageReceiveSignalDbm: -61
  LinkStatistics/1/InactiveTimeMilliseconds: 8002
  LinkStatistics/2/LastReceiveSignalDbm: -62
  LinkStatistics/3/PacketReceiveSuccesses: 63919
  LinkStatistics/4/PacketTransmitFailures: 25
  LinkStatistics/5/PacketTrasmitSuccesses: 34432
  LinkStatistics/6/TransmitBitrate: 52.0 MBit/s MCS 11
  LinkStatistics/7/TransmitRetries: 60969
  Name: wlan0
  NetDetectScanPeriodSeconds: 120
  Powered: true
  ReceiveByteCount: 1610461765
  RoamThreshold: 18
  ScanInterval: 60
  Scanning: false
  SelectedService: /service/5
  TransmitByteCount: 133127986
  Type: wifi
  WakeOnWiFiFeaturesEnabled: not_supported
  WakeToScanPeriodSeconds: 900
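
If you collect this output from a lot of Chromebooks, it helps to turn it into numbers. Below is a small Python sketch that parses the “key: value” lines and computes transmit retries per successful transmit. The helper names are mine, and treating those two counters as directly comparable is my assumption. Note that the key really is spelled “PacketTrasmitSuccesses” in the output above.

```python
def parse_crosh(text):
    """Parse 'key: value' lines into a dict keyed by the last path segment."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:
            # "LinkStatistics/7/TransmitRetries" becomes "TransmitRetries".
            stats[key.rsplit("/", 1)[-1]] = value
    return stats


def retries_per_success(stats):
    """Transmit retries divided by successful transmits (NaN if none)."""
    retries = int(stats["TransmitRetries"])
    # Spelling matches the crosh output, typo included.
    successes = int(stats["PacketTrasmitSuccesses"])
    return retries / successes if successes else float("nan")
```

Fed the output above, this reports about 1.77 retries per successful transmit, which matches the rough signal levels shown.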

 

This command is very useful in troubleshooting 802.1X issues. It shows more layer 2 details on all the BSSs that have been discovered. In this case, /service/12 is an 802.1X network that the Chromebook is associated with, and /service/15 is an open network also in range.

crosh> connectivity show services

/service/12
  AutoConnect: true
  CheckPortal: auto
  Connectable: true
  ConnectionId: 2069398120
  Country: US
  DNSAutoFallback: false
  Device: /device/wlan0
  EAP.AnonymousIdentity: anonymous
  EAP.CACert: 
  EAP.CACertID: 
  EAP.CACertNSS: 
  EAP.CertID: 
  EAP.ClientCert: 
  EAP.EAP: PEAP
  EAP.Identity: <username>
  EAP.InnerEAP: auth=MSCHAPV2
  EAP.KeyID: 
  EAP.KeyMgmt: WPA-EAP
  EAP.PIN: 
  EAP.PrivateKey: 
  EAP.RemoteCertification/0: /OU=Domain Control Validated/CN=<cn>
  EAP.RemoteCertification/1: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
  EAP.RemoteCertification/2: /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./CN=Go Daddy Root Certificate Authority - G2
  EAP.RemoteCertification/3: /C=US/O=The Go Daddy Group, Inc./OU=Go Daddy Class 2 Certification Authority
  EAP.SubjectMatch: 
  EAP.UseProactiveKeyCaching: false
  EAP.UseSystemCAs: true
  Error: Unknown
  ErrorDetails: 
  GUID: 5137BA48-0424-41B0-B5DE-29A427084925
  HTTPProxyPort: 34599
  IPConfig: /ipconfig/wlan0_1_dhcp
  IsActive: true
  LinkMonitorDisable: false
  ManagedCredentials: false
  Mode: managed
  Name: <SSID name>
  PassphraseRequired: false
  PortalDetectionFailedPhase: 
  PortalDetectionFailedStatus: 
  PreviousError: 
  PreviousErrorSerialNumber: 0
  Priority: 0
  PriorityWithinTechnology: 0
  Profile: /profile/chronos/shill
  ProxyConfig: 
  SaveCredentials: true
  SavedIP.Address: 192.168.1.20
  SavedIP.Gateway: 192.168.1.1
  SavedIP.Mtu: 0
  SavedIP.NameServers: 192.168.1.1
  SavedIP.PeerAddress: 
  SavedIP.Prefixlen: 26
  SavedIPConfig/0/Address: 192.168.1.20
  SavedIPConfig/1/Gateway: 192.168.1.1
  SavedIPConfig/2/Mtu: 0
  SavedIPConfig/3/NameServers/0: 192.168.1.1
  SavedIPConfig/4/PeerAddress: 
  SavedIPConfig/5/Prefixlen: 26
  Security: 802_1x
  SecurityClass: 802_1x
  State: online
  Strength: 35
  Tethering: NotDetected
  Type: wifi
  UIData: 
  Visible: true
  WiFi.AuthMode: 
  WiFi.BSSID: 00:11:74:##:##:##
  WiFi.Frequency: 5240
  WiFi.FrequencyList/0: 2412
  WiFi.FrequencyList/1: 2462
  WiFi.FrequencyList/2: 5240
  WiFi.FrequencyList/3: 5320
  WiFi.HexSSID: ########
  WiFi.HiddenSSID: false
  WiFi.PhyMode: 7
  WiFi.PreferredDevice: 
  WiFi.ProtectedManagementFrameRequired: false
  WiFi.RoamThreshold: 0
  WiFi.VendorInformation/0/OUIList: 00-03-7f

/service/15
  AutoConnect: false
  CheckPortal: auto
  Connectable: true
  ConnectionId: 0
  Country: US
  DNSAutoFallback: false
  Device: /device/wlan0
  EAP.AnonymousIdentity: 
  EAP.CACert: 
  EAP.CACertID: 
  EAP.CACertNSS: 
  EAP.CertID: 
  EAP.ClientCert: 
  EAP.EAP: 
  EAP.Identity: 
  EAP.InnerEAP: 
  EAP.KeyID: 
  EAP.KeyMgmt: NONE
  EAP.PIN: 
  EAP.PrivateKey: 
  EAP.SubjectMatch: 
  EAP.UseProactiveKeyCaching: false
  EAP.UseSystemCAs: true
  Error: Unknown
  ErrorDetails: 
  GUID: 
  HTTPProxyPort: 0
  IsActive: false
  LinkMonitorDisable: false
  ManagedCredentials: false
  Mode: managed
  Name: <SSID name>
  PassphraseRequired: false
  PortalDetectionFailedPhase: 
  PortalDetectionFailedStatus: 
  PreviousError: 
  PreviousErrorSerialNumber: 0
  Priority: 0
  PriorityWithinTechnology: 0
  Profile: 
  ProxyConfig: 
  SaveCredentials: true
  Security: none
  SecurityClass: none
  State: idle
  Strength: 44
  Tethering: NotDetected
  Type: wifi
  UIData: 
  Visible: true
  WiFi.AuthMode: 
  WiFi.BSSID: 7c:69:f6:##:##:##
  WiFi.Frequency: 5320
  WiFi.FrequencyList/0: 5240
  WiFi.FrequencyList/1: 5320
  WiFi.HexSSID: ##########
  WiFi.HiddenSSID: false
  WiFi.PhyMode: 7
  WiFi.PreferredDevice: 
  WiFi.ProtectedManagementFrameRequired: false
  WiFi.RoamThreshold: 0
  WiFi.VendorInformation/0/OUIList: 00-10-18
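
If you save this services output to a text file, it is easy to slice up off-box. Here is a minimal Python sketch, assuming the layout shown above (a /service/N heading followed by indented Key: value lines). The function names are my own for illustration and are not part of crosh or shill.

```python
def parse_services(text):
    """Parse services-style output into {service_path: {key: value}}."""
    services = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/service/"):
            # A new service block starts at each /service/N heading.
            current = services.setdefault(line, {})
        elif current is not None and ":" in line:
            # Split on the first colon only; values such as BSSIDs contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return services


def frequencies_seen(service):
    """Return the sorted frequencies (MHz) listed under WiFi.FrequencyList/N."""
    return sorted(
        int(value)
        for key, value in service.items()
        if key.startswith("WiFi.FrequencyList/")
    )


# A shortened copy of the /service/15 block from the output above.
sample = """\
/service/15
  Name: <SSID name>
  Security: none
  State: idle
  Strength: 44
  WiFi.BSSID: 7c:69:f6:##:##:##
  WiFi.Frequency: 5320
  WiFi.FrequencyList/0: 5240
  WiFi.FrequencyList/1: 5320
"""

services = parse_services(sample)
svc = services["/service/15"]
print(svc["State"], svc["WiFi.Frequency"], frequencies_seen(svc))
```

The frequency list is the handy part: it shows every frequency the Chromebook has seen that SSID on, which hints at how many BSSIDs are within range.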

 

This command brings up a lot of valuable information including a dump of the latest full channel scan and the Wi-Fi chipset’s capabilities, among other useful data.

crosh> network_diag --wifi

iw dev wlan0 survey dump:
Survey data from wlan0
 frequency: 2412 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 49 ms
 channel receive time: 45 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 2417 MHz
 noise: -93 dBm
 channel active time: 62 ms
 channel busy time: 47 ms
 channel receive time: 41 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 2422 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 4 ms
 channel receive time: 0 ms
 channel transmit time: 0 ms

[truncated]

Survey data from wlan0
 frequency: 5220 MHz
 noise: -94 dBm
 channel active time: 124 ms
 channel busy time: 0 ms
 channel receive time: 0 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 5240 MHz [in use]
 noise: -94 dBm
 channel active time: 15723 ms
 channel busy time: 513 ms
 channel receive time: 185 ms
 channel transmit time: 3 ms
Survey data from wlan0
 frequency: 5260 MHz
 noise: -94 dBm
 channel active time: 85031 ms
 channel busy time: 84907 ms
 channel receive time: 84907 ms
 channel transmit time: 84907 ms

[truncated]

iw dev wlan0 station dump:
Station 00:11:74:##:##:## (on wlan0)
 inactive time: 5444 ms
 rx bytes: 11797197
 rx packets: 38419
 tx bytes: 1703260
 tx packets: 9779
 tx retries: 14295
 tx failed: 43
 signal: -58 dBm
 signal avg: -60 dBm
 tx bitrate: 24.0 MBit/s
 rx bitrate: 300.0 MBit/s MCS 15 40MHz short GI
 authorized: yes
 authenticated: yes
 preamble: long
 WMM/WME: yes
 MFP: no
 TDLS peer: no
iw dev wlan0 scan dump:
BSS 00:11:74:##:##:##(on wlan0) -- associated
 TSF: 61418055#### usec (7d, 02:36:20)
 freq: 5240
 beacon interval: 100 TUs
 capability: ESS Privacy SpectrumMgmt ShortSlotTime (0x0511)
 signal: -60.00 dBm
 last seen: 847370 ms ago
 Information elements from Probe Response frame:
 Supported rates: 24.0* 36.0 48.0 54.0 
 DS Parameter set: channel 48
 Country: US Environment: Indoor/Outdoor
 Channels [36 - 36] @ 24 dBm
 Channels [40 - 40] @ 24 dBm
 Channels [44 - 44] @ 24 dBm
 Channels [48 - 48] @ 24 dBm
 Channels [52 - 52] @ 23 dBm
 Channels [56 - 56] @ 23 dBm
 Channels [60 - 60] @ 23 dBm
 Channels [64 - 64] @ 23 dBm
 Channels [100 - 100] @ 24 dBm
 Channels [104 - 104] @ 24 dBm
 Channels [108 - 108] @ 24 dBm
 Channels [112 - 112] @ 24 dBm
 Channels [116 - 116] @ 24 dBm
 Channels [120 - 120] @ 24 dBm
 Channels [124 - 124] @ 24 dBm
 Channels [128 - 128] @ 24 dBm
 Channels [132 - 132] @ 24 dBm
 Channels [136 - 136] @ 24 dBm
 Channels [140 - 140] @ 24 dBm
 Channels [144 - 144] @ 24 dBm
 Channels [149 - 149] @ 30 dBm
 Channels [153 - 153] @ 30 dBm
 Channels [157 - 157] @ 30 dBm
 Channels [161 - 161] @ 30 dBm
 Channels [165 - 165] @ 30 dBm
 Power constraint: 3 dB
 BSS Load:
 * station count: 2
 * channel utilisation: 4/255
 * available admission capacity: 31250 [*32us]
 HT capabilities:
 Capabilities: 0x9ef
 RX LDPC
 HT20/HT40
 SM Power Save disabled
 RX HT20 SGI
 RX HT40 SGI
 TX STBC
 RX STBC 1-stream
 Max AMSDU length: 7935 bytes
 No DSSS/CCK HT40
 Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
 Minimum RX AMPDU time spacing: 8 usec (0x06)
 HT TX/RX MCS rate indexes supported: 0-15
 HT operation:
 * primary channel: 48
 * secondary channel offset: below
 * STA channel width: any
 * RIFS: 1
 * HT protection: no
 * non-GF present: 1
 * OBSS non-GF present: 0
 * dual beacon: 0
 * dual CTS protection: 0
 * STBC beacon: 0
 * L-SIG TXOP Prot: 0
 * PCO active: 0
 * PCO phase: 0
 VHT capabilities:
 VHT Capabilities (0x338001b2):
 Max MPDU length: 11454
 Supported Channel Width: neither 160 nor 80+80
 RX LDPC
 short GI (80 MHz)
 TX STBC
 RX antenna pattern consistency
 TX antenna pattern consistency
 VHT RX MCS set:
 1 streams: MCS 0-9
 2 streams: MCS 0-9
 3 streams: not supported
 4 streams: not supported
 5 streams: not supported
 6 streams: not supported
 7 streams: not supported
 8 streams: not supported
 VHT RX highest supported: 0 Mbps
 VHT TX MCS set:
 1 streams: MCS 0-9
 2 streams: MCS 0-9
 3 streams: not supported
 4 streams: not supported
 5 streams: not supported
 6 streams: not supported
 7 streams: not supported
 8 streams: not supported
 VHT TX highest supported: 0 Mbps
 VHT operation:
 * channel width: 1 (80 MHz)
 * center freq segment 1: 42
 * center freq segment 2: 0
 * VHT basic MCS set: 0xfffc
 WMM: * Parameter version 1
 * u-APSD
 * BE: CW 15-1023, AIFSN 3
 * BK: CW 15-1023, AIFSN 7
 * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
 * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
 RSN: * Version: 1
 * Group cipher: CCMP
 * Pairwise ciphers: CCMP
 * Authentication suites: IEEE 802.1X FT/IEEE 802.1X
 * Capabilities: PreAuth 1-PTKSA-RC 1-GTKSA-RC MFP-capable (0x0081)
 * 0 PMKIDs
 * Group mgmt cipher suite: AES-128-CMAC
iw dev wlan0 link:
Connected to 00:11:74:##:##:## (on wlan0)
 freq: 5240
 RX: 11797197 bytes (38419 packets)
 TX: 1703260 bytes (9779 packets)
 signal: -58 dBm
 tx bitrate: 24.0 MBit/s

 bss flags: short-slot-time
 dtim period: 1
 beacon int: 100

That’s a lot more Wi-Fi data than most other platforms make natively accessible.
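
The survey dump is the part I find most interesting, because dividing channel busy time by channel active time gives a rough utilization figure per channel. Below is a small Python sketch of that arithmetic, assuming the text layout shown above. The function names are hypothetical, and the ratio is only as trustworthy as the counters the driver reports.

```python
import re


def parse_survey(text):
    """Parse 'iw dev <if> survey dump' text into a list of per-channel dicts."""
    channels = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Survey data from"):
            current = {}
            channels.append(current)
            continue
        if current is None:
            continue
        m = re.match(r"frequency:\s+(\d+) MHz(\s+\[in use\])?", line)
        if m:
            current["frequency"] = int(m.group(1))
            current["in_use"] = m.group(2) is not None
            continue
        m = re.match(r"channel (active|busy|receive|transmit) time:\s+(\d+) ms", line)
        if m:
            current[m.group(1)] = int(m.group(2))
    return channels


def busy_percent(channel):
    """Return busy time as a percentage of active time, or None if unknown."""
    active = channel.get("active", 0)
    if active == 0:
        return None
    return 100.0 * channel.get("busy", 0) / active


# Two entries copied from the survey dump above.
sample = """\
Survey data from wlan0
 frequency: 2412 MHz
 noise: -92 dBm
 channel active time: 63 ms
 channel busy time: 49 ms
 channel receive time: 45 ms
 channel transmit time: 0 ms
Survey data from wlan0
 frequency: 5240 MHz [in use]
 noise: -94 dBm
 channel active time: 15723 ms
 channel busy time: 513 ms
 channel receive time: 185 ms
 channel transmit time: 3 ms
"""

for ch in parse_survey(sample):
    print(f"{ch['frequency']} MHz: {busy_percent(ch):.1f}% busy")
```

Treat outliers with care. The 5260 MHz entry above reports the same 84907 ms for busy, receive, and transmit time on a channel that is not even in use, which looks more like a driver quirk than real airtime.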

Additionally, to view most of this data without crosh, use this internal Chrome URL. Just enter it into the address bar and hit enter.

chrome://system/

Areas of interest for Wi-Fi data:

  • network-devices – same output as the “connectivity show devices” crosh command
  • network-services – same output as the “connectivity show services” crosh command
  • wifi_status – same output as the “network_diag --wifi” crosh command
  • lspci – you can see the Wi-Fi chipset hardware here (more on that later)
  • network_event_log
  • netlog

Viewing Logs

You can start logging Wi-Fi events using this crosh command.

crosh> network_logging wifi

Old flimflam tags: []
Current flimflam tags: [device+inet+manager+service+wifi]

method return sender=:1.1 -> dest=:1.146 reply_serial=2
Old wpa level: info
Current wpa level: msgdump

View the resulting device event logs at this internal Chrome URL: chrome://device-log/

Run this command to view the kernel log, which includes a lot of Wi-Fi events. I wish there was a --follow option, but currently there is not.

crosh> dmesg

A restart will return the Chromebook to normal logging levels.

And if you really want to bury yourself in logs, go to chrome://net-internals/#chromeos, click Wi-Fi to enable debugging on that interface, let the “capturing events” count creep up while you perform a task, then click “Store debug logs” to save a debug-logs_<date>.tgz archive in your Downloads folder. Be warned, the signal-to-noise ratio is very low with this approach. Google provides a log analyzer that you can upload these files to, but I’ve never had the need to go that far down the road. This approach is best used when you need to submit logs to the Google Apps Enterprise Support Team or a hardware manufacturer.

Advanced Wi-Fi Analysis with Developer Mode

But wait, there’s more! If you can put a Chromebook into Developer Mode, you can run packet captures and break into the Linux bash shell. Most enterprise-managed Chromebooks will have this mode disabled for obvious reasons, but it’s easy enough to move your test Chromebook into a test OU and disable this and other restrictions for testing purposes. (That’s IT testing, not high-stakes student testing! Make sure your OU’s clearly differentiate the two.)

Packet Capture

First, determine the frequency of the channel you’d like to capture on, and whether channel bonding is in use. The internal URL from above will work for this, as will the “network_diag --wifi” crosh command. The frequency of the currently associated BSS is displayed at the end of that output, as shown here.

…
iw dev wlan0 link:
Connected to 00:11:74:##:##:## (on wlan0)
 freq: 5240
 RX: 11797197 bytes (38419 packets)
 TX: 1703260 bytes (9779 packets)
 signal: -58 dBm
 tx bitrate: 24.0 MBit/s

 bss flags: short-slot-time
 dtim period: 1
 beacon int: 100
Screenshot 2016-05-09 at 2.37.00 PM
Disable the Wi-Fi NIC here.

Now turn off the Wi-Fi NIC in the GUI so it can be put into monitor mode.

You can now run the packet capture using the crosh command below.

Optionally, specify a secondary channel above or below the primary if you are doing a 40 MHz 802.11n capture by appending the “--ht-location <above|below>” flag.

 

crosh> packet_capture --frequency <frequency in MHz>

Capturing from phy0_mon.  Press Ctrl-C to stop.
^CCapture stored in /home/chronos/user/Downloads/packet_capture_7K08.pcap

You’ll get a pcap file saved in the Downloads folder, complete with Radiotap headers if the hardware supports them, which you can send to another machine for analysis. If the Chromebook is all you have available, you can upload the pcap to CloudShark for analysis.
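
Since packet_capture wants a frequency in MHz rather than a channel number, a small helper can save a trip to a lookup table. This is a rough Python sketch using the usual 5 MHz channel spacing for 2.4 GHz and 5 GHz. The helper names are hypothetical, and the only crosh flags it emits are the two described above.

```python
def channel_to_frequency(channel):
    """Convert a 2.4 GHz or 5 GHz channel number to its center frequency in MHz."""
    if channel == 14:
        return 2484  # channel 14 is the one exception to the 5 MHz spacing
    if 1 <= channel <= 13:
        return 2407 + 5 * channel
    if 32 <= channel <= 177:
        return 5000 + 5 * channel
    raise ValueError(f"unsupported channel: {channel}")


def capture_command(channel, ht_location=None):
    """Build the crosh packet_capture command line for a primary channel."""
    cmd = f"packet_capture --frequency {channel_to_frequency(channel)}"
    if ht_location is not None:
        if ht_location not in ("above", "below"):
            raise ValueError("ht_location must be 'above' or 'below'")
        cmd += f" --ht-location {ht_location}"
    return cmd


# The associated BSS above reports primary channel 48 with the secondary below.
print(capture_command(48, "below"))
print(capture_command(1))
```

For the BSS in the scan dump above, the primary channel is 48 and the secondary channel offset is below, so those are the values to pass for a 40 MHz capture.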

Wi-Fi Troubleshooting in Bash

Once you’ve got Developer Mode enabled, you can use the bash shell and follow the network log (or any other log) as things happen. This is my preferred way to troubleshoot Chromebook Wi-Fi issues in real time.

crosh> shell
chronos@localhost / $ tail -f /var/log/net.log

Now go do something to the Wi-Fi connection and watch the log scroll by.

A few Linux networking commands you may already know, like ifconfig, arp, and netstat, are available here as well.

Wi-Fi Chipset and Driver Information

While you’re in the bash shell, you can also determine the Wi-Fi chipset hardware in use. The output of this lspci command will only show the Wi-Fi adapter and the driver it is using. The basic output of lspci is included in chrome://system, but this method allows you to get more data. Add a -v flag or two to see even more.

crosh> shell
chronos@localhost /sys $ sudo lspci -nnk | grep -A2 0280

01:00.0 Network controller [0280]: Qualcomm Atheros AR9462 Wireless Network Adapter [168c:0034] (rev 01)
        Subsystem: Foxconn International, Inc. Device [105b:e058]
        Kernel driver in use: ath9k

This Acer C720 Chromebook has a Qualcomm Atheros AR9462 and uses the ath9k driver.

Run this command to discover the Wi-Fi chipset driver version. This is helpful if you want to know if the Wi-Fi chipset drivers were updated during a system update.

crosh> shell
chronos@localhost / $ sudo ethtool -i wlan0

driver: ath9k
version: 
firmware-version: 
bus-info: 0000:01:00.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

In this case no version number is reported, perhaps because the OS is using a generic Atheros driver that is packaged with the Linux kernel.

Below is the output of the same commands on an HP Chromebook 11 G4 running Chrome OS 41. This machine has an Intel Wireless-AC 7260 chipset and the driver and firmware-version are listed.

crosh> shell
chronos@localhost / $ sudo lspci -nnk | grep -A2 0280

01:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev c3)
        Subsystem: Intel Corporation Dual Band Wireless-AC 7260 [8086:c070]
        Kernel driver in use: iwlwifi
crosh> shell
chronos@localhost / $ sudo ethtool -i wlan0

driver: iwlwifi
version: 3.10.18
firmware-version: 23.14.10.0
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

The driver version appears to just be the Linux kernel version. The firmware-version is the version of the firmware loaded onto the Wi-Fi chipset.

Interestingly, after updating this HP Chromebook to Chrome OS 50, the Wi-Fi chipset firmware-version changed… but went down.

crosh> shell
chronos@localhost / $ sudo ethtool -i wlan0

driver: iwlwifi
version: 3.10.18
firmware-version: 16.229726.0
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

An inspection of the iwlwifi version history shows that this driver is actually newer than the previous version. Before version 16 it was the third number in the version that indicated what major branch it came from, so version 23.14.10.0 was actually from the version 10 branch. Thankfully, that’s cleared up in newer versions of the driver so that the first number is the version branch.
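
If you save the ethtool output before and after a system update, a few lines of Python will spot the change and apply the branch rule described above. This is only a sketch. The function names are mine, and telling the old scheme from the new one by counting dot-separated fields (four fields means old, otherwise new) is my own assumption based on the two version strings shown here, not something Intel documents in this form.

```python
def parse_ethtool_info(text):
    """Parse 'ethtool -i' output into a dict of field name to value."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only; bus-info values contain colons.
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


def firmware_branch(version):
    """Return the major branch of an iwlwifi firmware-version string.

    Assumption: four dot-separated fields means the old scheme, where the
    third field is the branch; otherwise the first field is the branch.
    """
    parts = version.split(".")
    if len(parts) == 4:
        return int(parts[2])
    return int(parts[0])


# Shortened copies of the Chrome OS 41 and Chrome OS 50 output above.
before = parse_ethtool_info("""\
driver: iwlwifi
version: 3.10.18
firmware-version: 23.14.10.0
bus-info: 0000:01:00.0
""")
after = parse_ethtool_info("""\
driver: iwlwifi
version: 3.10.18
firmware-version: 16.229726.0
bus-info: 0000:01:00.0
""")

for field, (old, new) in changed_fields(before, after).items():
    print(f"{field}: {old} -> {new}")
print("branch:", firmware_branch(before["firmware-version"]),
      "->", firmware_branch(after["firmware-version"]))
```

Comparing branches rather than raw version strings avoids the trap of reading 16.229726.0 as older than 23.14.10.0.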

It’s good to see that Google includes Wi-Fi chipset driver updates with Chrome OS updates. This is especially nice because system updates are downloaded and installed automatically on Chromebooks. Personally, I’ve seen system updates resolve odd Chromebook Wi-Fi problems, and it’s possible the newer drivers were the fix.

Making RRM Work

There’s been a lot of good discussion within the Wi-Fi community recently about the viability of radio resource management (RRM), or the automatic selection of channels and Tx power settings by proprietary vendor algorithms. At Mobility Field Day 1 there was this excellent roundtable.

Personally, I usually fall into the static design camp, for many of the same reasons as others. I don’t want RRM to change the carefully tuned design I put in place and create an unpredictable RF environment. I’ve seen RRM do some very peculiar things, like put adjacent AP’s on the same channel or crank up the Tx power of 2.4 GHz radios in an HD environment. RRM doesn’t disable 2.4 GHz radios when CCC is present, and it doesn’t plan DFS channels properly. Still, I’ve tried to keep an open mind.

Static designs have their limitations too. Statically designed WLAN’s can’t react to new neighboring networks contending for the same airtime, or new sources of RF interference that weren’t there when the static design was developed. It’s a real benefit of RRM that it does automatically correct for these problems.

Let me propose a hybrid approach that uses static design to handle the things that RRM does poorly, while still allowing RRM to react to the changing RF environment.

Static Design Elements

  • Tx power levels should be statically assigned. Once finely tuned as part of the design process, why would they ever need to change?
  • Excess 2.4 GHz radios in high density environments should be manually disabled because RRM simply won’t do this.
  • DFS channels should be statically planned. RRM can clump DFS channels near one another, resulting in a 5 GHz dead zone for clients without DFS support. Also, because of these clients, DFS channels should only be used when all non-DFS channels are already deployed. Therefore, statically plan DFS channels when needed in areas where non-DFS channels create secondary coverage, and let RRM dynamically plan the other bands. It’s less likely that a neighbor or transient hotspot will appear in the DFS bands anyway.
  • Set channel bandwidth statically. The design process includes considering the capacity requirements of the WLAN to determine the appropriate 5 GHz channel bandwidth. RRM algorithms don’t know what your capacity requirements are. 2.4 GHz should always be 20 MHz.
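
The DFS clumping problem is easy to check for once you have a channel plan and a list of which AP’s neighbor each other. Here is a minimal Python sketch of that check. The AP names, the neighbor map, and the function names are all hypothetical, and the DFS ranges are the US ones (channels 52 to 64 and 100 to 144).

```python
def is_dfs(channel):
    """True for US 5 GHz channels that require DFS (UNII-2A and UNII-2C)."""
    return 52 <= channel <= 64 or 100 <= channel <= 144


def dfs_dead_zones(channels, neighbors):
    """Return AP's on DFS channels with no neighboring AP on a non-DFS channel.

    channels:  {ap_name: 5 GHz primary channel}
    neighbors: {ap_name: [names of AP's providing secondary coverage]}
    """
    flagged = []
    for ap, channel in channels.items():
        if not is_dfs(channel):
            continue
        nearby = [n for n in neighbors.get(ap, []) if n in channels]
        # A client without DFS support needs at least one non-DFS neighbor.
        if not any(not is_dfs(channels[n]) for n in nearby):
            flagged.append(ap)
    return sorted(flagged)


# Hypothetical hallway of five AP's, each hearing the AP's on either side.
channels = {"ap1": 36, "ap2": 52, "ap3": 100, "ap4": 116, "ap5": 149}
neighbors = {
    "ap1": ["ap2"],
    "ap2": ["ap1", "ap3"],
    "ap3": ["ap2", "ap4"],
    "ap4": ["ap3", "ap5"],
    "ap5": ["ap4"],
}

print(dfs_dead_zones(channels, neighbors))
```

In this made-up plan only ap3 is flagged, because both of its neighbors are also on DFS channels. That is exactly the spot where a client without DFS support would find no 5 GHz service.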

Things Left to RRM

  • 2.4 GHz channel planning, once excess radios are disabled. Channels 1, 6, and 11 only, of course.
  • 5 GHz channel planning, once DFS channels are statically assigned.
  • That’s all.

The benefit of this approach is that it addresses many of the shortcomings of RRM while still retaining its main benefit: the WLAN can dynamically react to RF interference and transient neighbors by moving affected AP radios to clear spectrum. The things that RRM can’t do or does poorly are simply removed from its control.

Even within these constraints, there are still some vendors’ RRM algorithms I trust more than others. And even with those I trust enough to try this, I’d still want to monitor regularly to make sure the WLAN hasn’t turned into the RRM trainwreck that I’ve seen all too often when RRM is given free rein.
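
That regular monitoring could be as simple as comparing what the AP’s are actually doing against the static parts of the design. Below is a rough Python sketch of such an audit. The data layout, field names, and AP names are hypothetical, and how you collect the observed values (controller export, SNMP, API) is left open.

```python
# Settings that the hybrid approach pins statically on every AP radio.
STATIC_FIELDS = ("tx_power_dbm", "width_mhz")


def audit(plan, observed):
    """Return a list of deviations between the static design and reality.

    A channel of None in the plan means the channel is left to RRM, so
    only statically assigned channels (the DFS ones) are checked.
    """
    findings = []
    for ap, design in sorted(plan.items()):
        actual = observed.get(ap)
        if actual is None:
            findings.append(f"{ap}: missing from observed data")
            continue
        for field in STATIC_FIELDS:
            if actual.get(field) != design[field]:
                findings.append(
                    f"{ap}: {field} is {actual.get(field)}, designed {design[field]}"
                )
        if design.get("channel") is not None and actual.get("channel") != design["channel"]:
            findings.append(
                f"{ap}: channel is {actual.get('channel')}, designed {design['channel']}"
            )
    return findings


# Hypothetical design: ap1's channel is left to RRM, ap2 is pinned to DFS channel 100.
plan = {
    "ap1": {"tx_power_dbm": 14, "width_mhz": 20, "channel": None},
    "ap2": {"tx_power_dbm": 11, "width_mhz": 40, "channel": 100},
}
observed = {
    "ap1": {"tx_power_dbm": 17, "width_mhz": 20, "channel": 44},
    "ap2": {"tx_power_dbm": 11, "width_mhz": 40, "channel": 100},
}

for finding in audit(plan, observed):
    print(finding)
```

Here ap1 changing channel is fine because that is RRM’s job, but its Tx power creeping from 14 to 17 dBm is flagged because power was supposed to stay fixed.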