
Sorting Out BSS Color, Spatial Reuse, and Dual NAV

This post first appeared on 7signal.com.

We usually only hear about BSS Coloring in the marketing of Wi-Fi 6, but Spatial Reuse and Dual NAV are related important features of 802.11ax. Let’s sort them out, but first some background.

All 802.11 stations (AP’s and clients) must make sure that the channel they are operating on is free before transmitting. This prevents collisions with other stations operating on the same channel. 802.11 stations accomplish this through two methods: physical carrier sense at layer 1 and virtual carrier sense at layer 2. Physical carrier sense listens for 802.11 preambles that are transmitted at the beginning of every frame. This is the clear channel assessment signal detect (CCA-SD), sometimes called preamble detect. Physical carrier sense also checks for any RF energy on the channel. This is the clear channel assessment energy detect (CCA-ED). Virtual carrier sense operates at layer 2 using a frame’s MAC header Duration/ID field to determine how long an ongoing frame exchange will last. It sets the station’s NAV timer (network allocation vector), which prevents the station from transmitting until it counts down to zero, even if physical carrier sense determines the channel to be idle. Both carrier sense methods must determine that the channel is available before the station can transmit.
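
To make the "both must agree" rule concrete, here is a minimal Python sketch of the transmit decision. The threshold constants and the `ChannelState` structure are my own illustrative assumptions, not values taken from any particular radio, and the sketch ignores backoff and interframe spacing entirely.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, illustrative thresholds for a 20 MHz channel (not from a real radio).
CCA_SD_DBM = -82.0  # preamble (signal) detect threshold
CCA_ED_DBM = -62.0  # energy detect threshold


@dataclass
class ChannelState:
    preamble_rssi_dbm: Optional[float]  # RSSI of a detected 802.11 preamble, or None
    energy_dbm: float                   # total RF energy measured on the channel
    nav_us: int                         # remaining NAV countdown, in microseconds


def physical_carrier_idle(state: ChannelState) -> bool:
    """Layer 1: idle only if neither CCA-SD nor CCA-ED reports busy."""
    preamble_busy = (
        state.preamble_rssi_dbm is not None
        and state.preamble_rssi_dbm >= CCA_SD_DBM
    )
    energy_busy = state.energy_dbm >= CCA_ED_DBM
    return not (preamble_busy or energy_busy)


def virtual_carrier_idle(state: ChannelState) -> bool:
    """Layer 2: idle only once the NAV timer has counted down to zero."""
    return state.nav_us == 0


def may_transmit(state: ChannelState) -> bool:
    """Both carrier sense methods must report the channel as available."""
    return physical_carrier_idle(state) and virtual_carrier_idle(state)
```

Note that a non-zero NAV blocks transmission even when the air is physically quiet, which is exactly the behavior described above.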

Because modern 802.11 radios are very sensitive, CCA-SD causes a station to defer transmitting even if it detects a very low RSSI signal from a distant BSS operating on the same channel. Co-channel interference (CCI), referred to as overlapping BSS (OBSS) in the standard, is therefore a problem even at very low RSSI: most 802.11 radios will trip their CCA-SD check and defer transmission when they detect a transmission just 4 dB above the noise floor. If such a station instead transmitted despite the low RSSI OBSS transmission, the receiving station would likely hear it successfully, which would increase overall spectral efficiency and limit the negative effects of CCI.

802.11ax introduces enhancements to both physical and virtual carrier sense to help address this issue. Spatial Reuse works at the physical carrier sense level and enhances CCA-SD, and Dual NAV works at the virtual carrier sense level and enhances the NAV timer. Both features cause stations to act on the BSS color field, although a BSS color is not required in all cases for them to work. When used in combination, these features can increase the spectral efficiency of 802.11ax.

BSS Coloring

BSS Coloring is simply the ability for an AP to advertise a BSS color, which is actually a number, in its beacon and probe response frames, as well as include the same BSS color field in the HE preamble of the 802.11ax frames that it transmits. Clients that support BSS Coloring also add a BSS color field to the HE preamble of the 802.11ax frames that they transmit. The AP and all its clients in the BSS use the same color value. Overlapping BSS’s on the same channel use a different color to indicate that their frames are OBSS, and therefore they may be treated differently using one or both of the techniques below. The presence of BSS coloring on its own doesn’t change station behavior; it must be acted on using the following techniques to provide any benefit.

Note that some AP vendors and the Wi-Fi Alliance talk very generally about BSS Coloring and I suspect that they really mean BSS Coloring with Spatial Reuse operation.

Spatial Reuse

Spatial Reuse introduces the concept of an OBSS-PD threshold (overlapping BSS packet detect) to CCA-SD. In the OBSS scenario, each BSS will use a unique BSS color. Spatial Reuse allows the stations in each BSS to use a less sensitive preamble detection threshold for OBSS frames during their normal CCA-SD check. That way, even though there may be an OBSS frame making the channel busy, if it is not very loud and there is still significant SNR, an 802.11ax station that supports Spatial Reuse can transmit anyway. To account for the temporarily lower SNR, it may use a lower, more robust MCS. One limitation of Spatial Reuse is that the OBSS transmitting station can’t make the same adjustment to its MCS because it has no knowledge of the other station’s future intention to transmit. 802.11be may solve this problem with new multi-AP coordination features.
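
Here is a rough Python sketch of that preamble detect decision. The two threshold values and the function name are hypothetical, chosen only for illustration. In practice the OBSS-PD threshold comes from the AP's advertised parameters, and this sketch leaves out any transmit power adjustment.

```python
from typing import Optional

# Assumed, illustrative thresholds (hypothetical values).
DEFAULT_PD_DBM = -82.0  # normal preamble detect threshold
OBSS_PD_DBM = -72.0     # less sensitive threshold applied to OBSS frames only


def cca_sd_busy(
    rssi_dbm: float,
    frame_color: Optional[int],
    my_color: int,
    obss_pd_dbm: float = OBSS_PD_DBM,
) -> bool:
    """Return True if a detected preamble should mark the channel busy.

    A frame carrying a different BSS color is treated as OBSS and judged
    against the higher OBSS-PD threshold. Frames with our own color, or
    with no color at all (legacy preambles), use the default threshold.
    """
    is_obss = frame_color is not None and frame_color != my_color
    threshold = obss_pd_dbm if is_obss else DEFAULT_PD_DBM
    return rssi_dbm >= threshold
```

With these assumed numbers, a -78 dBm frame from a differently colored BSS no longer blocks the station, while the same frame from its own BSS still does.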

Spatial Reuse support is indicated in beacon and probe responses by the Spatial Reuse Parameter Set IE. This is also where the specific thresholds are defined along with which spatial reuse method is to be used. The two methods are OBSS-PD-based operation and parameterized spatial reuse-based operation (PSR), the details of which are beyond the scope of this blog.

Dual NAV

Dual NAV (referred to as “two NAV operation” in the standard) works at layer 2 using the duration field of a frame’s MAC header, and it also takes advantage of the new TXOP field present in the HE preamble. It requires 802.11ax clients to establish two NAV timers, an intra-BSS NAV for all frames within the BSS, and a basic NAV for OBSS frames (often called inter-BSS frames).

The intra-BSS NAV timer is set by frames that match the station’s BSS color or frames with a BSSID field that matches the station’s associated AP. The basic NAV timer is set by OBSS frames with a different BSS color, frames with no BSS color in the case of legacy frames, or frames with a BSSID field that doesn’t match the station’s associated AP.

This helps 802.11ax stations overcome several problems. A legacy station with a single NAV can have its NAV incorrectly shortened by an OBSS frame declaring a shorter duration than its current NAV value. This scenario is particularly troublesome during the long TXOP’s an AP holds for OFDMA frame exchanges. Dual NAV prevents OBSS frames from resetting the intra-BSS NAV.

On the other hand, the basic NAV can also protect OBSS frames during OFDMA if an AP has set the carrier sense required field with its preceding trigger frame. If a client in that scenario has a non-zero basic NAV, it will not respond to the trigger frame in order to avoid a collision with the OBSS transmission. Therefore, the state of the CS required field in the trigger frame determines if a client will respect the basic NAV or transmit anyway, but this only applies to OFDMA operation.

In all other scenarios, both NAV’s must equal zero in order for an 802.11ax station to transmit.
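
The rules above can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the `Frame` and `DualNavStation` names are hypothetical, a NAV is only ever extended by a longer duration, and the countdown of the timers is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    duration_us: int
    bss_color: Optional[int] = None  # None for legacy (non-HE) frames
    bssid: Optional[str] = None      # None if no BSSID field was decoded


class DualNavStation:
    def __init__(self, my_color: Optional[int], my_bssid: str) -> None:
        self.my_color = my_color
        self.my_bssid = my_bssid
        self.intra_bss_nav_us = 0
        self.basic_nav_us = 0

    def is_intra_bss(self, frame: Frame) -> bool:
        """Intra-BSS if the color matches ours or the BSSID is our AP's."""
        if frame.bss_color is not None and frame.bss_color == self.my_color:
            return True
        return frame.bssid is not None and frame.bssid == self.my_bssid

    def on_frame(self, frame: Frame) -> None:
        """Update only the NAV that the frame belongs to."""
        if self.is_intra_bss(frame):
            self.intra_bss_nav_us = max(self.intra_bss_nav_us, frame.duration_us)
        else:
            self.basic_nav_us = max(self.basic_nav_us, frame.duration_us)

    def may_contend(self) -> bool:
        """Outside of trigger responses, both NAVs must be zero."""
        return self.intra_bss_nav_us == 0 and self.basic_nav_us == 0

    def may_answer_trigger(self, cs_required: bool) -> bool:
        """The basic NAV only blocks a trigger response if CS is required."""
        return not (cs_required and self.basic_nav_us > 0)
```

Because an OBSS frame only ever touches `basic_nav_us`, it cannot disturb the intra-BSS NAV that protects the AP's own TXOP.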

A key difference in 802.11ax is that there is a new TXOP field present in the HE preamble which sets the NAV timer. This allows the NAV to be set at the PHY level, removing the need for RTS/CTS TXOP protection when legacy PHY’s are not present. It blurs the layer 1/layer 2 distinction between the preamble and NAV. It also allows the NAV to be set at lower SNR and at greater distance than previous generations of 802.11, which only set the NAV via a frame’s duration field or RTS/CTS protection. Dual NAV became necessary to prevent the OBSS NAV reset issue from becoming much worse in 802.11ax with the NAV now set by the robustly modulated HE preamble.

Dual NAV can operate using the BSSID field present in non-HE frames to distinguish OBSS frames, like in a mixed environment with 802.11ac and 802.11n stations present. It also operates when BSS coloring is disabled on an AP.

Putting it All Together

Spatial Reuse can make a station less sensitive to OBSS transmissions and increase the likelihood of successful simultaneous transmissions, increasing the spectral efficiency of 802.11ax. Dual NAV on its own will probably only have a marginal impact on spectral efficiency. Its value lies in ensuring virtual carrier sense is accurate and reducing collisions. However, when these features are used in combination, Spatial Reuse will prevent OBSS frames below the OBSS-PD threshold from setting the basic NAV, increasing spectral efficiency by desensitizing both physical and virtual carrier sense to OBSS frames.

Now it is helpful to understand how these features are implemented. 802.11ax has different requirements for AP’s and clients as to what mix of them is mandatory.

Station Type | BSS Coloring | Spatial Reuse | Dual NAV
AP           | Mandatory    | Optional      | Optional
Client       | Mandatory    | Optional      | Mandatory

Most 802.11ax AP’s come with BSS Coloring enabled by default, although the standard allows it to be disabled. Unfortunately, Spatial Reuse is optional for all stations, although Cisco has announced AP support for OBSS-PD-based Spatial Reuse in recent code versions. It seems unlikely that client vendors will implement it if it is not required. Dual NAV is optional for AP’s and mandatory for clients. The standard doesn’t explain this, but perhaps it is because the AP owns the TXOP during both uplink and downlink OFDMA, so it will not reset its NAV due to OBSS frames during that period. However, it may also be due to the mobile nature of clients, which can be anywhere within an AP’s coverage and are more likely to encounter and create OBSS conditions than their associated AP.

In practice, 802.11ax stations that only support Dual NAV without Spatial Reuse won’t see a significant improvement to spectral efficiency under OBSS conditions, perhaps only benefiting during OFDMA operation. Combining BSS Coloring with Dual NAV and Spatial Reuse is the key to significantly improving spectral efficiency through reducing physical and virtual carrier sense sensitivity to OBSS transmissions.


What’s Different About 802.11ax in 6 GHz

There has been some consternation that 802.11ax should have a greenfield mode in 6 GHz, leaving behind all the protocol overhead used for backwards compatibility in the 2.4 and 5 GHz bands. This mythical mode could also have fantastic new capabilities that would now be possible without legacy PHY requirements. 6 GHz is an opportunity to so radically overhaul 802.11 that we could increment the 802.11 version bit in all 802.11 6 GHz frames (It’s been 0 for the entire history of Wi-Fi). Of course, it probably isn’t reasonable to expect the same amendment that must provide backwards compatibility in the legacy bands to also do something radically new in 6 GHz. It may also be unrealistic to expect 802.11ax client chipsets that operate in the legacy bands in legacy modes to do something radically different using the same radio in 6 GHz. Still, there are real protocol differences in 6 GHz 802.11ax operation. 802.11ax is not ratified, so it is still possible for some things to change, but I thought I’d run down what’s different and what opportunities I think were missed with 802.11ax in 6 GHz.

  • Security Upgrade – 802.11ax will make SAE and OWE mandatory replacements for PSK and open auth respectively in 6 GHz. MFP will be required. I still don’t understand how this will work with SSID’s that span the legacy bands to support legacy WPA2 clients as well as WPA3-only clients in 6 GHz. If the answer is “Just add another SSID for Wi-Fi 6E clients-only,” then I will be disappointed.
  • OFDM and HE-Only – There is no HT or VHT operation allowed in 6 GHz. Unique HE beacon IE’s indicate support for features inherited from HT and VHT. There are no HT/VHT Capabilities IE’s in a 6 GHz beacon. No HT or VHT MCS will be used in 6 GHz. OFDM is there because its shorter preamble consumes less airtime than the new HE preamble, so it will be used for those frames that don’t require the bigger HE preamble.
  • Basic HE-MCS and NSS (number of spatial streams) Set – We aren’t using legacy rates outside of the legacy preamble, RTS/CTS, and legacy ACK frames. Most frames can be modulated with HE-MCS encoding, including beacon, multi-STA blockACK, and trigger frames, some of which can be transmitted with multiple spatial streams. AP vendors may choose to continue using OFDM for these frames, however.
  • Spatial Reuse Can Work – 6 GHz STA’s can take full advantage of BSS Coloring and OBSS-PD. Without legacy STA’s to conflict with, we can design for these features to significantly desensitize all intra-BSS STA’s to OBSS frames, and allow for increased spectral efficiency by reusing the channel more aggressively. If implemented, those robustly-modulated preambles are a smaller problem. However, Spatial Reuse (OBSS-PD) is optional in 802.11ax. Dual NAV is required for clients, and optional for AP’s. Confused yet?
  • EDCA Optimization – This might be the trick to getting OFDMA operation to take place more often. All 6 GHz STA’s will support OFDMA, so why not increase the contention window for the SU EDCA access categories advertised in beacons (there is a separate MU EDCA table for OFDMA)? That would reduce the likelihood of clients winning access to the channel for SU operation, and increase the likelihood that the AP will win the channel for OFDMA operation to take place.
  • Less RTS/CTS Overhead? – Because all 6 GHz STA’s can interpret the HE preamble, which includes the duration of the TxOP, normal RTS/CTS protection is redundant. The 802.11ax draft allows for several ways to establish a TxOP, including the legacy duplicate RTS/CTS method, so we will have to wait and see what the vendors choose to use in 6 GHz. In ideal circumstances, 802.11ax in 6 GHz will look like this: AP wins arbitration, trigger frame, MU-PPDU, BlockACK, repeat, repeat, repeat…
  • Reason Code 71 – 6 GHz AP’s can deny an association request from a client with poor RSSI using status “DENIED_POOR_CHANNEL_CONDITIONS” or disassociate a low RSSI client with a new reason code 71, “POOR_RSSI_CONDITIONS.” A 6 GHz client must respond to this sensibly (e.g. not blacklisting the BSSID/SSID as clients sometimes do in the legacy bands). Although vendors have had features that accomplished this for a long time, client behavior in response has always been a mixed bag.
  • 6 GHz AP Discovery and Association – A 6 GHz STA can discover, and in some cases associate to, a 6 GHz radio while operating in the 2.4 or 5 GHz bands. An AP’s beacon, probe response, and neighbor report frames in those bands can indicate the channel and channel width of their matching 6 GHz radio. Additionally, for 6 GHz-only operation, a specific subset of channels will be identified as preferred scanning channels (PSC’s) where the primary channel of a wide channel BSS should reside, limiting the channels a client needs to scan to discover a 6 GHz-only AP. PSC’s are spaced 80 MHz apart, so a client would only need to scan 15 channels in the US. Active probing in the 6 GHz band in the US is only allowed after a client has heard an AP transmission on the channel, which includes a beacon frame, an unsolicited probe response sent to the broadcast address, or a FILS discovery frame. However, one of these frames can be transmitted at least every 20 TU’s, which allows for less required dwell time on the channel for passive scanning. Less required dwell time and a limited set of PSC’s will make passive AP discovery faster in 6 GHz than 5 GHz.
  • 80 MHz AP Channel Width Minimum in 6 GHz? (see update) – They weren’t lying when they said “80 is the new 20.” Maybe things will change before 802.11ax is ratified, but I’ve learned that a 6 GHz AP will have to indicate support for at least 80 MHz channel width. This aligns well with the PSC’s the standard will define. I don’t know why the IEEE would require this, as it is extremely undesirable in the large public venue (LPV) WLAN’s where 802.11ax in 6 GHz would otherwise provide the most benefit. It’s an even larger problem in countries with unlicensed access to a smaller portion of the band. Clients may still use smaller channel widths, including a 20 MHz-only operating mode. 5/2021 update: Consumer and enterprise AP’s that have been released to the market support 20 and 40 MHz channel width operating modes in 6 GHz. Thankfully, that one problematic sentence in 802.11ax has not been interpreted to refer to a minimum channel width for operation.
  • How Much of the Wide Channel Can Be Used? – The use of wide channels followed wasteful logic in 802.11ac with dynamic bandwidth operation (DBO): If the primary 20 MHz channel of a 160 MHz BSS is busy, nothing can be transmitted. If the secondary 20 MHz channel that would be used for a 40 MHz channel width is busy, then the STA can only use 20 MHz, and the remaining 140 MHz goes unused. If those first two 20 MHz channels are clear, another 40 MHz is checked to see if 80 MHz of the channel is available, etc. This pattern of checking for the next wider channel width is done serially through the CCA process, which also introduces more overhead on its own. Remember, OFDMA only happens within the TxOP gained from a wide-channel arbitration process, which can be subject to the logic I just described. DBO was optional in 802.11ac and not widely implemented, and it appears to be optional in 802.11ax as well. This is important because…
  • Preamble Puncturing is Optional – What improves spectral efficiency in the scenarios above is a new 802.11ax feature called preamble puncturing, which allows an HE STA to transmit across the full 160 MHz of spectrum, but not within the specific 20 MHz subchannels that are busy. One busy subchannel doesn’t prevent the use of others, but the primary channel must still always be free for anything to be transmitted. However, preamble puncturing is an optional feature in 802.11ax. So even in the best case scenarios, OFDMA can only subdivide the channel within the 20 MHz subchannels determined to be available via CCA and (maybe) RTS/CTS. In the worst case scenario (no DBO or preamble puncturing), no data frame, OFDMA or otherwise, can be transmitted unless the entire wide channel is available (minimum 80 MHz in 6 GHz!).
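
To compare the three wide-channel cases from the last two bullets (no DBO, DBO, and preamble puncturing), here is a small Python sketch. It is a simplification under my own assumptions: the channel is a list of 20 MHz subchannels ordered primary 20, secondary 20, secondary 40, secondary 80, the function names are hypothetical, and any restrictions on which puncturing patterns are actually permitted are ignored.

```python
from typing import List

SUBCHANNEL_MHZ = 20


def usable_static(busy: List[bool]) -> int:
    """No DBO, no puncturing: the whole wide channel must be clear."""
    return 0 if any(busy) else len(busy) * SUBCHANNEL_MHZ


def usable_dbo(busy: List[bool]) -> int:
    """DBO: grow 20 -> 40 -> 80 -> 160 MHz while every subchannel so far is clear."""
    usable = 0
    width = 1  # width in 20 MHz subchannels, starting from the primary
    while width <= len(busy) and not any(busy[:width]):
        usable = width * SUBCHANNEL_MHZ
        width *= 2
    return usable


def usable_punctured(busy: List[bool]) -> int:
    """Preamble puncturing: skip busy subchannels, but the primary must be clear."""
    if busy[0]:
        return 0
    return sum(SUBCHANNEL_MHZ for is_busy in busy if not is_busy)


# Example: 160 MHz BSS with one busy 20 MHz subchannel inside the secondary 40.
example = [False, False, True, False, False, False, False, False]
results = (usable_static(example), usable_dbo(example), usable_punctured(example))
```

For that example the three functions return 0, 40, and 140 MHz respectively, which is the gap between the worst case and the punctured case described above. In all three, a busy primary channel yields nothing.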

The primary channel bottleneck is particularly troublesome because future generations of 802.11 in 6 GHz will have to account for 802.11ax operation to maintain backwards compatibility, although a lot of traffic that used to be primary channel-only can be included in OFDMA transmissions now (ACK, null-data frames). I wish we could have left behind the legacy channel arbitration process as well, or at least made preamble puncturing mandatory. Other problems may be improved in future amendments. The potential for reduced overhead and higher likelihood of the AP winning channel access are significant improvements when coupled with all 802.11ax clients in 6 GHz.

To Be Determined

  • The Wi-Fi Alliance – The WFA could decide that features that are optional in 802.11ax are mandatory for Wi-Fi 6E certification. Preamble puncturing and spatial reuse are very beneficial features that should be mandatory.
  • 802.11md – This is the maintenance work being done to roll-up 802.11 into 802.11-2020. It will include 802.11ax-2020 and leaves open the possibility of additional changes to 6 GHz operation occurring that are not part of the 802.11ax drafts, but happen within the roll-up process separately. An example of this happening in the past is 802.11 Fine Timing Measurement, which was added to the standard through the 802.11mc roll-up to 802.11-2016. That feature does not have its own 802.11xx amendment. Hopefully this will all be sorted out in December. Stay tuned.
  • 802.11ax, TBH – It’s not finalized yet, so perhaps some of this will change. I’ll update this blog if that happens.
  • AP vendors – As is often the case, the standard provides for many ways to achieve things and leaves many features as optional. It will be up to the AP vendors to determine what actually gets implemented in the real world.

Wi-Fi: What We Need and What We Keep Getting

No technology is perfect, but for most of my career in Wi-Fi there has been a persistent set of problems that continue to have no resolution in sight. They could be fixed, other wireless protocols have solutions for some of them, and there have been attempts to fix them but the results are so watered-down that they are ineffective. Today I’m going to channel my inner Lee Badman and get a little grumpy about Wi-Fi. Please bear with me as I go through my list of gripes. These are the real problems that real enterprises have with Wi-Fi, and each successive generation of Wi-Fi has done little to address them.

Crap Clients

Much has been written about the sorry state of Wi-Fi clients, so I won’t go too far into what is already well-documented. But so many Wi-Fi clients are utter garbage! They lack support for enterprise security (WPA2/3-Enterprise), some only support the enterprise-unfriendly 2.4 GHz band, there are new clients on the market today with 802.11g radios in them, their drivers are buggy and often go unpatched, and few clients support amendments to the 802.11 standard that are important to enterprise Wi-Fi performance and security (802.11k/v/r/w). I could go on… but why beat a dead horse?

Bad Roaming

This is mostly a client problem in Wi-Fi, but it deserves a callout all its own. Very, very few Wi-Fi clients roam effectively. Some are so sticky that they are totally unusable in a multi-AP network unless they never move. Further, most clients provide zero visibility into their roaming algorithm, let alone provide any configuration to correct it. Yes, some manufacturers have published roaming specs, but they are not telling the whole story, and real-world observations often contradict their documentation.

There have been engineering efforts at IEEE to improve roaming, but very little has come of it, and the Wi-Fi Alliance does not test that clients roam effectively in its certification programs. It’s the wild west, anything goes, and you don’t know what you are getting until you take a client out of the box and test it yourself.

And yet, the tools to fix the situation already exist. I believe that the right combination of 802.11k and 802.11v features could fix the sticky client problem. With 802.11k beacon reports, all clients could periodically report their RSSI and the RSSI of nearby AP’s to the AP. The AP could then use 802.11v BSS transition frames to direct clients to roam to the appropriate AP at the appropriate RSSI or MCS threshold. The WLAN administrator could configure whatever RSSI or MCS threshold was appropriate for the WLAN as designed, and all clients would roam in accordance with it. This is similar to the method LTE uses for handoffs (roaming in cellular-speak).
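
As a rough illustration of the AP-side logic I have in mind, here is a Python sketch. Everything in it is hypothetical: the function name, the -67 dBm roam threshold, and the 5 dB minimum gain (added so a client isn't steered to a marginally better AP) are my own assumptions, and the beacon report is reduced to a simple BSSID-to-RSSI mapping.

```python
from typing import Dict, Optional


def pick_roam_target(
    beacon_report: Dict[str, float],
    serving_bssid: str,
    roam_threshold_dbm: float = -67.0,  # assumed admin-configured threshold
    min_gain_db: float = 5.0,           # assumed hysteresis to avoid ping-ponging
) -> Optional[str]:
    """Return the BSSID to suggest in a BSS transition request, or None.

    beacon_report maps each BSSID the client heard to the RSSI it measured.
    """
    serving_rssi = beacon_report.get(serving_bssid)
    if serving_rssi is None or serving_rssi >= roam_threshold_dbm:
        return None  # no measurement, or the client is still above the threshold

    candidates = {
        bssid: rssi
        for bssid, rssi in beacon_report.items()
        if bssid != serving_bssid
    }
    if not candidates:
        return None

    best = max(candidates, key=candidates.get)
    if candidates[best] - serving_rssi >= min_gain_db:
        return best
    return None


report = {"ap-1": -74.0, "ap-2": -60.0, "ap-3": -70.0}
target = pick_roam_target(report, serving_bssid="ap-1")
```

Here the client on `ap-1` has dropped below the threshold and `ap-2` is clearly stronger, so `target` is `"ap-2"`. The point is that the threshold lives in one place, on the infrastructure, rather than inside each client's proprietary algorithm.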

Unfortunately, client support for 802.11k is limited and support for beacon reports is even more limited. Same for 802.11v. AP vendors let you enable or disable these features, but give little insight into how they will actually behave (e.g. What a client actually does with 802.11k neighbor reports is anyone’s guess because they are absorbed into their already flawed, proprietary roaming algorithms, and how and when AP’s use BSS transition frames is largely undocumented). Because the IEEE decided these features are optional, and the Wi-Fi Alliance does not require their support for certification, we will never be able to fix roaming this way. This major problem will remain unresolved for as far as I can see into the future.

Unstable WLAN Infrastructure Products

If you have worked with Wi-Fi long enough, you have a favorite facepalm-inducing example of an access point bug that should never have been allowed out into the wild. And yet they are, frequently, as if no quality assurance or beta testing is ever done on the code that so many mission critical WLAN’s rely on. No AP vendor is immune. It’s shocking. It’s scandalous. Managers often don’t believe what their wireless engineers tell them about the shoddy state of the code they are running on networks that support patient care in hospitals and critical factory production lines, but it is a very real problem.

I used to think, “Well, once we get to the next major release they will have all this fixed.” That was many years ago.

Cumbersome Enterprise Security

Provisioning client supplicants for enterprise Wi-Fi security is much more difficult and complex than it ought to be, and for many clients it is impossible. Supplicant support is lacking or broken, and bulk provisioning is even harder to execute.

No Guest Wi-Fi Security

Why, in 2020, should guest Wi-Fi be unencrypted, and lack identity verification of the network? Is there a more common protocol than 802.11 that still isn’t completely wrapped in TLS?

Opportunistic Wireless Encryption (OWE) solves part of the problem by implementing encryption for open networks, but it doesn’t provide network identity verification. It also became optional when the Wi-Fi Alliance controversially stripped it out of WPA3, so like so many other promising innovations in Wi-Fi, I doubt that it will ever be universally supported.

Captive Portal Hell

There are few technologies that are as user-punishing as Wi-Fi captive portals. They require ugly hacks to sort-of-work, and the constant increase in HTTP and DNS security makes them more and more of a problem. There has to be a better way, but as best as I can tell, no one is working on one.

Hotspot 2.0 has a feature called Online Sign-Up (OSU), which does address it, but only for Passpoint networks, and the big RADIUS server vendors have yet to build support for it. There is no telling if they will.

What We Keep Getting

Alright, so I’ve aired my grievances. What makes them so tiring is that so little progress has been made to resolve them. Roaming has always been a problem in Wi-Fi, junk clients continue to be manufactured and certified, infrastructure code is still a minefield, and 802.11 security still does not meet enterprise requirements.

If we look at each successive generation of Wi-Fi, you’ll see that they always delivered higher data rates, which is a welcome improvement, but in reality that has produced diminishing returns since 802.11n. 802.11ax has really pushed this to the extreme, with efficiency gains that are welcome in large public venues, but are not needed with any real urgency elsewhere. There is no end in sight to this trend. The next generation of Wi-Fi in development at the IEEE is called Extremely High Throughput. It will bring 320 MHz channel widths and 4096 QAM. These features will solve exactly zero problems in Wi-Fi. If a Wi-Fi network isn’t fast enough, this is almost always a design problem, not a protocol limitation. What use are ever higher data rates for clients that roam poorly and struggle to get connected securely in the first place? It is time that increased throughput took a backseat to improved real world client performance, stability, and security improvements.

We have a new security scheme in WPA3, and while hardening Wi-Fi against quantum computing attacks is good, I suppose, it is way down the list of priorities for most WLAN operators. Simpler, bulk provisioning is a much more tangible improvement, and would lead to improved security too. How often do we have to just give up and resort to WPA2-PSK due to client limitations, bad supplicants, and no streamlined provisioning process? It is very rare to find an enterprise WLAN that isn’t using WPA2-PSK, which is branded WPA2-Personal by the Wi-Fi Alliance because it is appropriate for use in home, consumer WLAN’s. That alone should tell you something is very wrong.

WPA3 had a new and promising device provisioning protocol (DPP) that would be nice, but it’s since been stripped out and dumped into an optional certification called Wi-Fi Easy Connect. I think we all know what that means for its future…

So crap clients, bad roaming, unstable WLAN infrastructure products, cumbersome enterprise security, half-baked guest Wi-Fi security, and captive portal hell are here to stay. The IEEE and Wi-Fi Alliance are not prioritizing these longstanding, real world problems.

Is it any wonder that no one complains that Wi-Fi doesn’t support the CBRS band? Instead we look with excited anticipation at the promise of private LTE and 5G in the enterprise. The powers that be should take note of that lack of disappointment. We are close to the point where Wi-Fi is no longer looked to for mission critical applications that demand stability and reliability. Allowing these long standing issues to persist will cause Wi-Fi to be relegated to a best-effort, bulk traffic transport, not the wireless protocol of choice for important applications.

Organizations are signalling that they are ready to trade the high throughput of Wi-Fi (that they often don’t need) for the reliability of LTE in CBRS for those applications that are most critical. Meanwhile, IEEE continues the march towards 802.11be Extremely High Throughput with its 320 MHz channel width that will make a mess of the 6 GHz band, and 4096 QAM. These are features that do not solve real-world problems.


The State of Guest Wi-Fi Security

Most guest Wi-Fi networks today are open SSID’s with no encryption that have a captive portal that requires users to click through some terms and conditions. It would be nice to be able to secure these networks the same way we do with internal SSID’s–mutual authentication of the client and network, and strong layer 2 encryption, but that challenge has proven too difficult to accomplish without a high degree of friction. You could make users suffer through a lengthy and confusing onboarding process, but imagine doing that at every location where there is guest Wi-Fi? Not good. I agree with Keith Parsons’ take: Guest Wi-Fi should be fast, free, and easy. Security should be too.

How can we make this better? The Wi-Fi Alliance is certifying devices for a new security protocol called Opportunistic Wireless Encryption (OWE). Their certification is called Wi-Fi Enhanced Open, but I’ll refer to it as OWE for the purposes of this blog. OWE adds encryption to open WLAN’s with no client authentication, but it does not provide for server authentication, which leaves users vulnerable to man-in-the-middle (MitM) attacks. The authors of the RFC understood this, and wrote that “the presentation of the available SSID to users should not include special security symbols such as a ‘lock icon.'” Aruba Networks has already announced support for OWE, and I hope other vendors follow suit.

Unfortunately, the Wi-Fi Alliance did not choose to make OWE support mandatory in WPA3. It’s a separate and optional certification. Perhaps they will right this wrong by requiring OWE support in Wi-Fi 6 certification, which could require WPA3 support just as 802.11n required WPA2 support. Why not tack on OWE to Wi-Fi 6 as well?

Secure Guest Wi-Fi with Hotspot 2.0/Passpoint

I once believed that Hotspot 2.0/Passpoint (HS2.0) was the future of secure guest Wi-Fi, because it allowed for anonymous authentication to a WPA2-Enterprise network. The problem is that users are still required to go through a high-friction onboarding process on every anonymous HS2.0 WLAN they wish to use. That means dealing with captive portals, terms and conditions, installing configuration profiles, etc.

HS2.0 does allow for automatic authentication with user creds from other identity providers. That would allow a user to log in with pre-installed creds from their cellular carrier, Facebook, Amazon, Google, Apple, etc.

Telcos are the best choice here as their creds are already installed on mobile phones to authenticate with their cellular networks. However, telcos are unlikely to open their authentication service to WLAN operators for several reasons.

  • They want to be paid for providing this service, but SMB and many large enterprises don’t want to pay to increase the security of their guest networks.
  • It gives an implied endorsement of the security, quality, and reliability of the WLAN, which the telco knows nothing about.

That’s why you see telcos integrating with Boingo, for example, but not smaller players.

But what if there was a HS2.0 open roaming consortium that federated authentication from any identity provider that wanted to join? Something like eduroam for anyone.

The biggest problem is that WLAN authentication in such a scenario tells you nothing about the identity or security of… the WLAN. Users authenticate with their identity provider’s RADIUS servers, and the result is strong encryption in the air, but no guarantee of security on the wired network. They don’t get any information about the identity of the wired LAN that their bits are traversing, because the authentication is abstracted away from the network they are using. HS2.0 provides no identity verification of the network that users are actually using.

This is a smaller problem in eduroam, where most WLAN’s are run by higher education institutions and they agree to operate their networks a certain way. There is some homogeneity there, and users can expect similar security and terms of use between member networks.

An open roaming consortium would allow users to authenticate to a university’s WLAN and a dingy laundromat’s WLAN as if there was no difference. In fact, roaming between those networks would happen automatically without any user interaction. That’s an acceptable risk when all the networks in the consortium are similar (eduroam), but it isn’t when nothing can be assumed about the quality and security of member networks in an open roaming consortium.

Is it reasonable to assume an end-user wants to connect to any WLAN that supports their HS2.0 creds? My answer to that is a definite “no.” One benefit of the non-HS2.0 model is that a user must express an intent to connect to a new WLAN, which gives them the ability to decide if it is trustworthy or not. HS2.0 circumvents this process, and if it becomes more open and widespread, users may end up connecting to networks they don’t trust.

Secure Guest Wi-Fi with an On-Premises Solution

There are several on-premises BYOD or SGW onboarding solutions. They don’t solve the high-friction onboarding problem mentioned previously–they compound it, because the credentials they issue cannot be used between networks. Users must wrestle with a high-friction onboarding process with every SGW network they want to use.

The fundamental problem with Hot Spot 2.0 and On-Premises solutions is that they require client credentials. Authenticating users is not a requirement for SGW in my opinion, and I imagine that’s a common view. It creates unnecessary complexity for users and administrative overhead to WLAN operators. We need a solution for anonymous SGW.

An HTTPS-like Solution

For secure guest Wi-Fi, a security model similar to HTTPS would be great. Client identity is not important, but the WLAN identity should be verified, not just the RADIUS server. Strong encryption must be used, wireless network access must be resistant to MitM attacks, and users should only connect to a SGW network when they have expressed the intent to do so.

Additionally, all of the necessary configuration and complexity to accomplish this should be handled by the WLAN operator. For the end-user, it should “just work.”

Take the example of HTTPS: A web admin requests and is issued a DNS-validated TLS certificate signed by a public certificate authority. She then installs the cert on her web server, configures it for strong encryption, and adds an HTTP to HTTPS 301 redirect. Now visitors to the website are able to verify the website’s identity and connect to it with strong encryption, and they had to do nothing to get those security benefits except run a modern web browser. SGW should be just as easy for end users.

OWE gets us halfway there, but crucially, does not address the threat of MitM attacks. We need a WLAN-centric public key infrastructure (PKI) for that, and that’s the rub. Suddenly there’s a lot of administrative overhead to make this work. Perhaps it would look something like this:

An “Open RADIUS Certificate Authority,” or ORCA, would only issue certs to validated network operators, and those certs could only be used with specific SSID’s.

ORCA’s root cert would have to be preinstalled and trusted by client devices for EAP authentication.

Wi-Fi clients would connect to an ORCA-enrolled SGW SSID and authenticate anonymously, then validate the ORCA-signed cert presented by the RADIUS server. The client verifies that the cert has not been revoked and that the cert is permitted for use with the SSID it is connecting to. The session is encrypted and the WLAN’s identity is verified. Clients only connect to ORCA-enrolled WLAN’s when they intend to, by clicking/tapping on the SSID in their Wi-Fi menu/settings.

All the end user has to do is tap/click on the SGW SSID to connect to it. Everything else is handled by the client device, the WLAN, and ORCA.
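
To make those client-side checks concrete, here is a minimal Python sketch. ORCA does not exist, so everything in it is hypothetical: the OrcaCert fields, the client_should_trust function, and the sample SSID names are all made up for illustration. Real signature verification and revocation lookups (CRL or OCSP) are reduced to simple comparisons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrcaCert:
    """Hypothetical ORCA-issued cert, reduced to the fields used here."""
    serial: str
    issuer: str
    permitted_ssids: frozenset


def client_should_trust(cert, ssid, trusted_issuer, revoked_serials):
    """Return True only if every check described above passes."""
    if cert.issuer != trusted_issuer:      # not issued under the preinstalled root
        return False
    if cert.serial in revoked_serials:     # cert has been revoked
        return False
    return ssid in cert.permitted_ssids    # cert must be permitted for this SSID


cert = OrcaCert(
    serial="01",
    issuer="ORCA Root",
    permitted_ssids=frozenset({"CoffeeShop-Guest"}),
)
print(client_should_trust(cert, "CoffeeShop-Guest", "ORCA Root", set()))
print(client_should_trust(cert, "Evil-Twin", "ORCA Root", set()))
```

The second call fails because the cert is not permitted for that SSID, which is the check that would stop an evil twin from reusing a legitimately issued cert under a different network name.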

Ta da, we now have low-friction SGW, but for all this work, what have we really gained, today, in 2018?

If you run a packet capture on an open guest network today, you’ll see DNS queries and a whole lot of TLS sessions, not much else. Yes, SGW would add another layer of security on top of this, but at what cost? Making ORCA work is no small task, if it is even achievable in the first place.

Conclusions

OWE gives us layer 2 encryption, so that passive sniffing doesn’t reveal those DNS queries anymore. While OWE doesn’t address MitM rogue AP attacks, coupling it with 802.11w protected management frames, which are required for Wi-Fi Enhanced Open certification, adds resistance to malicious deauth attacks.

The work necessary to make my SGW scheme function doesn’t balance with the small gain in security. It’s better to take a perimeterless networking approach (e.g. BeyondCorp), only deploy hardened applications, and assume the networks your users use will not be trustworthy. If you do not use applications that expose their data to network-level interception or abuse, then have at it. How can an end-user ever truly know if a network is trustworthy anyway?

We can add a bit more security through OWE to help obscure the small amount of guest network traffic that remains unencrypted, and 802.11w protected management frames to prevent some rogue AP attacks. That’s going to have to be good enough.

Categories
Analysis Roaming Troubleshooting WLAN

Roaming Analysis using only a Mac and Wireshark

There are many ways to examine the roaming performance of a Wi-Fi client. Perhaps the gold standard is to follow the client with a laptop running Omnipeek and several Wi-Fi adapters all capturing frames on different channels. I’m also impressed with 7signal’s recent update to Mobile Eye which now logs roaming data as well. But what if you don’t have that, or want to do something quickly with a Mac without switching to Windows and hooking up your Wi-Fi adapter array?

Using a Mac laptop to capture frames on a single channel with Airtool, you can still get valuable information about the roaming performance of a Wi-Fi client with a few Wireshark display filters and some I/O Graphs magic.

The process is simple. Discover the channel the test client is using, and start an over-the-air capture on that channel. Take your Mac and the test client and move out of the current AP’s cell so the client roams away, then come back so that the client roams back. Repeat as necessary until you have captured both a roam-away and a roam-back.

roam_capture
Let’s roam

Now it’s time to look at the captured frames. First, let’s build a display filter to only show the frames to/from the test client, as well as all of the AP’s beacon frames. We’re including the AP’s beacon frames so that we can see the changes in RSSI as the client moved away from and then back towards the AP.

wlan.addr == aa:bb:cc:dd:ee:ff || ( wlan.ta contains 11:22:33:44:55 && wlan.fc.type_subtype == 8)

aa:bb:cc:dd:ee:ff is the MAC of the test client. 11:22:33:44:55 is the first five octets of the AP’s BSSID. By matching on the first five octets of the AP’s BSSID rather than the exact BSSID, we preserve the beacon frames from all of the AP’s BSSID’s, which gives us more data to calculate the RSSI of the AP.
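
If you run this analysis often, a small helper can build the filter string for you. This is just a convenience sketch in Python; the function names are my own, and it only does string formatting, so paste the result into Wireshark as usual.

```python
def bssid_prefix(bssid):
    """Return the first five octets of a colon-separated BSSID."""
    return bssid.rsplit(":", 1)[0]


def roam_display_filter(client_mac, prefix):
    """Build the display filter shown above for one client and one AP."""
    return (
        f"wlan.addr == {client_mac} || "
        f"( wlan.ta contains {prefix} && wlan.fc.type_subtype == 8)"
    )


# Example values only; substitute your own client MAC and AP BSSID.
print(roam_display_filter("aa:bb:cc:dd:ee:ff", bssid_prefix("11:22:33:44:55:60")))
```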

Once applied, export the displayed packets only to a new file that we’ll generate the graphs from. Open the new file and now we can configure the I/O Graphs. These are some of the display filters I use:

7925_roam_graph.png
The roam-away is on the left, and the roam-back is on the right.

AVG Tx Data Rate needs to be set with the test client MAC address, and AP RSSI needs to have the first five octets of the AP’s BSSID.

By zooming into the beginning of the graph, we can observe the client’s data frames, retries, Tx data rate, and the RSSI at which it roamed away. A benefit of dBm being measured in values less than zero is that it is separated from the rest of the data on the graph, so we have layer 1 data below 0, and layer 2 data above.

7925_roam_follow_roam

This Cisco 7925G phone roams away before the AP’s RSSI drops to -70 dBm, and before retries start to increase. We see similar good behavior when it roams back below.

7925_roam_follow_back

Let’s take a look at a Wi-Fi client that roams poorly. Here’s a client-that-shall-remain-nameless roaming away from an AP. You can see retries spiking and its data rate plummeting well before it roams away. The AP’s RSSI drops into the -80’s for most of a minute before it decides to roam!

bad_roam-filtered
This graph includes the test client’s average Tx data rate.

Of course, this approach has some limitations. Before you decide that a client like the one above is a sticky client, you must know that it was in range of a louder AP, operating on a channel it supports, when it started having trouble. Otherwise it’s doing exactly what it should be doing, trying to maintain the only association it can.

You know when the client decided to roam, but you don’t know how long it took.

As you move away from the AP, you might see the AP’s RSSI spike to 0. That happens when your laptop’s adapter is unable to demodulate beacon frames from the AP due to poor SNR.

Also, the AP RSSI is measured by a Mac laptop that is following the test client. Unless the test client is the same model of Mac laptop, it will probably hear the AP differently, most likely with less sensitivity. My MacBook Pro is a 3×3:3 client, and the two test clients I looked at for this blog are both 1×1, so it’s reasonable to assume the Mac benefits from a significant increase in RSSI from MRC. Taking that into consideration, the poor roaming from the client-that-shall-remain-nameless is probably even worse than it looks.

Categories
Analysis Troubleshooting WLAN

Beware of mDNS Floods from Buggy Android Clients

Recently, I discovered a large increase in multicast traffic on an enterprise Cisco WLAN. This increase was large enough to cause packet loss in several areas where bandwidth is limited, usually at the WAN edge. While throughput remained within the acceptable range for a circuit, an extremely high packet rate was overwhelming the edge device’s capacity to process packets. So despite bandwidth utilization being normal, packet loss was still occurring.
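
A quick back-of-the-envelope calculation shows why bandwidth graphs can look healthy while the packet rate is not. The numbers in this Python sketch are illustrative assumptions (a 10 Mbps load, 1200-byte versus 100-byte average packets), not measurements from this network.

```python
def packets_per_second(throughput_mbps, avg_packet_bytes):
    """Packet rate implied by a given throughput and average packet size."""
    return throughput_mbps * 1_000_000 / (avg_packet_bytes * 8)


# Same 10 Mbps of traffic, very different load on the edge device.
print(round(packets_per_second(10, 1200)))  # mostly large packets
print(round(packets_per_second(10, 100)))   # mostly small packets, e.g. short queries
```

At the same throughput, the small-packet case produces roughly twelve times the packet rate, which is the kind of load that can exhaust a device’s forwarding capacity long before the circuit is full.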

The primary symptom of this problem was poor voice call quality on calls across the WAN, at seemingly random times, including periods where bandwidth utilization was very low.

A close inspection of one particular site uncovered abnormally high packet rates, measured in pps (packets per second). We used a span port to run a packet capture of the traffic, with the hope of isolating a source of the packet flood. Here’s what we found:

mdns-flood-protocols
Wireshark’s Protocol Hierarchy Window

37% of the packets on this busy WAN circuit were mDNS queries! More specifically, the sources were smartphones on the guest WLAN that were sending mDNS queries for Chromecasts, Apple TV’s, and other services advertised with mDNS. In a centrally-switched WLAN, all of that traffic is backhauled across the WAN to the WLC which services it. These queries were leaving remote sites in a CAPWAP tunnel, traversing the WAN to the WLC, which then forwarded them on to all of its AP’s at all other sites, once again traversing the WAN.

Once we had this data, it became an easy problem to fix. There was no business requirement for multicast on the guest WLAN, so we blocked it at the WLC using a layer 3 ACL. This ACL blocks all multicast, but we added a separate line for mDNS to get an idea of the volume of mDNS packets compared to other multicast traffic on the WLAN.

mdns-flood-acl
ACL to block all multicast on the WLAN

You can see from the hit counters that this guest WLAN is seeing a lot of mDNS traffic, which dwarfs any other multicast traffic. With this ACL in place, the problem was resolved. mDNS uses multicast address 224.0.0.251 and UDP port 5353, if you want to block it more precisely.
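
The same split the ACL makes (mDNS versus all other multicast) can be sketched in Python if you want to tally packets from an exported capture. The classify function and its labels are my own invention; only the mDNS group address and port come from the text above, and this covers IPv4 only.

```python
import ipaddress

MDNS_GROUP = ipaddress.ip_address("224.0.0.251")  # IPv4 mDNS multicast group
MDNS_PORT = 5353                                  # mDNS UDP port


def classify(dst_ip, protocol, dst_port):
    """Label a packet as mDNS, other multicast, or anything else."""
    addr = ipaddress.ip_address(dst_ip)
    if addr == MDNS_GROUP and protocol == "udp" and dst_port == MDNS_PORT:
        return "mdns"
    if addr.is_multicast:
        return "other-multicast"
    return "not-multicast"


print(classify("224.0.0.251", "udp", 5353))
print(classify("239.255.255.250", "udp", 1900))
print(classify("10.0.0.5", "tcp", 443))
```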

Looking back at our pre-ACL pcap, I observed several clients on the guest WLAN flooding the network with mDNS queries, which the WLC then forwarded back across the WAN to its AP’s. The top five worst offenders in the pcap were all Android phones, with OUI’s from Samsung, HTC, and LG, and they were each responsible for dumping thousands of mDNS packets on the network in a matter of seconds.

mdns-flood-client-pcap
An unfiltered pcap from the WAN edge should not look like this!

In a network with several thousand guest devices, most of which are smartphones, if enough of these buggy Android phones are dumping mDNS queries like this, it could result in serious problems if it is not controlled. The packet rate at the WAN edge may cause issues with other traffic if it results in output drops (such as voice which we observed), and the increase in multicast traffic on the RF may be an issue as well if the WLAN handles multicast traffic normally, which means transmitting it at the lowest mandatory data rate. That will increase airtime utilization substantially, leaving less airtime for other clients to transmit and receive data.

Until recently, I did not understand what triggered this problem to occur. Now I do. There appears to be a bug in recent versions of Android that results in these mDNS floods when the phones leave sleep mode, resulting in thousands of frames being transmitted in the seconds after the client wakes up. TP-Link discovered the same behavior we observed and traced it to “recent releases of Android OS and Google Apps.”

For WLAN administrators, I suggest taking a few minutes to think through the implications of this issue for your network. As I discovered in my troubleshooting, it is very likely to affect more than just the guest WLAN.

PS: For those of you reading this that aren’t network engineers and think your wireless router is junk, it isn’t. And please stop jumping to the conclusion that the infrastructure is to blame for Wi-Fi problems! 80% of Wi-Fi problems are client device issues, not infrastructure issues.

Update: Google has acknowledged this issue and is releasing a patch via Google Play on January 18th.

Categories
RF Uncategorized WLAN

macOS Wi-Fi Roaming

One of the nice things about Intel wireless chipsets is that the drivers expose a lot of controls to help tune the chipset’s operation. One of my favorites among these controls is “Preferred Band,” which I usually adjust to instruct the chipset to prefer the 5 GHz band over the 2.4 GHz band. There are some other useful controls like “Roaming Aggressiveness” and you can also enable Fat Channel Intolerance if a neighbor is rudely using 40 MHz of spectrum in 2.4 GHz.

intel_driver

Although macOS has many advantages over Windows when it comes to Wi-Fi, such as the ability to natively do packet captures with the internal chipset, macOS doesn’t have the same level of customization as a Windows machine with an Intel chipset. And my experience has been that Mac clients don’t roam particularly well. Too often they are “sticky clients” and you need to disable/enable Wi-Fi on them to get them to associate with a better BSS.

Here’s a screenshot from a MacBook Air which wouldn’t roam away from a BSS whose RSSI had fallen to -80 dBm, while the laptop was only able to transmit at MCS 0, 7 Mbps. However, there was another BSS in the -60’s which would have allowed for much better Wi-Fi performance.

macos_wifi
Why is the native macOS Wi-Fi menu showing a full signal with -80 dBm RSSI and MCS 0? Wi-Fi Signal tells the real story.

In 2016 Apple published a webpage that explains how macOS makes roaming decisions and what roaming features it supports. This is very helpful and I wish other manufacturers would do the same. The algorithms that control client roaming are usually a black box, so Wi-Fi engineers have to make a lot of assumptions about them when designing WLAN’s for clients that require efficient roaming. That said, while Apple says Macs should usually roam at -75 dBm, that doesn’t match my experience. Sometimes Macs are just sticky.

One reason for this is that once the roaming threshold is crossed, a Mac will only roam to a BSS that is 12 dB louder than the current BSS, which would require a roaming candidate BSS to have an RSSI of -63 dBm or better before roaming will occur at -75 dBm. There doesn’t appear to be any way to modify this value.
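
Here is that rule as a small Python sketch. It is only a simplified model of the behavior described above, not Apple’s actual algorithm; the constant and function names are mine, and I’m assuming the -75 dBm threshold counts as crossed at exactly -75.

```python
ROAM_TRIGGER_DBM = -75  # roaming threshold described above
ROAM_DELTA_DB = 12      # how much louder a candidate BSS must be


def would_roam(current_rssi_dbm, candidate_rssi_dbm):
    """Apply the threshold-plus-margin rule to one candidate BSS."""
    if current_rssi_dbm > ROAM_TRIGGER_DBM:
        return False  # current BSS is still above the roaming threshold
    return candidate_rssi_dbm - current_rssi_dbm >= ROAM_DELTA_DB


print(would_roam(-75, -63))  # candidate is exactly 12 dB louder
print(would_roam(-75, -64))  # candidate is only 11 dB louder
print(would_roam(-70, -50))  # threshold not crossed yet
```

Notice that a candidate only 11 dB louder is ignored under this model, even though it would be a far better BSS to use.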

Enabling 802.11k or 802.11v won’t help because macOS does not yet support those features, although they don’t prevent Macs from using an SSID that has them enabled. 802.11k and .11v are supported in Windows 10, however, if the wireless adapter supports those features.

There is an old plist that once controlled “opportunistic” roaming behavior, which I suspect meant roaming above -75 dBm RSSI.

/Library/Preferences/com.apple.airport.opproam

…which has these defaults in macOS 10.12 Sierra:

{
    deltaRSSI = 10;
    disabled = 0;
    useBonjour = 0;
    useBroadcastBSSID = 1;
}

That looks promising, however, this plist hasn’t been used by macOS since macOS 10.10 Yosemite. It’s ignored by the OS now, and when it was utilized, it wasn’t intended to be user-editable, so changes were likely to be overwritten by the OS.

So if you are an enterprise with a fleet of Macs to manage and you run into sticky client issues, consider infrastructure features like Cisco’s Optimized Roaming or Aruba ClientMatch to force better roaming behavior among these clients.

To observe roaming behavior on a Mac, I recommend WiFi Signal from Adrian Granados. It can be set up to generate macOS notifications when roaming events occur or the RSSI of the AP drops below a certain threshold.

wifi-signal-notifications

Categories
Analysis WLAN

Splunking Wi-Fi DFS Events

One aspect of wireless networking that I’ve always struggled with is visibility into DFS events. Usually I catch them by chance by noticing two nearby AP’s on a site map using the same non-DFS channel, or maybe by casually looking through logs, but I’ve never felt like I had the reporting and alerting that should be in place for DFS events, because they can be very disruptive. An AP will abruptly change the channel it is operating on, and if it switches back, it may observe a “quiet period” of 60 seconds in which it does not transmit any data. Not good.

Enter Splunk.

Splunk is a powerful log analysis tool that you can think of as “Google for the data center.” It takes log data from almost any source and makes it as searchable as Google has made the web. For wireless network engineers, you can quickly and easily search syslog and SNMP data, build reports, and create alerts. Splunk Light is free and will process up to 5 GB of data a day, which should be plenty for most WLAN’s. It also runs easily on macOS if you just want to demo it locally.

Using Splunk I very quickly created this dashboard of real DFS data from SNMP traps coming from a Cisco WLC. It’s a little rough around the edges still (I need to figure out how to clean-up those AP names and channels), but it still shows me a lot of the valuable data.

splunk-dfs-report
Yes, DFS is a problem at this site.

I can easily create email alerts too, so that if a DFS event occurs an email is triggered, or if say 10 DFS events occur within 30 minutes an email is triggered.
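
The “10 events within 30 minutes” rule is a sliding-window count. Splunk handles this for you in its alert settings, but here is a minimal Python sketch of the logic, in case you want to apply the same rule to a log outside of Splunk. The function name and defaults are my own.

```python
from collections import deque
from datetime import datetime, timedelta


def burst_alert_times(event_times, threshold=10, window=timedelta(minutes=30)):
    """Return the timestamps at which `threshold` events fall within `window`."""
    recent = deque()
    alerts = []
    for ts in sorted(event_times):
        recent.append(ts)
        # Drop events that are older than the window relative to this one.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(ts)
    return alerts


# Made-up timestamps: ten DFS events, one minute apart.
start = datetime(2018, 1, 1, 9, 0)
events = [start + timedelta(minutes=i) for i in range(10)]
print(burst_alert_times(events))
```

In practice you would also want to suppress repeat alerts for a while after the first one fires, otherwise every additional event in a busy window triggers another email.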

How To

I installed Splunk on a Mac, then set up the built-in snmptrapd to listen for incoming traps and log them to a file. For snmptrapd to interpret the SNMP traps from a Cisco WLC, download the Cisco MIB’s and copy them to /usr/share/snmp/mibs/. Then you can start snmptrapd.

Here’s the CLI one-liner to do that:

sudo snmptrapd -Lf /var/log/snmp-traps --disableAuthorization=yes -m +ALL

Next configure the WLC to send SNMP traps to the Splunk box by adding its IP address under Management -> SNMP -> Trap Receivers. While you’re there go to Trap Controls and turn everything on you want to analyze.

wlc-snmp

Even though DFS events only generate SNMP traps, it’s still a good idea to send syslog messages to Splunk too, so do that under Management -> Logs -> Config. Set the Syslog Level to “Informational” to get a lot of good data. “Debugging” is probably way too much. The Syslog Facility isn’t important.

wlc-syslog

Monitor the file that snmptrapd is writing traps to in order to make sure it is working. Run this command on the Mac and you should see traps streaming in. If not, you have some troubleshooting to do.

tail -f /var/log/snmp-traps

Now add the file to Splunk under Data -> Data inputs -> Files & directories, and you should be able to see the traps in searches.

Have a look at Splunk’s documentation on SNMP data for more setup help. Setting up syslog is easier. Under Data -> Data inputs -> UDP add UDP port 514 with the Source type “syslog.”

Once the data is coming into Splunk you can start searching it and creating fields. Search “RADIO_RC_DFS” (with quotes) to see all the DFS traps. From that search click “Extract new fields” and select the tab delimiter to parse the data. Give the AP name field a label, and then you can create visualizations of DFS events by AP name. Any search can also be used to trigger an alert, such as an email.

Cisco has published a WLC SNMP Trap Guide as well as a WLC syslog Guide that is helpful when working with this data. Find the messages you are looking for in those guides, then search for them in Splunk.

From there it’s all up to your own creativity. DFS events are just scratching the surface of Splunk’s potential. You can look at authentication events, monitor RRM, and there might be some interesting roaming analysis that can be done with this data as well. I’m sure there are some bright engineers out there that have taken this a lot farther. Please share your work!

Categories
Security Uncategorized WLAN

Use Let’s Encrypt Certificates with FreeRADIUS

Let’s Encrypt is a certificate authority that generates TLS certificates automatically, and for free. It’s been great for web server administrators because it allows them to automate the process of requesting, receiving, installing, and renewing TLS certificates, taking the administrative overhead out of setting up a secure website. And did I mention it’s free and supported by all the major web browsers now?

Getting all of that to work with a RADIUS server is challenging however, mostly because of the way Let’s Encrypt works. The Let’s Encrypt client runs on a web server with a public domain name. The client requests a TLS cert from Let’s Encrypt and before Let’s Encrypt issues the cert, it verifies that the client is connecting from the same domain name that it is requesting a cert for, and that the client can put some hidden files on the server’s website. Do you see the problem? Unless you run a public-facing web server on your RADIUS server (unlikely), Let’s Encrypt will not issue certs to your server. It needs a web server it can interact with in order to validate the domain name of the client’s request.

Why use a certificate from a public CA like Let’s Encrypt for 802.1X/PEAP authentication? While a private CA offers more security, a public CA has the advantage of having a pre-installed root certificate on virtually all RADIUS supplicants, including BYOD clients that are unmanaged. If you don’t have an MDM or BYOD onboarding solution, you can’t get your private root cert onto BYOD clients very easily.

Unmanaged clients are a security risk, however, because the end-user can easily override security warnings that occur when connecting to an evil twin network with a bogus cert. A good MDM solution will allow network admins to configure BYOD clients properly so that TLS failures cannot be bypassed.

A few considerations before you get too excited:

  • Again, a better, more secure solution is to use a private CA and distribute the RADIUS server cert to clients using an MDM solution and/or BYOD onboarding solution.
  • Let’s Encrypt certs are only good for three months at a time, and some supplicants will prompt users to accept the new certificate when it is renewed.
  • Build in some error handling, logging, and notification. E.g. an email from the web server when the cert renewal routine runs, including its output, and an email from the RADIUS server when it copies the new certs and reloads FreeRADIUS.
  • It works as root, but there’s probably a way to accomplish this without using root. Do it that way.
  • You can accomplish the same thing with Windows servers and Powershell.
  • You broke it, not me.

To get this working, we need a public web server with the same domain name as you’d use in your RADIUS server’s cert common name. This means internal domain names with a .local TLD won’t work.

I set up two Ubuntu servers, one running the nginx web server with a public IP, and another on my local network running FreeRADIUS. The web server will run the Let’s Encrypt client and create and renew the certs. The RADIUS server will copy those certs from the web server and use them for PEAP authentication. Once set up, the process of renewing and installing the certs on the RADIUS server happens automatically, just like it would on a web server.

First, a public DNS A record needs to be set up with the domain name that will be used as the TLS cert common name. We’ll use radius1.example.com, and point it to the IP address of the web server.

Once that is done, you can install and run the Let’s Encrypt client on the web server. It works with Apache too, but if you prefer nginx like me, follow these directions to get it set up with Ubuntu 14.04 or Ubuntu 16.04. Don’t skip over the part about using cron to run the renewal routine.

Now that we have the certs on the web server, we’ll turn our attention to the RADIUS server. The first thing we need to do is set up ssh public key authentication between the two servers. I used the root account on both servers to do this, so that I would have permissions everywhere I needed them. With public key authentication in place, securely copying the certs in the future can happen automatically, without getting stopped by a password request. Here are instructions to get that working.

Now we’ll start configuring FreeRADIUS on the RADIUS server. I’m assuming you already have a working FreeRADIUS server. I’m using FreeRADIUS 3, and you should be too. I like to use a separate directory for the Let’s Encrypt certs.

root@freeradius:~# mkdir /etc/freeradius/certs/letsencrypt/

Now let’s try copying the certs from the web server to this directory on the RADIUS server. If public key authentication is working, you should not be prompted for a password.

root@freeradius:~# scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/fullchain.pem /etc/freeradius/certs/letsencrypt/
root@freeradius:~# scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/privkey.pem /etc/freeradius/certs/letsencrypt/

Did it work? If so, you should see the certs in the new folder we created.

root@freeradius:~# ls /etc/freeradius/certs/letsencrypt/
fullchain.pem  privkey.pem

Now we need to configure FreeRADIUS to use the Let’s Encrypt certs for PEAP authentication. I have a previous blog about using different CA’s for PEAP and EAP-TLS on FreeRADIUS that should come in handy here. If you are using EAP-TLS too, be sure not to change that CA from your private CA! All we need to do now is modify /etc/freeradius/mods-enabled/eap with our new certs in the TLS section used for PEAP.

root@freeradius:~# nano /etc/freeradius/mods-enabled/eap

tls-config tls-peap should be changed to:

…
tls-config tls-peap {
 private_key_file = ${certdir}/letsencrypt/privkey.pem
 certificate_file = ${certdir}/letsencrypt/fullchain.pem
…

If you aren’t using multiple TLS configurations, this section is named tls-config tls-common. You can leave it like that.

Reload FreeRADIUS for the change to take effect.

root@freeradius:~# service freeradius reload
 * Checking FreeRADIUS daemon configuration...               [ OK ] 
 * FreeRADIUS daemon is running
 * Reloading FreeRADIUS daemon freeradius                    [ OK ]

Now when connecting to the WLAN that is configured to use this RADIUS server for 802.1X/PEAP authentication, the client is presented with a valid Let’s Encrypt server certificate.

mac_cert_challenge

OK, we have a working FreeRADIUS server using Let’s Encrypt certs for 802.1X/PEAP authentication. Now let’s automate the process of getting renewed certs from the web server to the RADIUS server. We’ll use scp and cron to get this done.

On the RADIUS server, add these commands to root’s crontab, with the appropriate domain names.

root@freeradius:~# crontab -e
# m h dom mon dow command
0 3 * * 1 scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/fullchain.pem /etc/freeradius/certs/letsencrypt/
0 3 * * 1 scp root@radius1.example.com:/etc/letsencrypt/live/radius1.example.com/privkey.pem /etc/freeradius/certs/letsencrypt/
5 3 * * 1 service freeradius reload

At 3:00 AM every Monday, cron will copy the TLS certs from the web server, then reload FreeRADIUS at 3:05 AM to put them into production. Assuming the renewal cron job on the web server runs shortly before then, the Let’s Encrypt certs are automatically installed on the RADIUS server a few minutes after they are renewed on the web server. The certs are good for three months at a time and renewable one month in advance, so you’ll get renewed certs automatically installed every two months.
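
The crontab above reloads FreeRADIUS every Monday whether or not the cert actually changed. If you’d rather reload only when a new cert has arrived, one option is to copy the files into a staging directory first and compare them with what is installed. Here is a rough Python sketch of that comparison. The function names and the staging idea are my own additions, not part of the setup described above, and the demo uses temporary files rather than real cert paths.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def cert_changed(installed, fetched):
    """True if the fetched cert differs from the installed one (or none is installed)."""
    installed = Path(installed)
    if not installed.exists():
        return True
    return file_digest(installed) != file_digest(fetched)


if __name__ == "__main__":
    # Demo with throwaway files standing in for fullchain.pem.
    workdir = Path(tempfile.mkdtemp())
    installed = workdir / "installed.pem"
    fetched = workdir / "fetched.pem"
    fetched.write_text("cert-v1")
    print(cert_changed(installed, fetched))  # nothing installed yet
    installed.write_text("cert-v1")
    print(cert_changed(installed, fetched))  # identical contents
```

A wrapper script could call cert_changed after the scp step and only then move the new files into place and reload FreeRADIUS, which also gives you a natural spot to send the notification email suggested earlier.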

Presto! You now have Let’s Encrypt certs automatically renewed and installed on your RADIUS server. While a private CA is a better solution for 802.1X authentication, this isn’t bad for a $0 software stack.

Categories
edtech HD RRM Security Uncategorized WLAN

Clear To Send Podcast Episode 62: K12 Wi-Fi Deployments

I recently had the pleasure of joining Rowell Dionicio on the Clear to Send Podcast to talk about Wi-Fi in K12 schools. Clear To Send is a great podcast about enterprise wireless networking and a great way to stay current with the Wi-Fi community.

We talked about K12 requirements, challenges, funding, my design process, security, and everyone’s favorite K12 subject, 1 AP per classroom!

After listening to the podcast, I thought about some other K12 Wi-Fi considerations that I didn’t bring up on the air.

  • K12 often has requirements for mDNS applications like Apple AirPlay for AppleTV or Google Cast for Chromecast. This is a challenge in an enterprise network because mDNS does not cross layer 2 boundaries. It’s important to consider that when designing a new WLAN and selecting the vendor. Many WLAN vendors do have features that can assist with relaying mDNS traffic between vlans. Be careful to limit this traffic to only the vlans where it is required.
  • Excessive multicast traffic can be a burden on channel utilization when it is not controlled. Many WLAN vendors have features that intelligently filter broadcast/multicast traffic, instead of always forwarding it out the AP radio interfaces at the lowest data rate. If you are dealing with mDNS or large subnets (common in K12) it’s worthwhile to understand how the WLAN can manage broadcast/multicast traffic.
  • MSP’s are a great way to get well-designed enterprise Wi-Fi into small to medium size schools that don’t have the internal resources to handle it themselves. MSP’s can be hired to support and operate the WLAN after installing it, which gives them an incentive that VAR’s who just sell the hardware might not have–to design the WLAN properly. E-Rate funding is now available to reimburse schools for managed services contracts with MSP’s.
  • eduroam is available for K12 schools, not just higher education. Check it out!
  • It’s hard to listen to the sound of your own voice.

I really enjoyed talking Wi-Fi with Rowell and I’d love to return to the podcast in the future. Maybe we can talk about healthcare Wi-Fi next? Thanks Rowell!

Have a listen here: CTS 062: K12 Wi-Fi Deployments – Clear To Send