Tuesday, April 30, 2013

A FAQ about today's FSF release

I've had a few people ask me some questions. There's also been a few questions on slashdot. I'll update this article as more questions come in.

Was it all me?

No, I didn't do the bulk of the work. Luis did the bulk of the legal hoop-jumping and review process at work. I grabbed it near the end of this process (so he could move onto other things) and shepherded the process of getting things ready for open sourcing.

I encouraged some external developers from the community to come on board and help in the initial effort to get it to compile and work correctly under the open source Tensilica toolchain rather than the internal toolchain.

I've fixed a few bugs here and there - eg the RX path TSF bugs that stopped the NICs from working in Mesh mode, along with some other fallout issues from the toolchain migration.

I wanted the bulk of the work to come from the community rather than me. I don't want to be the only person working on this. Thankfully I'm not! There's an active community now!

I'll likely do a bunch more development in the firmware code once I get it working on FreeBSD!

Why is it only one device? Why is it so expensive? Why that device?

You'll have to ask the FSF that.

How different is this to the non-USB stuff?

Like a lot of manufacturers, Atheros reuses its CPU and Wifi cores everywhere they can.

The AR7010 designs have an external AR9280 or AR9287 NIC. This is exactly the same as a mini-PCIe design - the same chip, speaking PCIe, etc.

The AR9271 design is a single chip solution (see below) with an AR9285 NIC internally. I don't know whether internally it speaks PCIe or whether they just glued the NIC onto the AHB like they do for other integrated CPU+SoC designs (eg the AR913x, AR933x, AR934x, etc.)

But once you get past the USB and CPU parts, it looks exactly the same as the PCIe devices Atheros driver developers know and love.

Just keep in mind the main difference - the wifi part doesn't DMA directly to/from your computer memory. It has to go via buffer RAM on the AR7010 core in order to then send or receive it via USB endpoints.

What about the other NICs? The AR7010 based ones?

The AR7010 based ones are precursors to the single-chip solution that the FSF is selling a NIC for (the AR9271.) The AR7010 has USB on one side and PCIe on the other. It runs effectively the same firmware as the AR9271 NIC, save for some different ROM addresses, memory map and some other little differences.

The AR7010 based devices are thus "just as free" as the AR9271 NIC the FSF is selling.

I'm not sure if the FSF is going to certify an AR7010 design. I hope they can find a dual-band AR7010+AR9280 ath9k_htc NIC and sell that as part of their open hardware programme.

What is this AR9271 anyway? Why is it only 1x1 and 2GHz only?

The AR9271 is a single chip solution containing:
  • An AR7010 style core, with minor differences
  • Some RAM and ROM (but less RAM than the AR7010; no I don't know why.)
  • An AR9285 derivative (which is the 2GHz, 1x1 chip.)
Like a lot of things that manufacturers do, it's a "cost savings" design for a specific market. Even now, laptop and tablet manufacturers want to skimp on 5GHz NIC designs in order to save some cash. No, I don't know why. No, I can't quote costs.

How can I help?

Download the firmware, download a linux-next or compat-wireless tarball - or, run OpenBSD + athn for now - compile stuff up and hack away.


Hi to those from!


All I'd like to say is:

"Patches gratefully accepted."

Thursday, April 25, 2013

Today's Journey: Making AP mode power-save work better

I've been working on improving the net80211 and ath driver support for AP mode power save.

There's a few parts to it:

  • A station can tell an access point it's going to sleep by setting the power mgmt bit to 1 in a TXed frame;
  • The AP will then update the TIM entry in the beacon frames it sends out to reflect whether that station has any traffic queued;
  • A station can signal an AP that it's awake by sending a data frame with the power mgmt bit set to 0;
  • .. or it can request a frame at a time by using PS-POLL;
  • There's also the uAPSD stuff which I haven't yet implemented and won't likely do so for a while.
Now, it shouldn't be that difficult. Except, that it is.

If an AP has a bunch of frames queued to a station that has gone to sleep, it will keep trying to transmit those frames. That wastes air-time and results in annoying levels of packet loss.

When you're doing 802.11n, there's a whole lot more traffic going on and a lot more room to cause massive traffic issues if you drop frames. But you don't want to keep failing to transmit those frames or you'll end up spending a lot of time transmitting BAR frames to the station.

If the driver maintains a queue of frames (for say, software retransmit) then it also needs to ensure that the TIM bit is set correctly. Otherwise the AP may set the TIM bit to 0 because the net80211 stack has no queued frames to that node; but the driver itself has some frames. Thus, the station won't wake up and you'll see increased packet latency.

When PS-POLL is received, frames need to first be leaked from the driver queue BEFORE it starts leaking frames from the net80211 power save queue. The last thing you want is the wrong set of frames to go out.

So, I've spent the last few months extending the driver and network stack to make this feasible. There's new net80211 driver methods for tying into the TIM update process, the node power save status and the PS-POLL handling. The filtered frames handling in the ath driver is another precursor to this - it means that frames can be failed out very quickly and retried when appropriate.

(No, I'm not implementing software retransmit for non-11n traffic just yet. I will eventually. Just not yet.)

The final bits that I've been working on have been tricky.

When a node goes to sleep, you want to pause the driver transmission to the node - otherwise it will keep trying to transmit whatever is in the driver queue. For 11n this is terrible; it means that frames will keep failing to be transmitted and with enough failures, the traffic will stop whilst a BAR frame is sent. Grr.

Next was figuring out how to send frames whilst the node is "paused". I introduced a per-node "leak" counter which tells the driver transmit path that even though the node is asleep, a single frame should be scheduled. If one isn't available, the next frame sent will be scheduled. This handles the PS-POLL "null" response - ie, if there's nothing in the queue, the net80211 stack will queue a null data response with the MORE bit clear. That way the station will know there's currently nothing to receive.

But then, something odd started happening. Devices would disassociate and re-associate, but they'd still be marked as "asleep". So no traffic would occur. After digging into it a bit, I discovered that the only time a station transitions back to awake is when it receives a DATA frame with the power mgmt bit set to 0. Seeing management/control traffic from the station isn't enough. So for now, I just always transmit management/control frames regardless if the station is asleep or awake - except BAR frames. Those get software queued if the node is asleep. Now that management/control frames are transmitted directly, a station can re-associate and be marked as 'awake.'

Then I found that once a station re-associates, it should have all of its current association state reset. It may have had a bunch of aggregate frames queued to the hardware and those need to finish transmitting before we can start transmitting new data to the re-associated station. It may even have been in the middle of receiving a BAR frame! So, I have to gently (well, "gently") reset the association state to allow for currently queued frames to be cleaned up, but reset things like filtered frame state and BAR TX. Ew, but it needs to be done.

Also, if there's data queued to an asleep station and a BAR frame needs to go out, the BAR frame needs to go into the head of the software queue, not the tail. Otherwise it will have to wait for the queue to be transmitted - which, if there's a gap in the transmit block-ack window (hence needing the BAR), no further transmission will occur. Oops!

I then found that a sufficiently chatty node could end up filling the software queue full of buffers destined to it. This is a general problem in the ath driver which I'll eventually fix, but it became a huge problem with power save enabled. So, I've introduced a per-node maximum queue depth when it's asleep. That should limit the amount of pain that a single sleeping node can cause. I'll eventually introduce a limit for how many buffers an individual node can consume whether it's awake or asleep but that's for another day.

There's likely lots more corner cases that need to be addressed before I can merge this into -HEAD. I'm still seeing my macbook pro occasionally disassociate and not automatically re-associate and I'm not sure why. But things are behaving much, much better with sleeping devices.