The complaint, as ever, was not phrased in technical terms. It was "the internet is broken again" and it arrived while I was in the middle of something. By the time I had wandered through to look, whoever had broken it had stopped doing whatever broke it, and everything was fine. This went on for weeks. I would test the line, get a clean result, and quietly suspect the complainant of exaggeration.
They were not exaggerating. The line was fine. The line was, in fact, the problem, in that I kept testing the line and the line was the wrong thing to test.
Bandwidth is not the bottleneck you think it is
We have a perfectly reasonable connection. On a good day it does about 70 Mbit/s down and 18 Mbit/s up, which on paper is plenty for a household. The trouble is the word "household". On any given evening there might be someone streaming a 4K film, someone else on a video call for work that has bled into the evening, a games console downloading a 90 GB update it decided it needed right now, and me, trying to ssh into a box and getting a terminal that types each character a noticeable beat after I press the key.
That last one is the tell. SSH is almost nothing in bandwidth terms. A few bytes a keystroke. If a few bytes a keystroke cannot get through cleanly whilst a download is running, the problem is not how much bandwidth I have. The problem is latency under load, and specifically bufferbloat: the download fills a fat buffer somewhere, usually in the modem or the ISP's kit, and every other packet has to queue behind it. You do not run out of road. You run out of road that is not already full of one enormous lorry.
The way to see it is to run a ping whilst saturating the link. Idle, my ping to a nearby host sits around 12 ms. Start a flat-out upload and, before I fixed any of this, that same ping climbed past 600 ms and sometimes well into the seconds. That is the whole story in two numbers. A video call does not survive a jump from 12 ms to 600 ms. It judders, the audio drops, and someone says "the internet is broken again".
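For anyone wanting to repeat that test, here is a minimal sketch of how I would put a number on it. It assumes the Linux iputils ping output format (the "time=12.3 ms" field on each reply); avg_rtt is a helper name I made up for illustration, and HOST and SERVER are placeholders rather than anything from my setup.

```shell
#!/bin/sh
# avg_rtt: read ping output on stdin and print the mean round-trip time in ms.
# Relies on the "time=12.3 ms" field that Linux iputils ping prints per reply.
avg_rtt() {
    awk -F'time=' '
        /time=/ { split($2, f, " "); sum += f[1]; n++ }
        END     { if (n) printf "%.1f\n", sum / n; else print "no replies" }
    '
}

# Usage (HOST and SERVER are placeholders):
#   ping -c 20 "$HOST" | avg_rtt        # idle baseline
#   iperf3 -c "$SERVER" -t 30 &         # saturate the upload in the background
#   ping -c 20 "$HOST" | avg_rtt        # same ping, now under load
```

Any flat-out transfer will do for the load; iperf3 against a server you control is just one convenient way to generate it. The gap between the two averages is the number that matters.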
fq_codel, then CAKE
The fix is queue management that is cleverer than first-in-first-out. The kernel has shipped good options for years now and I had simply never turned them on properly. The two that matter here are fq_codel and cake.
fq_codel does two useful things at once. It fair-queues flows, so my single ssh session is not stuck behind the console's download; each flow gets its turn. And it actively manages the queue with CoDel, dropping or marking packets when the queue starts to build, which signals the senders to back off before the buffer becomes a swamp. You give up a sliver of raw throughput and you get your latency back. That is a trade I will take every single evening.
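Turning fq_codel on for one interface is a one-liner. A small sketch, assuming eth1 is the WAN-facing interface and that you are root; enable_fq_codel is a wrapper name of my own, not a standard tool.

```shell
# enable_fq_codel DEV: make fq_codel the root qdisc on one interface,
# then show its statistics so you can confirm it took.
enable_fq_codel() {
    dev=$1
    tc qdisc replace dev "$dev" root fq_codel &&
    tc -s qdisc show dev "$dev"   # -s prints drop and mark counters
}

# Usage, as root:
#   enable_fq_codel eth1
```

One caveat: fq_codel on its own does not shape. It can only manage a queue that forms at that interface, so on a gateway whose real bottleneck is the modem beyond it, it needs a rate limiter in front before it has anything to manage.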
cake (Common Applications Kept Enhanced) is the newer, more opinionated option, and it is what I have settled on. It rolls the fair-queuing and active management together with built-in bandwidth shaping, so you do not need a separate tc htb setup to shape the rate. The key thing CAKE needs from you is honesty about your actual line speed. You set it slightly below your real throughput so that the queue lives in your router, where CAKE controls it, rather than in the ISP's buffer, where it does not.
On my gateway, running a stock Linux, the core of it is not much more than this:
# eth1 is the WAN-facing interface
tc qdisc replace dev eth1 root cake bandwidth 17Mbit \
    nat dual-srchost ack-filter
# and for the download direction, via an IFB so we can shape ingress
tc qdisc replace dev ifb-eth1 root cake bandwidth 65Mbit \
    nat dual-dsthost ingress
A few of those flags earn their place. nat tells CAKE to look through the NAT so it fair-queues by the real internal host, not by the single public address everything shares. Without it the whole house counts as one host and the per-host fairness is meaningless. dual-srchost and dual-dsthost balance both between hosts and between flows within a host, so one machine opening fifty connections cannot drown out a machine with one. ack-filter thins out redundant TCP ACKs on the upload, which matters on an asymmetric line where a fast download's ACK stream can choke your modest upload.
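The download-direction qdisc assumes the ifb-eth1 device already exists and that incoming traffic on eth1 is being redirected into it. A sketch of the usual one-time setup follows; setup_ifb is a wrapper name of my own, and the interface names are simply the ones used in the commands in this post. Treat it as an outline to adapt, not a record of my exact boot script.

```shell
# setup_ifb WAN IFB: create an IFB device and redirect WAN ingress into it.
# One-time setup, run as root; re-running it will complain that the ingress
# qdisc already exists.
setup_ifb() {
    wan=$1
    ifb=$2
    modprobe ifb                                   # load the IFB kernel module
    ip link add name "$ifb" type ifb 2>/dev/null   # errors quietly if it already exists
    ip link set dev "$ifb" up &&
    tc qdisc add dev "$wan" handle ffff: ingress &&
    tc filter add dev "$wan" parent ffff: matchall \
        action mirred egress redirect dev "$ifb"
}

# Usage, as root:
#   setup_ifb eth1 ifb-eth1
```

Linux can only queue what it sends, so the trick is to make received packets look like they are being sent out of a virtual interface, and then attach CAKE to that.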
The numbers, 17 and 65, are deliberately under the 18 and 70 the line can manage. That margin is the price of keeping the queue on my side of the modem. Set it too high and the ISP buffer fills again and CAKE never gets to do its job. Set it too low and you are throwing away bandwidth you paid for. I found mine by running the saturating-ping test and nudging the figure down until the latency under load stopped climbing.
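To put that margin in proportion, the arithmetic is short enough to do in awk. headroom_pct is a throwaway helper name of my own; the inputs are the line and shaped rates quoted above.

```shell
# headroom_pct LINE SHAPED: percentage of the line rate given up by shaping.
headroom_pct() {
    awk -v line="$1" -v shaped="$2" \
        'BEGIN { printf "%.1f\n", (line - shaped) / line * 100 }'
}

headroom_pct 18 17   # upload:   prints 5.6
headroom_pct 70 65   # download: prints 7.1
```

So the cost is somewhere between five and eight percent of the line in each direction. Those figures are what worked on my line, not a rule; the right margin for another connection is whatever the saturating-ping test says it is.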
What the numbers did next
Same test, after CAKE. Idle ping, 12 ms. Saturating upload running flat out, ping holds around 25 to 30 ms. Not 600. Not seconds. Low tens of milliseconds, whilst the link is full. The transfer itself still runs at very nearly full speed, because all I have given up is the few percent of headroom I shaped off the top.
The human result is more telling than the graph. The console can grind through its 90 GB update in the background and nobody on a call notices. I can ssh into a box mid-evening and the terminal echoes when I press the key, not a moment later. The 4K film does not buffer, because its packets get their fair share rather than being stuck behind whatever else is loudest.
I did not classify anything by application. There is a whole rabbit hole of marking traffic by port and DSCP, giving "gaming" or "voice" some explicit priority, and for a while I assumed that was the work. It is mostly not. Fair queuing plus active queue management gets you the overwhelming majority of the benefit, because the real enemy was never that the wrong traffic had priority. It was that one greedy flow was allowed to fill a buffer and make everyone else wait. Stop the buffer filling and the fairness mostly sorts itself out.
If you take one thing from this: test latency under load, not idle speed. The speed test we all instinctively run measures the one condition that was never the problem. Saturate the link and watch the ping. If it climbs into the hundreds, no amount of extra bandwidth will fix the feeling of a broken connection, and one qdisc will.