I have written before about my little Pi cluster and how it taught me nothing useful. I stand by that. But "nothing useful" and "not worth doing" are different claims, and over the last few weeks I have done the thing properly, with provisioning and automation and a tidy enclosure, and I want to give it the longer write-up it deserves. Partly because the journey from "four boards on a tea towel" to "four boards I can rebuild from scratch in ten minutes" was the actually instructive part. The cluster does almost nothing. Building it well taught me plenty.
the hardware, and the lies it tells
Four Raspberry Pi 3 Model B boards. A cheap eight-port unmanaged gigabit switch, which is a polite fiction because each Pi's ethernet hangs off the internal USB 2.0 bus and tops out around 100Mbit no matter what switch you give it. A stack of acrylic plates with brass standoffs that I assembled twice because I did the bottom layer upside down the first time. And power, which is the part everyone underestimates.
I started with a multi-port USB charger rated for "2.4A per port". Under real load, with all four boards busy and SD cards being hammered, two of those ports sagged enough to trigger the under-voltage warning. The symptoms were maddening: a board that worked fine alone but corrupted its SD card under load, an apparent network fault that was actually a brownout, a node that rebooted only when its neighbours were busy. I chased all of these as software problems before I owned a USB power meter and saw the truth. The fix was a beefier supply and thicker, shorter cables. I cannot stress this enough for anyone building one of these: solve power first, then believe your other diagnostics.
provisioning, or how to stop hating your SD cards
The first time I built the cluster I flashed each card by hand, booted each board, SSH'd in, and configured it like an artisan. This is fine for one board and unbearable for four, and it means a corrupted card is an afternoon, not a minute. So the real work of this round was making the cluster reproducible.
The approach was boring and correct. Flash a base Raspbian image, drop an empty ssh file and a wpa_supplicant.conf onto the boot partition so the board comes up headless and reachable with no monitor or keyboard, give each a static lease from my DHCP server keyed on its MAC, and then hand the whole fleet to Ansible.
- hosts: cluster
become: true
tasks:
- name: set a sensible hostname
hostname:
name: "{{ inventory_hostname }}"
- name: install the basics
apt:
name: [docker.io, htop, vim]
update_cache: true
- name: add my key
authorized_key:
user: pi
key: "{{ lookup('file', '~/.ssh/id_ed25519.pub') }}"
That is the whole point of the exercise, really. The inventory is four lines, the playbook is a couple of dozen, and a dead node goes from "an evening of fiddling" to "reflash, run the playbook, done". The first time I deliberately corrupted a card to test this and had the node back in the cluster in under ten minutes, I felt more satisfaction than the cluster's actual workload has ever given me.
what i ran on it
Docker, and then a flirtation with Docker Swarm because a real Kubernetes was heavy for these boards in mid-2018 and k3s was not yet a thing I could lean on. Swarm was honestly fine for the toy workloads I had: a couple of replicated web services, a small distributed build cache, a Prometheus node-exporter on each board so I could watch them all on a Grafana dashboard. The dashboard was the most-used thing I built, which tells you something about the ratio of "watching the cluster" to "using the cluster".
The workloads themselves were never the point and were never fast. A 1.2GHz quad-core ARM with a gigabyte of RAM and 100Mbit networking is not where you go for throughput. But as a place to make distributed-systems concepts physical, it earned its keep.
the lessons that did stick
A few things genuinely transferred to my day job, which surprised me.
- Power and I/O are the real constraints on small systems, not CPU. I now reach for a power meter early instead of blaming the network.
- Reproducible provisioning is worth building before you need it. The discipline of "I can rebuild any node from nothing" is the same discipline that makes production servers cattle and not pets, and it is much cheaper to learn on a four-pound board than on a server someone is paying for.
- Watching a scheduler reschedule work because you physically unplugged a node is a better teacher than any diagram. There is no substitute for the moment the dashboard goes red and then, a few seconds later, settles.
was it worth it
For getting work done, no, and I will keep saying so. My laptop outruns the whole stack at everything. But the cluster was never an efficiency play. It was a way to turn abstractions I half-understood into hardware I could touch, fail, and rebuild. The Ansible playbook that came out of it is genuinely good, the power lesson saved me real time later, and the four green lights blinking in the cupboard make me happy every time I open the door.
If you are tempted to build one, a word of advice on managing expectations, because I see a lot of people buy in expecting a cheap server and leave disappointed. Decide up front whether you want a learning toy or a useful machine, because the Pi is genuinely good at the former and genuinely poor at the latter. If you want a small always-on server for a couple of services at home, buy one slightly better single board, or an old thin client off the auction sites, and you will get more done for less money and less fuss. The cluster only makes sense if the point is the cluster: the wiring, the provisioning, the watching things fail and recover. The moment you find yourself wishing it were faster, you have bought the wrong thing for the wrong reason, and that is on you, not the Pi.
So I will give it the summary it has earned. It is a beautifully self-justifying object: a machine whose entire value is in the building and the tending, not the running. I learned more about power budgets, reproducible provisioning and the physical reality of distributed systems than any cloud account would ever have taught me, precisely because I could hold the failures in my hand. The lights blink. The fans, such as they are, whir. And I am quietly, pointlessly delighted by it.
It taught me nothing useful and quite a lot of value, which is the most honest summary I can give. I would build it again tomorrow.