Ramblings of an aging IT geek
hardware

a cheap logic analyser and the moment the bus stopped lying to me

Using a cheap clone logic analyser with sigrok to debug a flaky I2C sensor that the datasheet swore was fine.

For years I debugged hardware by squinting. I'd poke a multimeter at a pin, see roughly the voltage I expected, nod sagely, and then spend the next two hours rewriting perfectly good firmware because the actual problem was on a wire I couldn't see. This week I finally bought a logic analyser, the cheapest one that money can embarrass you into, and the first thing it did was prove that I had been wrong about absolutely everything.

the problem that wouldn't sit still

The project is mundane: an ESP8266 reading a BME280 temperature, humidity and pressure sensor over I2C, publishing to MQTT for the homelab. Worked on the bench. Worked for a day. Then every so often a reading would come back as nonsense, or the sensor would stop answering entirely until a power cycle. Intermittent. The word that ruins weekends.

The datasheet was no help, in the way datasheets never are when you've actually read them. The sensor should respond at address 0x76. The pull-ups should be fine. The bus should run happily at 100 kHz. Every "should" was doing a lot of work and none of it was load-bearing.

So I did what you do, which is the wrong things in order. I swapped the sensor. Same fault. I rewrote the I2C init. Same fault. I added delays, because adding delays is the homeopathy of embedded debugging and I am not above it. The fault got rarer, which felt like progress and was in fact a clue I ignored.

the tool

The thing I bought is one of those little 8-channel clones, the FX2-based ones that everybody has, the ones that turn up for less than the cost of lunch. It is not a precision instrument. It samples fast enough for a slow I2C bus and that is the entire job.

The software is the actual product. sigrok and its frontend PulseView do the work, and crucially they have protocol decoders, so you don't stare at a wall of high and low edges trying to spot start conditions in your head. You clip onto SDA and SCL, add the I2C decoder, and it annotates the capture with addresses, read/write bits, ACKs and NAKs, in human-readable text on top of the waveform.

Setup was less painful than I feared. On Linux you want the right udev rules so you're not running it as root, and the firmware loads on the fly:

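# /etc/udev/rules.d/60-libsigrok.rules already ships with the package
# add yourself to plugdev; the group change takes effect at your next login
sudo usermod -aG plugdev "$USER"
# start the GUI in the background
pulseview &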

Pick the device, pick a sample rate comfortably above your bus clock, set a trigger on SCL falling, and capture.
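
For a sense of what "comfortably above" means, here is a rough sketch in Python. The 10x oversampling factor is a common rule of thumb rather than anything sigrok mandates, and the list of rates is an assumption standing in for whatever your particular device offers in the PulseView dropdown.

```python
# Assumed list of selectable rates; substitute whatever your device reports.
AVAILABLE_RATES_HZ = [
    250_000, 500_000, 1_000_000, 2_000_000,
    4_000_000, 8_000_000, 12_000_000, 24_000_000,
]


def pick_sample_rate(bus_hz, oversample=10, rates=AVAILABLE_RATES_HZ):
    """Return the slowest available rate that is at least oversample x bus clock."""
    target = bus_hz * oversample
    for rate in sorted(rates):
        if rate >= target:
            return rate
    raise ValueError(f"no available rate reaches {target} Hz")


# A 100 kHz bus with 10 samples per clock period needs 1 MHz.
print(pick_sample_rate(100_000))  # 1000000
```

Going higher costs nothing but memory and capture length, so on a slow bus there is little reason to sit right at the minimum.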

what the bus was actually doing

Here is the moment the tool paid for itself in the first ten minutes.

When the sensor worked, the decode was clean: start, 0x76 write, ACK, register, ACK, restart, read, data, data, stop. Textbook. Lovely. The datasheet, vindicated.
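
One thing worth knowing when reading a decode like that: the 7-bit address and the read/write bit share a single byte on the wire, so depending on how a tool displays it, 0x76 can also appear as 0xEC (write) or 0xED (read). Below is a small sketch of the clean transaction as a list of events. The function names and the register number are placeholders for illustration, not anything taken from the firmware.

```python
def address_byte(addr7, read):
    """Combine a 7-bit I2C address with the R/W bit (1 = read, 0 = write)."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("I2C address must fit in 7 bits")
    return (addr7 << 1) | (1 if read else 0)


def register_read_events(addr7, register, nbytes):
    """List the bus events for a register read that uses a repeated start."""
    events = [
        "START",
        f"ADDR 0x{address_byte(addr7, read=False):02X} (write)", "ACK",
        f"REG 0x{register:02X}", "ACK",
        "RESTART",
        f"ADDR 0x{address_byte(addr7, read=True):02X} (read)", "ACK",
    ]
    for i in range(nbytes):
        events.append("DATA")
        # The master ACKs every byte except the last, which it NAKs before STOP.
        events.append("ACK" if i < nbytes - 1 else "NAK")
    events.append("STOP")
    return events


REGISTER = 0xF7  # placeholder register number, chosen only for illustration
for event in register_read_events(0x76, REGISTER, nbytes=2):
    print(event)
```

The NAK after the final data byte is normal: it is the master telling the sensor it has read enough. The NAK that matters for this story is the one straight after the address byte.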

When it failed, the capture told a completely different story to the one in my head. The address byte went out fine. And then on the ACK bit, instead of the sensor pulling SDA low to acknowledge, the line just... drifted. Slowly. A lazy, rounded climb back towards high, nothing like the crisp edges everywhere else. The master read that mush as a NAK, gave up, and reported nothing.

I had been blaming the firmware and the sensor. The bus was the problem, and specifically the rise time on SDA. Those rounded edges are the signature of pull-up resistors that are too weak for the capacitance on the line. My wiring was a breadboard with longish jumper leads, which adds capacitance, and I'd fitted 10k pull-ups because 10k is the number you reach for without thinking. At the slow end and short wires, 10k is fine. With my layout it wasn't, and the failures clustered exactly when the chip was warm and the timing margins were already tight, which is why heat made it worse and why my added delays "helped". The delays gave the lazy edge slightly longer to flop over the threshold. Homeopathy, occasionally, accidentally lands on a real mechanism.
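
To put rough numbers on that: the rising edge of an open-drain line is an RC charge curve, and the I2C specification measures rise time between 30% and 70% of the supply, which works out to about 0.85 × R × C. Standard mode (100 kHz) allows at most 1000 ns. The capacitance values below are assumptions for illustration, not measurements of this breadboard.

```python
import math

RISE_FACTOR = math.log(0.7 / 0.3)  # ~0.847, for a 30%-to-70% rise on an RC curve
STANDARD_MODE_MAX_RISE_NS = 1000   # I2C standard-mode limit


def rise_time_ns(pullup_ohms, bus_capacitance_farads):
    """Approximate 30%-to-70% rise time of a pulled-up line, in nanoseconds."""
    return RISE_FACTOR * pullup_ohms * bus_capacitance_farads * 1e9


for cap_pf in (50, 100, 200):  # assumed bus capacitances, not measured
    t = rise_time_ns(10_000, cap_pf * 1e-12)
    verdict = "ok" if t <= STANDARD_MODE_MAX_RISE_NS else "too slow"
    print(f"10k pull-up, {cap_pf} pF: {t:.0f} ns ({verdict})")
```

With these guessed values, 10k stays inside the limit at 50 pF and 100 pF and falls well outside it at 200 pF, which is the "fine with short wires, not with mine" pattern in miniature.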

the fix, and the real lesson

The fix was a soldering iron and two resistors. I dropped the pull-ups to 2.2k, which yanks the line high far more aggressively and gives clean edges even with my untidy wiring. Recaptured. Crisp ACKs every time, warm or cold. The fault has not come back, and I have run it for two days now specifically to give it the chance.
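
As a sanity check on a value like that, a pull-up has a floor as well as a ceiling: too weak and the edge is slow, too strong and a device can't sink enough current to pull the line properly low. Here is a sketch of both bounds, assuming a 3.3 V supply, the I2C standard-mode figures of a 0.4 V maximum low level at 3 mA of sink current and a 1000 ns maximum rise time, and a guessed 200 pF of bus capacitance.

```python
import math

VDD = 3.3                   # volts; the ESP8266 and BME280 are 3.3 V parts
VOL_MAX = 0.4               # volts, highest level that still counts as low
IOL = 0.003                 # amps, sink current a device is expected to manage
MAX_RISE_S = 1000e-9        # seconds, standard-mode rise time limit
BUS_CAPACITANCE = 200e-12   # farads; an assumption, not a measurement


def pullup_bounds(vdd=VDD, vol_max=VOL_MAX, iol=IOL,
                  max_rise_s=MAX_RISE_S, capacitance=BUS_CAPACITANCE):
    """Return (r_min, r_max) in ohms for a single pull-up on one bus line."""
    r_min = (vdd - vol_max) / iol                               # sink-current floor
    r_max = max_rise_s / (math.log(0.7 / 0.3) * capacitance)    # rise-time ceiling
    return r_min, r_max


r_min, r_max = pullup_bounds()
for value in (10_000, 2_200):
    inside = r_min <= value <= r_max
    print(f"{value} ohms: {'inside' if inside else 'outside'} "
          f"{r_min:.0f} to {r_max:.0f} ohms")
```

Under those assumptions the window runs from roughly 1k to 6k, so 2.2k sits comfortably inside it and 10k does not. A different capacitance moves the ceiling, so treat the exact figures as indicative only.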

The firmware I'd rewritten three times was innocent throughout. The sensor I'd swapped was innocent. I'd spent the better part of two evenings interrogating the parts of the system I could see in software because those were the parts I had tools for, while the actual fault lived one layer down on a physical wire I had no way to observe.

That's the thing about a logic analyser, and it's why I'm slightly annoyed at myself for not buying one years ago. It's not that it's clever. It's that it lets you see the layer you were previously just guessing about. A multimeter tells you the average. A scope tells you the shape. A logic analyser tells you the meaning, decoded, in words, with the ACKs and NAKs labelled. For thirty quid it has retired an entire category of "must be a firmware bug" wild goose chase.

The bus had been telling the truth the whole time. I just hadn't been able to hear it.