the logic analyser that finally let me see the bus

For about three weeks an I2C temperature sensor on my bench refused to behave. Sometimes it answered, sometimes it didn't, and the only signal I had was a Python script printing OSError: [Errno 121] Remote I/O error at me with the regularity of a metronome. I'd been debugging the way I always debug hardware I can't see: change one thing, pray, repeat. It is a terrible way to spend an evening.

The thing that finally broke the cycle cost less than a takeaway. An 8-channel USB logic analyser, the small blue Cypress FX2 clones you can buy for a few quid, the ones that show up as a Saleae if you squint and lie to the driver. I'd had one in a drawer for ages and never used it properly. This week I did, and it changed how I feel about the whole class of problem.

the kit and the software

The hardware is almost beside the point. It samples up to eight digital lines and streams them over USB. The magic is on the host side, and the host side is sigrok with its PulseView frontend.

sudo apt install pulseview sigrok-cli

PulseView is one of those bits of open-source kit that quietly does a professional job for nothing. You wire your probes to SDA, SCL and a common ground, point it at the right device, set a sample rate comfortably above your bus clock, and hit run. For a 100kHz I2C bus I sampled at 1MHz, which works out to ten samples per bit and was plenty for this job. Don't go much below roughly ten samples per bit or the edges get mushy.
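For what it's worth, the arithmetic behind that choice fits in a few lines of Python. This is only a sketch: the ten-samples-per-bit factor is the rule of thumb above rather than anything from a spec, and the function names are made up for illustration. The last function is worth keeping in mind, because the time between samples is the finest timing you can ever measure off a capture.

```python
# Sketch of the sample-rate rule of thumb. The 10x factor is a rule of thumb,
# not a value from any specification.

def min_samplerate_hz(bus_clock_hz: float, samples_per_bit: int = 10) -> float:
    """Lowest sample rate that gives `samples_per_bit` samples per bus clock period."""
    return bus_clock_hz * samples_per_bit


def samples_per_bit(samplerate_hz: float, bus_clock_hz: float) -> float:
    """How many samples land in one bit period at a given sample rate."""
    return samplerate_hz / bus_clock_hz


def resolution_us(samplerate_hz: float) -> float:
    """Time between samples in microseconds: the finest timing a capture can show."""
    return 1e6 / samplerate_hz


if __name__ == "__main__":
    bus = 100_000                              # 100 kHz standard-mode I2C
    print(min_samplerate_hz(bus))              # 1000000
    print(samples_per_bit(1_000_000, bus))     # 10.0
    print(resolution_us(1_000_000))            # 1.0
```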

The first capture was a revelation. There it was: the start condition, the address byte, and then, where the slave should have pulled SDA low to acknowledge, nothing. NAK. Nothing on the bus was answering the address I was sending.

the protocol decoders are the whole point

A wall of square waves is mildly interesting. What makes the analyser genuinely useful is that sigrok ships protocol decoders. You stack an I2C decoder on top of the raw SDA and SCL traces and PulseView annotates the capture for you: start, address, read/write bit, the ack or nak, each data byte in hex, the stop. You read the conversation instead of inferring it.
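To make the idea concrete, here is a toy version of what an I2C decoder does, written in Python. It is not how sigrok's decoder is implemented; it's a simplified sketch that assumes perfectly clean samples, given as a list of (scl, sda) pairs of 0/1 values taken at a fixed rate, and both function names are invented for this example. START and STOP are SDA edges while SCL is high, every other bit is read on a rising SCL edge, and bytes come in frames of eight data bits plus an ACK bit.

```python
# Toy I2C decoder: a simplified sketch of what a protocol decoder does.
# Assumes clean 0/1 samples of (SCL, SDA) taken at a fixed rate.

def decode_i2c(samples):
    """Turn a list of (scl, sda) samples into START/address/data/STOP events."""
    events = []
    bits = []
    active = False      # inside a transaction (after START, before STOP)
    first_byte = True   # the first byte after a START is the address byte
    prev_scl, prev_sda = samples[0]
    for scl, sda in samples[1:]:
        if prev_scl and scl and prev_sda and not sda:
            # SDA falls while SCL is high: START (or repeated START)
            events.append(("start",))
            bits, active, first_byte = [], True, True
        elif prev_scl and scl and not prev_sda and sda:
            # SDA rises while SCL is high: STOP; drop any partial frame
            events.append(("stop",))
            bits, active = [], False
        elif active and not prev_scl and scl:
            # Rising SCL edge: the receiver samples SDA here
            bits.append(sda)
            if len(bits) == 9:  # 8 data bits, MSB first, then the ACK bit
                value = 0
                for b in bits[:8]:
                    value = (value << 1) | b
                acked = bits[8] == 0  # receiver pulls SDA low to ACK
                if first_byte:
                    direction = "read" if value & 1 else "write"
                    events.append(("address", value >> 1, direction, acked))
                    first_byte = False
                else:
                    events.append(("data", value, acked))
                bits = []
        prev_scl, prev_sda = scl, sda
    return events


def synth_transaction(frame_bytes, acks):
    """Test helper: build (scl, sda) samples for one transaction ending in STOP.

    acks[i] is True if the receiver acknowledges frame_bytes[i]."""
    s = [(1, 1)] * 3            # idle bus: both lines pulled high
    s += [(1, 0), (0, 0)]       # START, then the master pulls SCL low
    for byte, ack in zip(frame_bytes, acks):
        bits = [(byte >> (7 - k)) & 1 for k in range(8)] + [0 if ack else 1]
        for bit in bits:
            # change SDA while SCL is low, then clock it
            s += [(0, bit), (1, bit), (1, bit), (0, bit)]
    s += [(0, 0), (1, 0), (1, 1)]  # STOP: SDA rises after SCL is high
    return s


if __name__ == "__main__":
    wrong = synth_transaction([0x76 << 1], [False])  # nobody answers at 0x76
    print(decode_i2c(wrong))
    # [('start',), ('address', 118, 'write', False), ('stop',)]
```

The first byte after a start is the 7-bit address shifted left one place with the R/W bit at the bottom, so a write to 0x76 goes out on the wire as 0xEC and one to 0x77 as 0xEE. That's worth knowing when comparing a capture against a datasheet that quotes the 8-bit form.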

So I could see, in plain annotation, that I was addressing 0x76 and the device on the board was at 0x77. The breakout had an address-select pad bridged that I'd never looked at, because why would you. Every datasheet example used 0x76, so that's what I'd typed, and the hardware had been quietly disagreeing with me for three weeks.

A two-minute fix once I could see it. I reworked the bridged pad, the address moved, and the script started printing temperatures like nothing had ever been wrong.

Worth dwelling on what the decoder showed me beyond the address, though, because the rest of the capture was just as educational. With the right address in place I could watch the full transaction: write the register pointer, repeated start, read two bytes back, stop. The annotation laid it out byte by byte, and for the first time I could check the timing against the datasheet's bus diagram and see that they matched. The datasheet specified a minimum bus-free time between one transaction's stop and the next one's start, and I could measure it directly off the capture by dropping a pair of markers and reading the delta. No more guessing whether my delays were generous enough; the analyser told me to the microsecond.
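The marker trick is easy to automate if you've exported a capture and parsed it into samples yourself. A minimal sketch in Python, assuming a hypothetical list of (scl, sda) pairs of 0/1 values plus a known sample rate: it finds each STOP, measures the gap to the next START, and flags any gap under a minimum. I've used 4.7 µs in the example because that's the standard-mode bus-free time in the I2C specification; your sensor's datasheet figure is the one that matters. At 1MHz each sample is 1 µs, so the measurements are only good to about a microsecond.

```python
# Sketch: measure the bus-free time between each STOP and the next START.
# Assumes a hypothetical list of (scl, sda) pairs of 0/1 ints at a fixed rate.

def bus_free_times_us(samples, samplerate_hz):
    """Return the STOP-to-next-START gap, in microseconds, for each pair found."""
    gaps = []
    last_stop = None
    for i in range(1, len(samples)):
        prev_scl, prev_sda = samples[i - 1]
        scl, sda = samples[i]
        if not (prev_scl and scl):
            continue  # START and STOP only happen while SCL stays high
        if prev_sda and not sda:  # SDA fell: START
            if last_stop is not None:
                gaps.append((i - last_stop) * 1e6 / samplerate_hz)
                last_stop = None
        elif not prev_sda and sda:  # SDA rose: STOP
            last_stop = i
    return gaps


def too_short(gaps_us, min_us):
    """Return the gaps that fall below a datasheet minimum."""
    return [g for g in gaps_us if g < min_us]


if __name__ == "__main__":
    # STOP at sample 1, bus idle, then a START at sample 7
    capture = [(1, 0), (1, 1)] + [(1, 1)] * 5 + [(1, 0)]
    gaps = bus_free_times_us(capture, 1_000_000)
    print(gaps)                  # [6.0]
    print(too_short(gaps, 4.7))  # []
```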

chasing an intermittent glitch

The address fix solved the headline fault, but there was a second, rarer one lurking underneath: maybe one read in fifty still came back as nonsense. The sort of bug that's almost worse than total failure, because it's just reliable enough to ignore and just frequent enough to ruin a dataset.

This is exactly the case where a live view is useless. You can't sit watching the screen waiting to catch a one-in-fifty event with your eyes. So I set a trigger. PulseView lets you arm a capture on an edge or a condition and only start recording when it fires. I couldn't trigger on "bad data" directly, but I could trigger on the thing I suspected: a glitch on SCL, a clock line that dipped when it shouldn't.

I left it armed and went to make tea. By the time I came back it had caught one. There, on the clock line, was a runt pulse: a brief, ragged dip that the sensor had clearly interpreted as an extra clock edge, shifting every subsequent bit by one and turning a clean reading into garbage. The cause turned out to be mundane, a jumper wire to SCL that was a touch too long and acting as a small antenna, picking up switching noise from a nearby buck converter. I shortened the lead, added a stronger pull-up, and the glitch never came back.
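PulseView's trigger did the catching here, but the same check is easy to run offline over a captured channel: look for pulses narrower than any legitimate clock phase. A rough sketch in Python, assuming the channel is a hypothetical list of 0/1 samples at a known rate; the function name and the 2 µs threshold are my own choices for illustration, picked to sit well below the 4.0 µs minimum SCL high time the I2C specification allows in standard mode. One caveat applies to the real analyser too: a dip only shows up if it crosses the input threshold, and a glitch shorter than the sample period can slip between samples entirely, so sampling faster catches narrower glitches.

```python
# Sketch: find pulses on one channel that are too narrow to be real clock phases.

def find_glitches(levels, samplerate_hz, min_width_us):
    """Return (start_index, width_samples, level) for each pulse narrower than min_width_us.

    The first and last runs are cut off by the capture window, so they are skipped."""
    runs = []
    start = 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append((start, i - start, levels[start]))  # one constant-level run
            start = i
    min_samples = min_width_us * samplerate_hz / 1e6
    return [run for run in runs[1:-1] if run[1] < min_samples]


if __name__ == "__main__":
    # 100 kHz clock at 1 MHz sampling: 5 samples low, 5 high
    clean = ([0] * 5 + [1] * 5) * 4
    # same clock with a one-sample dip in the middle of a high phase
    noisy = [0] * 5 + [1] * 2 + [0] + [1] * 2 + [0] * 5 + [1] * 5
    print(find_glitches(clean, 1_000_000, 2))  # []
    print(find_glitches(noisy, 1_000_000, 2))  # [(7, 1, 0)]
```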

I would never have found that by changing one thing and praying. The fault was invisible at the software layer, where it only ever showed as the occasional bad number, and it was rare enough that the usual debugging reflex of "run it again and see" actively hid it. The analyser turned a statistical annoyance into a single screenshot with the cause sitting right there on the trace.

what I actually learned

The fix is not the interesting bit. The interesting bit is that I'd been treating a transparent, observable bus as a black box for no reason other than not owning the right tool, or rather owning it and not bothering. Digital buses are not magic. SPI, I2C, UART, even bit-banged nonsense you invented yourself: they are voltages changing over time, and a logic analyser turns time into something you can scroll through.

A few things worth knowing if you're starting from the same place I was:

Get the ground right. A single shared ground between the analyser and the board under test, kept short. Skip it and you'll capture noise that looks like data and lose another evening.
Sample fast enough but not absurdly fast. These cheap clones share USB bandwidth across channels, so going to the maximum rate on all eight lines will drop samples. Use what you need.
The decoders are stackable. Decode UART to bytes, then stack a higher-level decoder on top of that if one exists for your protocol. It's decoders all the way up.
Triggers save your sanity. If the fault is intermittent, set a trigger on a rising edge or a particular pattern and let it wait for the glitch rather than you staring at a live view hoping to catch it.
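The stacking idea is easy to show in miniature. This is a toy Python sketch, not sigrok's decoder API: one generator decodes UART from raw line samples into bytes, and a second generator sits on top of it and turns those bytes into text lines. The 8N1 framing, the whole number of samples per bit, the newline-terminated text and all the function names are assumptions made for the example.

```python
# Toy sketch of stacked decoders: raw samples -> bytes -> text lines.
# Assumes 8N1 UART, an idle-high line, and a whole number of samples per bit.

def uart_bytes(samples, samples_per_bit):
    """Layer 1: decode 8N1 UART bytes from a list of 0/1 line samples."""
    n = len(samples)
    i = 1
    while i < n:
        if samples[i - 1] == 1 and samples[i] == 0:  # falling edge: start bit?
            mid = i + samples_per_bit // 2            # middle of the start bit
            stop_mid = mid + 9 * samples_per_bit      # middle of the stop bit
            if stop_mid >= n:
                break                                 # capture ends mid-frame
            if samples[mid] != 0:
                i += 1                                # too short: a glitch, not a start bit
                continue
            value = 0
            for bit in range(8):                      # data bits arrive LSB first
                value |= samples[mid + (bit + 1) * samples_per_bit] << bit
            if samples[stop_mid] == 1:                # drop frames with a bad stop bit
                yield value
            i = stop_mid                              # resume scanning after this frame
        i += 1


def lines(byte_stream):
    """Layer 2: stacked on layer 1, group bytes into newline-terminated lines.

    An unterminated final line is not emitted."""
    buf = bytearray()
    for b in byte_stream:
        if b == 0x0A:                                 # b"\n" ends a line
            yield buf.decode("ascii", errors="replace")
            buf.clear()
        else:
            buf.append(b)


def uart_encode(data, samples_per_bit):
    """Test helper: idle-high 8N1 samples for the given bytes."""
    out = [1] * samples_per_bit                       # idle before the first frame
    for byte in data:
        frame = [0] + [(byte >> k) & 1 for k in range(8)] + [1]
        for level in frame:
            out += [level] * samples_per_bit
    out += [1] * samples_per_bit                      # idle after the last frame
    return out


if __name__ == "__main__":
    raw = uart_encode(b"hi\nok\n", 4)
    print(list(lines(uart_bytes(raw, 4))))            # ['hi', 'ok']
```

Each layer only knows about the one directly beneath it, which is the same reason stacked decoders are pleasant to work with: the line decoder never has to care how the bytes were recovered.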

I'm slightly annoyed at how long I went without doing this. Years of poking at serial ports and SPI flash and guessing, when for the price of lunch I could have just looked. If you do any embedded work at all, even occasionally, get one of these and learn PulseView. The next time something on a wire misbehaves you'll stop arguing with it and start reading it, and that is a much better way to spend an evening.