Ramblings of an aging IT geek

the spare gpu earns its keep

Getting a language model running locally on an old GTX 1080 over the Christmas break, and what 8GB of VRAM actually buys you.


There is a GTX 1080 at the bottom of a drawer that I have been meaning to do something with for about two years. Boxing Day, nothing on, family asleep, so I dug it out and put it in the spare desktop. The plan was modest: stop sending every idle curiosity to a web API and see what I could run on hardware I already own.

The answer, with 8GB of VRAM, is "more than I expected, less than the hype". You are not running anything enormous. But the smaller models that came out this year are genuinely useful, and they run on a card from 2016.

Getting the card to do anything

The first hour was the usual GPU pantomime. The 1080 is old enough that the drivers are stable and well understood, which is the one mercy here. On Ubuntu the run was roughly:

sudo apt install nvidia-driver-525
nvidia-smi   # confirm the card shows up

nvidia-smi reporting the card and its 8192MiB is the whole game. If that works, the rest is Python dependency archaeology, which is its own special kind of misery but at least a familiar one. I set up a fresh virtualenv per project because mixing CUDA-pinned torch builds across projects is how you lose an afternoon.
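That check can also be scripted, which is handy at the top of a project before anything tries to import torch. A minimal sketch, assuming nvidia-smi is on the PATH. The function names are mine; the query fields (name, memory.total) are standard nvidia-smi ones.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Turn one CSV line such as 'NVIDIA GeForce GTX 1080, 8192' into (name, MiB)."""
    name, mem = line.rsplit(",", 1)
    return name.strip(), int(mem.strip())


def list_gpus():
    """Return [(name, total_mib), ...], or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(l) for l in result.stdout.splitlines() if l.strip()]


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU visible; sort out the driver first.")
    for name, mib in gpus:
        print(f"{name}: {mib} MiB")
```

An empty list means the problem is the driver, not Python, which saves some of the archaeology.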


What 8GB actually buys you

VRAM is the constraint that matters. A model has to fit, and when it does not, performance does not degrade gracefully: it spills over into system RAM, falls off a cliff, and the whole thing crawls. So the trick is staying within budget.
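The budget is easy to estimate before downloading anything. A rough sketch, with my own assumptions labelled: the weights take parameters times bits per weight, and a fixed allowance (1.5 GiB here, a guess rather than a measurement) covers the context cache, CUDA overhead and whatever the desktop itself is using. Real usage varies with the runtime and the context length.

```python
GIB = 1024 ** 3


def weights_gib(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(params_billion, bits_per_weight, vram_gib=8.0, overhead_gib=1.5):
    """True if the weights plus an assumed fixed overhead fit in VRAM."""
    return weights_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the arithmetic.
    for params, bits in [(7, 16), (7, 8), (7, 4), (3, 16)]:
        size = weights_gib(params, bits)
        verdict = "fits" if fits(params, bits) else "does not fit"
        print(f"{params}B at {bits}-bit: ~{size:.1f} GiB of weights, {verdict} in 8 GiB")
```

On those numbers a 7-billion-parameter model at 16 bits per weight is about 13 GiB and has no chance, while the same model at 4 bits per weight is a little over 3 GiB and leaves room to spare.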

The smaller models load fine and respond at a readable pace. Not instant, but you can have a conversation without checking your phone between tokens. Where it gets interesting is the things that are not chat: a local model summarising my own notes, never touching the internet, is quietly excellent. No rate limits, no "as an AI", no bill at the end of the month.
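The notes job has a simple shape: split the text into chunks small enough for the model, summarise each one, then summarise the summaries. A minimal sketch, assuming a generate(prompt) function that you supply from whichever local runtime you use. That function, the prompt wording and the 2,000-character chunk size are placeholders of mine, not part of any particular library.

```python
def chunk_text(text, max_chars=2000):
    """Split text on blank lines into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def summarise_notes(text, generate, max_chars=2000):
    """Summarise each chunk, then combine the partial summaries in a final pass.

    `generate` is a hypothetical callable: prompt string in, completion string out.
    """
    chunks = chunk_text(text, max_chars)
    if not chunks:
        return ""
    partials = [generate(f"Summarise these notes in three sentences:\n\n{c}")
                for c in chunks]
    if len(partials) == 1:
        return partials[0]
    joined = "\n".join(partials)
    return generate(f"Combine these summaries into one short summary:\n\n{joined}")
```

Passing the model in as a plain function keeps the chunking testable without a GPU, and makes it easy to swap runtimes later.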

The quality is not what you get from the big hosted models, and I am not going to pretend otherwise. Ask it something it does not know and it will confidently invent an answer, same as the big ones, just with less polish on top. But for grunt work over text I already have, where I can check the output, it is fine. More than fine.

The genuine surprise was how little the card heated up. I expected the fans to spin up like a hairdryer. Mostly it sat there sipping power and getting on with it. A 2016 card, a drawer, and an afternoon, and now I have a language model that runs entirely on my own desk and owes nobody anything. That feels like the right trade.