I cannot open a single feed this month without GPT-4 in it. It landed in mid-March, and the discourse has not let up since: the demos, the exam scores, the breathless threads, the equally breathless backlash. Half my timeline thinks it is the end of software engineering and the other half thinks it is a stochastic parrot that will be forgotten by summer. I have spent the last few weeks actually using the thing for real work, and my view sits, predictably, in the dull middle.
What it is genuinely good at
It is very good at the stuff that is tedious but not hard. Boilerplate. Translating a config from one format to another. Writing the first draft of a regex I then immediately rewrite because I do not trust it. Explaining an unfamiliar error message in context. Taking a wall of someone else's undocumented code and giving me a plausible summary so I know where to start reading. These are real time savings and I am not going to pretend otherwise out of contrarianism.
The jump from the previous model is real too. The thing I notice most is not raw capability but that it stays coherent over longer, more tangled problems. You can hand it a messier prompt with more context and it does not lose the thread halfway through the way its predecessor did. For the kind of work where I am thinking out loud and want a competent rubber duck that occasionally talks back with something useful, it has genuinely changed my day-to-day.
What it is quietly bad at
It is confidently wrong, and that is the dangerous bit, because it is confidently wrong in exactly the same tone it is confidently right. It will invent a function that does not exist, a flag that was never added, an API that would be lovely if it were real. For anything I do not already know enough to check, I cannot trust it, which means its value is highest precisely where I need it least. That is an awkward shape for a tool.
me: does <library> have a built-in retry with backoff?
it: yes, use Client(retry=Retry(backoff=2.0))
me: *checks the docs* no it does not
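Part of that checking can be made mechanical. What follows is a minimal sketch in Python, assuming the suggestion was a keyword argument: ask the interpreter whether the callable accepts it before believing the model. The `Client` class is a hypothetical stand-in, not the library from the exchange above, and `accepts_keyword` is an illustrative helper name rather than anything standard.

```python
import inspect


def accepts_keyword(func, name):
    """Report whether `func` declares a keyword parameter called `name`.

    Returns True or False when the signature settles it, and None when it
    cannot (no introspectable signature, or a **kwargs catch-all).
    """
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some builtins and C extensions expose no signature at all.
        return None
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # **kwargs swallows any name, so the signature proves nothing.
        return None
    return False


# Hypothetical stand-in for a real library's client class (assumption).
class Client:
    def __init__(self, timeout=10.0):
        self.timeout = timeout


if __name__ == "__main__":
    for suggested in ("timeout", "retry"):
        print(suggested, accepts_keyword(Client, suggested))
```

This only catches invented parameters, not invented behaviour, and a `None` result means the docs still have to be read.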
It also flattens everything towards the median of its training data. Ask it for an opinion and you get the consensus opinion, sanded smooth. That is fine when you want the boring correct answer and quietly corrosive when you wanted to think.
The bit the discourse keeps missing
The interesting question is not whether it replaces engineers. It does not, not in any form I can see from here. The interesting question is which parts of the job were always tedium dressed up as skill, and how I feel about a tool that does them. Some of it I am happy to hand over. Some of it I am uneasy about handing over, because the tedium was also where I learned things, and a junior who never writes the boilerplate may never build the intuition that the boilerplate quietly taught.
So my honest position, against the grain of every feed I read today: it is a real tool, it is genuinely useful for a real slice of my work, it is dangerous exactly where it is most confident, and the hype and the backlash are both performances. I am going to keep using it for the boring parts, keep checking everything it tells me, and try very hard not to outsource the bits of the job that were actually the job. Ask me again in a year. The only thing I am sure of is that the people loudest about it today, in both directions, will be quietly wrong.