Google announced Gemini at the start of December, and roughly everyone I follow has now had an opinion about it, which is itself the interesting thing. The model is fine. The reaction to the model is the actual story.
If you missed the cycle: Google put out Gemini as its answer to the obvious question of where it had been for the past year while a smaller lab ate its lunch. There were three sizes, the usual benchmark tables showing it edging ahead on some evaluations, and a demo video that was very, very good. And then within a couple of days it came out that the demo video was rather more produced than it looked. The slick real-time back-and-forth was, in fact, stitched together from stills and carefully chosen prompts, with the latency and the misfires edited out.
That's the bit everyone has an opinion on, and I want to pick it apart, because there are two complaints tangled together and only one of them is fair.
the unfair complaint
The unfair complaint is "the demo was faked, therefore the model is rubbish." That doesn't follow. Doing the same task from still frames and typed prompts instead of a live video stream is a real capability, and a genuinely impressive one. The model can do the thing. It just can't do the thing at the speed and fluidity the video implied, in one take, with no help.
Demo reels have always been polished. Nobody ships the take where the presenter's laptop bluescreens. If your only objection is that a marketing video was edited, you haven't been paying attention to marketing videos.
the fair complaint
The fair complaint is about the gap, and specifically about who controls it.
When the gap between "what the reel shows" and "what the thing does" is small, an edited demo is harmless shorthand. When the gap is large, the edit isn't shorthand any more; it's the message. You are being sold the gap. And with these models the gap is the entire game, because the difference between "answers in 800 milliseconds, conversationally, watching live video" and "answers eventually, from stills, after some prompt-wrangling" is the difference between a product and a research result. Those are not the same thing, and the video deliberately blurred which one you were looking at.
That matters more here than with most products because nobody outside can easily check. I can't run the weights. I can't reproduce the latency. I can only watch the reel and read the benchmark table, and now I've learned that one of those two things was staged, so what do I do with the other one? The benchmark numbers were a close-run thing to begin with, the sort of margin where the choice of evaluation and the exact prompt formatting can swing the result either way. Once you've been shown a polished video that didn't reflect reality, you start reading the careful footnotes on the benchmark tables much more closely. And you should.
what I actually take from it
A few honest reactions, having sat with it for a couple of weeks.
- The competition is good. A year ago there was effectively one serious general-purpose model the public could poke at. Now there are several, from labs with the resources to keep iterating. That's healthy, whatever you think of any individual launch.
- I've stopped trusting launch-day capability claims entirely, from anyone. Not because anyone is uniquely dishonest, but because the incentive to present the best possible framing is overwhelming and the audience's ability to verify is near zero. I wait for people I trust to actually use the thing on real work.
- The "we're behind and we know it" energy of this launch was more revealing than the benchmarks. You don't ship a heavily edited demo if you're comfortable. The edit told me more about the competitive pressure than the model told me about itself.
None of this is me dunking on Google. I'd quite like a strong second and third option in this space, and they have the talent and the data to provide one. But the pattern is now clear enough to name: in a field where almost no one outside can independently verify the claims, the demo is the product until proven otherwise, and "proven otherwise" takes weeks of other people doing real work with it.
So I'll do the boring thing. I'll wait, I'll watch what people build with it rather than what the launch video showed, and I'll keep reading the footnotes. The keynote everyone has an opinion on turned out to be most interesting for what it accidentally revealed about how little any of us can trust a keynote. Which is, I suppose, a useful thing to have learned twice.