Say what you will about Sam Altman, but at least he engages with his user base and acts on user feedback.
Dario and co seem to be on some elevated pedestal - we mere mortals are beneath them - and they have this scattershot devrel where each engineer communicates to the public on X in their own way, often at odds with the others.
I can't see why anyone still chooses Claude. Codex outperforms it in most respects, and its quotas are about ten times larger. A $100 Codex plan gets me through the whole week with 6–12 hours of coding per day.
I found GPT 5.5 to be pretty solid, but I keep getting impressed by Opus. It's tracked down some insane stuff while I look away during a meeting. 5.5 is way closer to Anthropic than previous OpenAI models were, IMO.
These things are so tricky because everyone has a seemingly conflicting experience. Part of the fun I guess!
I've never actually run into the issues that people talk about online, like Claude suddenly getting dumb or running out of usage. So there's just not a lot of incentive for me to shop around. I've used Amp a bit, and it's quite nice, but a bit more expensive without the subsidized subscription.
It has always been like this. We actually know that the model performance has been mostly steady[0], but you cannot beat the notion of "evil companies secretly serving us worse models." The meme value is too strong.
I'm using Opus on xhigh 10+ hours a day, and I've only reached 80% of weekly limits when doing massive ports or refactors. I haven't once hit hourly limits, and I've used Claude very, very aggressively. I guess it's a pain point for power users.
When do you use it the most? I've noticed that it most often starts to degrade between 10 and 5 US East Coast time. Late at night I have the fewest issues, but without fail, if I'm trying to do anything complex during the day, Claude gets loopy.
One reason might be that Claude Opus 4.7 thinking benchmarks better on Arena Coding at https://arena.ai/leaderboard/text/coding ... hopefully that effectively assesses correctness. It doesn't account for reliability though.
In my org the teams doing agent engineering at scale are all on Codex using gpt-5.5. By scale I mean fully agent authored code workflows with long running / multi hour plans.
You get a discount for paying for a full year on Teams and Enterprise can involve contractual obligations. It's a lot of effort to get buy-in to change providers and to shift an entire organization. The winds change frequently in this space and the pain needs to get to a certain level before it's worth rolling the dice.
Based on the experience of people using the $20 Claude Pro subscription and exhausting their quotas in a matter of minutes, the answer to your question is probably "less". (I would guess that the $100 plan would do the trick.)
Corporate policies and agreements. In large corporations, using external non-approved models with proprietary source code is a good way to have significant career issues.
Wow, I'm really surprised. I tried DeepSeek (their best model, through the official API). It's extremely cheap, but it's clearly not as good at programming as Opus 4.7. It seems nowhere near as good at making high-level design choices. DeepSeek also seems to get stuck in whack-a-mole fixing loops much more than Opus. I stopped it at one point and asked Opus to solve the problem it was trying to solve, and it saw the solution immediately.
I was running DeepSeek through Claude Code's agent harness. Maybe it works better through a different tool?
I've given V4 Pro some curly things and I was impressed at how it figured them out. I agree high-level design is not its forte. But it sat in a loop and dogmatically debugged a crazy dependency issue to come to the right answer over the course of 15 minutes, which impressed me.
Interestingly, I had the same experience, and weirdly it's in part because it is clearly less intelligent. It's more of a mechanistic tool that just does what I ask (but is still very smart and very competent about it), and it's less intent on winning a Nobel prize with each answer. Turns out I actually like that.
You're assuming the elevated error rates are due to the system being overloaded. We have no evidence this is actually the case. It's much more likely due to a simple misconfiguration or a failing router or something.
So, all those CEOs who moved all their remaining engineers to be dependent on a cloud service, to the extent that there's no local development capability, are gonna apologize, right?
I loved Sonnet and Opus fwiw but not anymore.
[0]: https://marginlab.ai/trackers/claude-code/
I use Codex when Claude Code is down, and I only began using Claude when ChatGPT was down
Yes, Codex is very fast, but I'm going back to Claude for now.
Opus 4.7 + Rust is a killer combo.
Heck I prefer DeepSeek to both of those.
That and the lack of image-read support surprised me. I'm a big fan of feeding screenshots into my LLM, and that killed it for me.
I would have been much more impressed with V4 about six months ago, but I've been spoiled by Opus 4.7. DeepSeek isn't at the same level.
My systems are retrying with exponential backoff, so this might not get better, since the retries just overload things again.
> {'type': 'error', 'error': {'details': None, 'type': 'overloaded_error', 'message': 'Overloaded'}, 'request_id': 'req_ ...
I can see a weird spike in my cache hit-rate a few minutes before, so this might actually be some extra caching they have thrown in.
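FWIW, the usual client-side mitigation is full-jitter backoff with a retry cap, so everyone's retries don't land in lockstep and re-overload the endpoint. A minimal sketch; call_model and OverloadedError here are hypothetical placeholders for your own request function and whatever your client raises on an 'overloaded_error' response:

    import random
    import time

    class OverloadedError(Exception):
        """Placeholder for whatever your client raises on an 'overloaded_error' response."""

    def call_with_backoff(call_model, max_retries=6, base=1.0, cap=60.0):
        # Exponential backoff with full jitter: sleep a random amount up to the
        # current ceiling so clients spread out instead of retrying in sync.
        for attempt in range(max_retries):
            try:
                return call_model()
            except OverloadedError:
                if attempt == max_retries - 1:
                    raise
                delay = random.uniform(0, min(cap, base * 2 ** attempt))
                time.sleep(delay)

The jitter is the important part: pure exponential delays still cluster all the failed clients onto the same retry timestamps, which is exactly the "retries overload things again" effect.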