Ask HN: What's up with the "model overloaded" on Gemini API?

This is unreal. Most of my requests using Google's `genai` are failing with 503. Is anyone else experiencing the same?

3 points | by worldsavior 1 day ago

1 comment

nivafy 1 day ago
That “The model is overloaded” message is Gemini returning HTTP 503 (status: UNAVAILABLE) because the backend serving that specific model is out of capacity (or temporarily unhealthy) at that moment, even if your RPM/TPM quota is fine. Google staff and product experts have acknowledged this can happen during high demand and with higher-latency models (notably some 2.5 variants).
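
Since the 503 is transient and not a quota problem, the usual workaround is to retry with exponential backoff plus jitter. Here's a rough sketch of that idea, not anything from Google's docs. `ServiceUnavailable` and `call_with_backoff` are hypothetical names I made up for illustration; they are not part of `genai`, so you'd swap in whatever exception your client actually raises on a 503.

```python
import random
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 503."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on ServiceUnavailable with exponential backoff.

    The wait before retry k is a random value in [0, min(max_delay,
    base_delay * 2**k)] ("full jitter"), so many clients hitting the same
    overloaded backend don't all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServiceUnavailable:
            # Out of attempts: let the caller see the error.
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

You'd wrap your actual request in a zero-argument function (or a lambda) and pass it as `fn`. If it still fails after the last attempt, falling back to a different model is another option, since the overload is per model.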