That “The model is overloaded” message is Gemini returning HTTP 503 (status: UNAVAILABLE): the backend serving that specific model is out of capacity (or temporarily unhealthy) at that moment, even if your RPM/TPM quota is fine. Google folks and product experts have acknowledged this can happen during high demand and with higher-latency models (notably some 2.5 variants).
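Since a 503 here means a temporary capacity problem rather than a quota problem, the usual client-side response is to retry with exponential backoff and jitter. Below is a minimal sketch of that idea, not anything taken from the Gemini SDK: `ApiError`, `call_with_backoff`, and the default delays are hypothetical names and values chosen for illustration, and you would need to map whatever exception your client raises onto a status code yourself.

```python
import random
import time

# Only 503 (UNAVAILABLE) is treated as transient in this sketch.
RETRYABLE_STATUS = {503}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying on retryable ApiError with exponential backoff.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter").
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            # Non-retryable statuses and the final attempt propagate the error.
            if err.status not in RETRYABLE_STATUS or last_attempt:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

In use, `fn` would be a zero-argument function wrapping your actual request. Other statuses (a 429 quota error, for example) are re-raised immediately in this sketch, because waiting out a capacity blip and handling a quota limit are different problems.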