You're 45 minutes into a complex refactoring session. The AI agent has already verified 12 file operations, updated your tests, and is about to commit. Then you see it:
```
Error 429: Rate limit exceeded. Please try again in 60 seconds.
```

Relying on a single LLM provider for autonomous agents is a single point of failure. When Anthropic goes down or rate limits you, your entire workflow stops.
This is the single-provider problem, and it's why we built the Mythos Orchestration Engine.
## The Architecture
The orchestrator sits between your CLI and the model providers. Instead of hardcoding a single API endpoint, it manages a ranked pool of providers with real-time health tracking.
Here's the provider registration:
```typescript
orchestrator.registerProvider(anthropicProvider, { priority: 0 });
orchestrator.registerProvider(deepseekProvider, { priority: 1 });
orchestrator.registerProvider(openaiProvider, { priority: 2 });
```

Priority 0 is your primary. The others are your safety net.
## Layer 1: Adaptive Selection
Providers are scored using an Exponential Moving Average (EMA) of:

- Success rate: weighted heavily (×100–150 depending on task type)
- Latency: penalized for slow responses
- Cost per 1K tokens: cheaper providers get a boost
By scoring providers in real time, Mythos dynamically shifts traffic. If your primary model starts slowing down, a secondary model automatically takes over before a timeout ever occurs.
The highest-scoring healthy provider wins the request.
| Provider | Base Latency | Cost (1M Tokens) | Orchestrator Rank |
|---|---|---|---|
| Claude 3.5 Sonnet | 1.2s | $15.00 | Primary |
| DeepSeek-V3 | 0.8s | $1.20 | Secondary |
| GPT-4o | 1.5s | $10.00 | Tertiary |
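In TypeScript terms, the scoring described above might look something like the following minimal sketch. The `ProviderStats` shape, the weights, and the smoothing factor are illustrative assumptions, not the engine's actual API:

```typescript
// Sketch of EMA-based provider scoring. The stat shape, weights, and
// smoothing factor are illustrative assumptions, not the engine's real API.
interface ProviderStats {
  successRate: number;    // EMA of success/failure, in [0, 1]
  avgLatencyMs: number;   // EMA of response latency
  costPer1kTokens: number;
}

const ALPHA = 0.2; // EMA smoothing factor: higher reacts faster to new data

// Fold a new observation into a running EMA.
function ema(previous: number, observation: number): number {
  return ALPHA * observation + (1 - ALPHA) * previous;
}

// Higher is better: reward reliability, penalize latency and cost.
function score(stats: ProviderStats, successWeight = 100): number {
  return (
    stats.successRate * successWeight -
    stats.avgLatencyMs / 100 -
    stats.costPer1kTokens * 10
  );
}
```

With weights like these, a cheap, fast provider can out-score an expensive one even at a slightly lower success rate, which is exactly the dynamic traffic shift described above.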
## Layer 2: Retry with Backoff
If the selected provider fails with a retryable error (like Anthropic's `OverloadedError`), the engine retries with exponential backoff: 100ms → 500ms → 1000ms.
This handles transient blips without wasting a fallback slot.
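A minimal sketch of that retry loop, using the backoff schedule above; the `withRetry` helper and the `isRetryable` check are illustrative assumptions, not the engine's real code:

```typescript
// Sketch of the retry layer. The delay schedule comes from the text;
// the helper names and the retryable-error check are assumptions.
const BACKOFF_MS = [100, 500, 1000];

function isRetryable(err: unknown): boolean {
  // e.g. HTTP 429 (rate limit) / 529 (overloaded) style errors
  const status = (err as { status?: number }).status;
  return status === 429 || status === 529;
}

async function withRetry<T>(fn: () => Promise<T>): Promise<T> {
  let lastError: unknown;
  // One initial attempt plus one retry per backoff slot.
  for (let attempt = 0; attempt <= BACKOFF_MS.length; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === BACKOFF_MS.length) break;
      await new Promise((r) => setTimeout(r, BACKOFF_MS[attempt]));
    }
  }
  throw lastError; // exhausted: the caller trips the circuit breaker
}
```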
## Layer 3: Circuit Breaker
If all retries are exhausted, the provider is marked as "degraded" and put on a 5-minute cooldown. The request automatically falls through to the next provider in the ranked list.
```
Primary (Claude) → 429 → Retry 1 → Retry 2 → Retry 3 → TRIP BREAKER
                                                            ↓
Fallback (DeepSeek) → ✓ Connection established. Streaming.
```

The stream continues. Your session doesn't break. Your context is preserved. You can read more about how this integrates with the broader system on the Mythos Engine Reference page.
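The breaker bookkeeping can be sketched as a small registry. The `BreakerRegistry` class below is an illustrative assumption; only the 5-minute cooldown and the ranked fallthrough come from the design above:

```typescript
// Sketch of circuit-breaker bookkeeping. The 5-minute cooldown comes
// from the text; the data structure and names are assumptions.
const COOLDOWN_MS = 5 * 60 * 1000;

class BreakerRegistry {
  private trippedUntil = new Map<string, number>();

  // Mark a provider as degraded: no traffic until the cooldown expires.
  trip(provider: string, now = Date.now()): void {
    this.trippedUntil.set(provider, now + COOLDOWN_MS);
  }

  isHealthy(provider: string, now = Date.now()): boolean {
    const until = this.trippedUntil.get(provider);
    return until === undefined || now >= until;
  }

  // First healthy provider in priority order wins the request.
  pick(ranked: string[], now = Date.now()): string | undefined {
    return ranked.find((p) => this.isHealthy(p, now));
  }
}
```

A tripped primary falls out of `pick` until its cooldown lapses, then automatically rejoins the rotation.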
## Stream Watchdog
There's a subtler failure mode that most tools ignore: stream stalls. The connection is open, the API hasn't errored, but no tokens have arrived in 15 seconds.
The orchestrator runs an adaptive watchdog on every stream:

- Default timeout: 15 seconds of silence
- Adaptive: 3× the provider's average latency (EMA-tracked)
- If the watchdog fires, the stream is aborted and the request is re-routed
This catches the "API is technically alive but functionally dead" scenario.
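A sketch of that watchdog; the names, and the way the default and adaptive timeouts are combined (taking the larger of the two), are assumptions:

```typescript
// Sketch of an adaptive stream watchdog: fire if no token arrives within
// max(default, 3 × EMA latency). Names and the combination rule are assumptions.
function makeWatchdog(
  avgLatencyMs: number,
  onStall: () => void,
  defaultMs = 15_000,
) {
  const timeoutMs = Math.max(defaultMs, 3 * avgLatencyMs);
  let timer = setTimeout(onStall, timeoutMs);

  return {
    // Call on every received token to reset the silence timer.
    feed(): void {
      clearTimeout(timer);
      timer = setTimeout(onStall, timeoutMs);
    },
    // Call when the stream ends normally.
    stop(): void {
      clearTimeout(timer);
    },
  };
}
```

Each incoming token calls `feed()`; if the timer ever fires, the stream is aborted and the request re-routed to the next provider.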
## Deterministic Mode
For operations where you need exactly the same provider every time (like when a Skill forces claude-3-5-sonnet), the orchestrator supports deterministic selection. If you want to dive deeper into the code, check out the mythos-router repository on GitHub.
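Deterministic selection can be sketched as a pre-check that bypasses scoring entirely; `selectProvider` and `pinnedModel` are illustrative names, not the repository's actual API:

```typescript
// Sketch of deterministic vs. adaptive selection. Names are illustrative.
interface Provider {
  name: string;
  model: string;
}

function selectProvider(
  ranked: Provider[],
  scores: Map<string, number>,
  pinnedModel?: string,
): Provider | undefined {
  if (pinnedModel) {
    // Deterministic mode: exactly this model, or nothing (fail fast).
    return ranked.find((p) => p.model === pinnedModel);
  }
  // Adaptive mode: highest-scoring provider wins the request.
  return [...ranked].sort(
    (a, b) => (scores.get(b.name) ?? 0) - (scores.get(a.name) ?? 0),
  )[0];
}
```

When a Skill pins a model, the scorer is never consulted, so the same provider is chosen every time regardless of health or cost.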
## The Cost
Adding fallback providers costs nothing until one fires: you only pay for the tokens a fallback actually consumes. In practice, fallbacks trigger less than 5% of the time, but that 5% is the difference between a broken session and a saved one.
## Setup
Add your fallback API keys to your environment:
```shell
export ANTHROPIC_API_KEY="sk-ant-..."  # Primary
export DEEPSEEK_API_KEY="sk-..."       # Fallback 1
export OPENAI_API_KEY="sk-..."         # Fallback 2
```

Then run mythos-router as usual:

```shell
npx mythos-router chat
```

The orchestrator detects available keys and registers providers automatically. If Claude goes down, DeepSeek picks up. If DeepSeek goes down, OpenAI picks up. Your work never stops.