Default list prices
Below are the default list prices (in USD per million tokens) configured on the platform.

| Model / Tier | Input Price (per M) | Output Price (per M) | Cache Read (per M) | Cache Write (per M) |
|---|---|---|---|---|
| Anthropic Claude | | | | |
| claude-sonnet-4-6 | $3.00 | $15.00 | $0.30 | $3.75 |
| claude-sonnet-4-5-20250929 | $3.00 | $15.00 | $0.30 | $3.75 |
| claude-haiku-4-5-20251001 | $1.00 | $5.00 | $0.10 | $1.25 |
| claude-opus-4 / claude-opus-4-7 | $15.00 | $75.00 | $1.50 | $18.75 |
| OpenAI GPT | | | | |
| gpt-4o-2024-08-06 | $2.50 | $10.00 | $1.25 | $2.50 |
| gpt-4.1 | $2.00 | $8.00 | $0.50 | $2.00 |
| Google Gemini | | | | |
| gemini-3.1-pro-preview | $1.25 | $10.00 | $0.3125 | $0.00 |
| gemini-3-pro-preview | $1.25 | $10.00 | $0.3125 | $0.00 |
| gemini-3.5-flash | $1.50 | $9.00 | $0.15 | $0.00 |
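
Because the rates are quoted per million tokens, turning them into a per-call cost means multiplying by the token counts and dividing by 1,000,000. The sketch below shows that arithmetic for a call with no cache activity. The names `LIST_PRICES` and `uncached_cost` are illustrative and not part of the platform; only two rows of the table are copied in, and caching is deliberately left out (see "Prompt caching dynamics").

```python
from decimal import Decimal

# USD per million tokens, copied from two rows of the table above.
# LIST_PRICES is a hypothetical structure used only for this illustration.
LIST_PRICES = {
    "claude-haiku-4-5-20251001": {"input": Decimal("1.00"), "output": Decimal("5.00")},
    "gpt-4.1": {"input": Decimal("2.00"), "output": Decimal("8.00")},
}

TOKENS_PER_MILLION = Decimal(1_000_000)


def uncached_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one call that neither reads nor writes the cache."""
    rates = LIST_PRICES[model]
    total = rates["input"] * input_tokens + rates["output"] * output_tokens
    return total / TOKENS_PER_MILLION


# (200,000 x $2.00 + 50,000 x $8.00) / 1,000,000 = $0.80
print(uncached_cost("gpt-4.1", 200_000, 50_000))
```

`Decimal` is used here instead of `float` so that fractional-cent rates such as $0.3125 stay exact; that is a choice made for this example, not a statement about how the platform stores prices.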
Default fallback rate
If a model name is not recognized or does not match any of the custom tiers above, the system logs a warning and falls back to the default list rate (matching the Claude Sonnet 4.6 tier). The Models breakdown table displays a warning indicator next to the model name to signal that the spend is an estimate.

- Input: $3.00 per Million
- Output: $15.00 per Million
- Cache Read: $0.30 per Million
- Cache Write: $3.75 per Million
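
The fallback behaviour can be pictured as a lookup that returns both the rates and an "estimate" flag. This is a minimal sketch under assumptions: `Rates`, `KNOWN_RATES`, `DEFAULT_RATES`, `resolve_rates`, and the `"pricing"` logger name are all hypothetical, and the platform's actual matching rules (for example prefix or tier matching) are not shown.

```python
import logging
from typing import NamedTuple

logger = logging.getLogger("pricing")  # hypothetical logger name


class Rates(NamedTuple):
    """USD per million tokens."""
    input: float
    output: float
    cache_read: float
    cache_write: float


# One known model for illustration; a real table would hold every tier.
KNOWN_RATES = {"claude-haiku-4-5-20251001": Rates(1.00, 5.00, 0.10, 1.25)}

# Default list rate from the bullets above (the Claude Sonnet 4.6 tier).
DEFAULT_RATES = Rates(3.00, 15.00, 0.30, 3.75)


def resolve_rates(model: str) -> tuple[Rates, bool]:
    """Return (rates, is_estimate); is_estimate is True when the fallback was used."""
    rates = KNOWN_RATES.get(model)
    if rates is None:
        logger.warning("Unrecognized model %r; using default list rate", model)
        return DEFAULT_RATES, True
    return rates, False


rates, is_estimate = resolve_rates("some-unlisted-model")
print(rates.input, is_estimate)
```

Returning the flag alongside the rates is one way a UI could decide when to show the warning indicator; the platform may track this differently.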
Prompt caching dynamics
To keep cost accounting accurate, prompt caching is priced differently for each provider:

- Anthropic / OpenAI: Prompt cache write tokens are priced at a premium rate (`cache_write`/`cache_create`), and subsequent hits are charged at a heavily discounted `cache_read` rate.
- Google Gemini: Google structures prompt caching differently, billing cache storage per hour rather than a per-token write rate. As a result, `cache_write_tokens` are priced at $0.00 (`cache_create: 0.0`), while `cache_read_tokens` are billed at the discounted input rate.
- Double-charging prevention: The platform's query layers automatically deduct any `cache_read_tokens` from the billable `input_tokens` count on each LLM call before applying the pricing rates. The calculation therefore always evaluates as `billable_input = max(input_tokens - cache_read_tokens, 0)`.
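
To make the deduction concrete, the sketch below applies `billable_input = max(input_tokens - cache_read_tokens, 0)` and then prices each token bucket at its own rate. Several things here are assumptions rather than documented behaviour: the `Rates` class and `call_cost` function are hypothetical, the total is assumed to be a plain sum of the four buckets, and cache write tokens are assumed not to be deducted from `input_tokens` (the text above only mentions the cache read deduction).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (hypothetical container for one table row)."""
    input: Decimal
    output: Decimal
    cache_read: Decimal
    cache_write: Decimal


SONNET = Rates(Decimal("3.00"), Decimal("15.00"), Decimal("0.30"), Decimal("3.75"))
GEMINI_PRO = Rates(Decimal("1.25"), Decimal("10.00"), Decimal("0.3125"), Decimal("0.00"))


def billable_input(input_tokens: int, cache_read_tokens: int) -> int:
    """Input tokens left after removing cache hits, clamped at zero."""
    return max(input_tokens - cache_read_tokens, 0)


def call_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> Decimal:
    """USD cost of one call; assumes the total is a sum of the four buckets."""
    total = (
        rates.input * billable_input(input_tokens, cache_read_tokens)
        + rates.cache_read * cache_read_tokens
        + rates.cache_write * cache_write_tokens
        + rates.output * output_tokens
    )
    return total / Decimal(1_000_000)


# 100,000 input tokens, of which 80,000 were cache hits; 10,000 output tokens.
print(call_cost(SONNET, 100_000, 10_000, cache_read_tokens=80_000, cache_write_tokens=5_000))
print(call_cost(GEMINI_PRO, 100_000, 10_000, cache_read_tokens=80_000, cache_write_tokens=20_000))
```

In the Sonnet call, only 20,000 tokens are billed at the full input rate, and the 80,000 cache hits are billed once at the cache read rate. In the Gemini call, the 20,000 cache write tokens contribute nothing because that rate is $0.00.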