The Guild platform estimates LLM spend in real time from per-million-token list prices. Each LLM call is priced individually, and its cost is linear in its token counts: each token category is multiplied by its per-million-token rate, and the results are summed.

Default list prices

Below are the default list prices configured on the platform. All prices are in USD per million tokens.

| Provider | Model / tier | Input | Output | Cache read | Cache write |
|---|---|---|---|---|---|
| Anthropic Claude | claude-sonnet-4-6 | $3.00 | $15.00 | $0.30 | $3.75 |
| Anthropic Claude | claude-sonnet-4-5-20250929 | $3.00 | $15.00 | $0.30 | $3.75 |
| Anthropic Claude | claude-haiku-4-5-20251001 | $1.00 | $5.00 | $0.10 | $1.25 |
| Anthropic Claude | claude-opus-4 / claude-opus-4-7 | $15.00 | $75.00 | $1.50 | $18.75 |
| OpenAI GPT | gpt-4o-2024-08-06 | $2.50 | $10.00 | $1.25 | $2.50 |
| OpenAI GPT | gpt-4.1 | $2.00 | $8.00 | $0.50 | $2.00 |
| Google Gemini | gemini-3.1-pro-preview | $1.25 | $10.00 | $0.3125 | $0.00 |
| Google Gemini | gemini-3-pro-preview | $1.25 | $10.00 | $0.3125 | $0.00 |
| Google Gemini | gemini-3.5-flash | $1.50 | $9.00 | $0.15 | $0.00 |
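As an illustration, the price table can be held in code as a mapping from model name to four per-million-token rates, plus a helper that converts a token count to dollars. This is a minimal sketch: the `Rates` class, the `DEFAULT_RATES` mapping, and the `tokens_to_usd` function are hypothetical names rather than the platform's actual configuration format, and only a few rows of the table are copied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens for one model tier (hypothetical structure)."""
    input: float
    output: float
    cache_read: float
    cache_write: float


# A few rows copied from the default price table; the name is illustrative only.
DEFAULT_RATES: dict[str, Rates] = {
    "claude-sonnet-4-6": Rates(3.00, 15.00, 0.30, 3.75),
    "claude-haiku-4-5-20251001": Rates(1.00, 5.00, 0.10, 1.25),
    "gpt-4.1": Rates(2.00, 8.00, 0.50, 2.00),
    "gemini-3-pro-preview": Rates(1.25, 10.00, 0.3125, 0.00),
}


def tokens_to_usd(tokens: int, usd_per_million: float) -> float:
    """Convert a token count to USD at a per-million-token list price."""
    return tokens * usd_per_million / 1_000_000
```

For example, 2,000 output tokens on claude-sonnet-4-6 cost 2,000 × $15.00 / 1,000,000 = $0.03.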

Default fallback rate

If a model name is not recognized (that is, it matches none of the tiers above), the system logs a warning and falls back to the default list rate, which matches the Claude Sonnet 4.6 tier. In the Models breakdown table, a warning indicator appears next to the model name to signal that its spend is an estimate based on this fallback rate.
  • Input: $3.00 per million tokens
  • Output: $15.00 per million tokens
  • Cache read: $0.30 per million tokens
  • Cache write: $3.75 per million tokens
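The lookup-and-fallback behavior can be sketched as follows. This sketch assumes exact matching on the model name (the platform may match more loosely), and `resolve_rates`, the `pricing` logger name, and the `is_fallback` flag (standing in for whatever drives the warning indicator) are all hypothetical.

```python
import logging

logger = logging.getLogger("pricing")  # hypothetical logger name

# Default fallback rate in USD per million tokens (matches the Claude Sonnet 4.6 tier).
FALLBACK: dict[str, float] = {
    "input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75,
}

# A small subset of known tiers; exact-name matching is an assumption of this sketch.
KNOWN: dict[str, dict[str, float]] = {
    "claude-sonnet-4-6": FALLBACK,
    "gpt-4.1": {"input": 2.00, "output": 8.00, "cache_read": 0.50, "cache_write": 2.00},
}


def resolve_rates(model: str) -> tuple[dict[str, float], bool]:
    """Return (rates, is_fallback); is_fallback marks spend priced at the default rate."""
    rates = KNOWN.get(model)
    if rates is None:
        # Unrecognized model: log a warning and price at the fallback rate.
        logger.warning("Unrecognized model %r; using default fallback rate", model)
        return FALLBACK, True
    return rates, False
```

Returning the flag alongside the rates lets a reporting layer mark fallback-priced rows without repeating the lookup.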

Prompt caching dynamics

Each provider bills prompt caching differently, so the platform prices cache tokens per provider:
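  • Anthropic: Cache write tokens are priced at a premium cache_write (cache_create) rate, 1.25× the input rate in the table above, and subsequent cache hits are charged at a heavily discounted cache_read rate (0.1× the input rate).
  • OpenAI: Cache writes carry no premium; the configured cache_write rate equals the input rate. Cache hits are charged at the discounted cache_read rate.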
  • Google Gemini: Google bills cache storage by the hour rather than charging a per-token write rate. As a result, cache_write_tokens are priced at $0.00 (cache_create: 0.0), while cache_read_tokens are priced at the discounted cache_read rate.
  • Double-charging prevention: Before applying rates to each LLM call, the platform's query layer deducts cache_read_tokens from the billable input_tokens count, so cached tokens are charged only at the cache_read rate and never also at the full input rate: billable_input = max(input_tokens - cache_read_tokens, 0).
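Putting these rules together, a per-call estimate can be sketched as below. `estimate_call_cost` is a hypothetical helper, and the rates are USD per million tokens taken from the table above. Following the text, only cache_read_tokens are deducted from input_tokens; the sketch does not deduct cache write tokens, since no such deduction is described.

```python
def estimate_call_cost(
    rates: dict[str, float],
    *,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    cache_write_tokens: int = 0,
) -> float:
    """Estimate one LLM call's cost in USD (hypothetical helper).

    `rates` holds USD-per-million prices under the keys
    "input", "output", "cache_read", and "cache_write".
    """
    # Cached tokens are billed at the cache_read rate, so remove them from input.
    billable_input = max(input_tokens - cache_read_tokens, 0)
    usd_millionths = (
        billable_input * rates["input"]
        + output_tokens * rates["output"]
        + cache_read_tokens * rates["cache_read"]
        + cache_write_tokens * rates["cache_write"]  # 0.0 for Gemini tiers
    )
    return usd_millionths / 1_000_000
```

For a claude-sonnet-4-6 call reporting 10,000 input tokens (8,000 of them cache reads) and 500 output tokens, billable input is 2,000 tokens and the estimate is (2,000 × 3.00 + 500 × 15.00 + 8,000 × 0.30) / 1,000,000 = $0.0159. Without the deduction, the same call would be estimated at $0.0399, because the 8,000 cached tokens would be charged at both the input and cache_read rates.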