Pricing models for AI products
A reference guide for choosing and configuring the right pricing structure for your AI product. Each model has different trade-offs for you and your customers.
Comparison
Per-token billing
Charge a fixed rate per token consumed. This model is simple to understand and easy to implement.
When to use: Developer API products where customers want to pay exactly for what they use.
Meter setup:
- Two meters: prompt_tokens and completion_tokens
- Calculation type: SUM for both
- Optional: model_id property if you have multiple models
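As a rough illustration of what a SUM meter does with incoming usage, the sketch below adds up token counts from a list of events and optionally groups them by model_id. The event shape, the customer and model names, and the sum_meter helper are hypothetical and only meant to show the aggregation, not an actual ingestion API.

```python
from collections import defaultdict

# Hypothetical usage events; field names are illustrative, not a real API schema.
events = [
    {"customer": "acme", "model_id": "model-a", "prompt_tokens": 1200, "completion_tokens": 300},
    {"customer": "acme", "model_id": "model-b", "prompt_tokens": 800, "completion_tokens": 500},
    {"customer": "acme", "model_id": "model-a", "prompt_tokens": 400, "completion_tokens": 200},
]


def sum_meter(events, meter, group_by=None):
    """SUM calculation: add up one meter's values, optionally grouped by a property."""
    totals = defaultdict(int)
    for event in events:
        key = event[group_by] if group_by else "all"
        totals[key] += event[meter]
    return dict(totals)


# One total for the prompt meter, and a per-model breakdown for the completion meter.
prompt_total = sum_meter(events, "prompt_tokens")
completion_by_model = sum_meter(events, "completion_tokens", group_by="model_id")
```

Keeping model_id on every event costs little up front and leaves the door open to per-model rates later.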
Pricing plan setup:
- Usage-based product item linked to the completion tokens meter
- Flat rate per unit, e.g., $0.015 per 1,000 tokens (block_size: 1000)
- Separate product item for prompt tokens at a lower rate
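The arithmetic behind this setup can be sketched as follows. The $0.015 completion rate and the block size of 1,000 come from the example above; the prompt rate of $0.005 is a made-up placeholder for "a lower rate", and the sketch assumes partial blocks are billed proportionally rather than rounded up, which may differ from how your billing system treats blocks.

```python
from decimal import Decimal

BLOCK_SIZE = 1000                   # tokens per priced block, as in the example above
COMPLETION_RATE = Decimal("0.015")  # dollars per block, from the example above
PROMPT_RATE = Decimal("0.005")      # hypothetical lower rate for prompt tokens


def token_charge(tokens: int, rate_per_block: Decimal, block_size: int = BLOCK_SIZE) -> Decimal:
    """Prorated charge: partial blocks are billed proportionally (an assumption)."""
    return rate_per_block * Decimal(tokens) / Decimal(block_size)


def monthly_bill(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Sum both product items and round to cents."""
    total = token_charge(prompt_tokens, PROMPT_RATE) + token_charge(completion_tokens, COMPLETION_RATE)
    return total.quantize(Decimal("0.01"))


# 2M prompt tokens -> $10.00, 500k completion tokens -> $7.50
bill = monthly_bill(prompt_tokens=2_000_000, completion_tokens=500_000)
```

Decimal is used instead of float so that small per-token rates do not accumulate rounding error across millions of tokens.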
Example rates (fictional):
Trade-offs:
- Customers can’t predict their monthly bill
- Encourages efficient prompt engineering
- Revenue scales directly with usage
Prepaid credits
Customers purchase a block of tokens or a dollar amount upfront. Usage draws down the balance. This model is common for developer platforms and self-serve products.
When to use: Products where customers want spending predictability and control.
Solvimon setup:
- Use staircase or top-up pricing on the product item
- Customer buys a block (e.g., 10M tokens for $50); each usage event draws from that block
- At period end, unused balance rolls over or expires depending on your configuration
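On the application side, the draw-down and period-end behavior described above could look roughly like this. CreditBalance, InsufficientCredits, and the rollover flag are hypothetical names for illustration; in practice the balance would live in your billing system rather than in memory.

```python
class InsufficientCredits(Exception):
    """Raised when a usage event would overdraw the prepaid balance."""


class CreditBalance:
    """Minimal in-memory sketch of a prepaid token balance (hypothetical)."""

    def __init__(self, rollover: bool):
        self.balance = 0          # remaining prepaid tokens
        self.rollover = rollover  # True: unused tokens carry over; False: they expire

    def top_up(self, tokens: int) -> None:
        """Customer buys a block of tokens."""
        self.balance += tokens

    def draw_down(self, tokens: int) -> None:
        """Each usage event draws from the purchased block."""
        if tokens > self.balance:
            raise InsufficientCredits(f"need {tokens}, have {self.balance}")
        self.balance -= tokens

    def close_period(self) -> int:
        """Apply the period-end policy and return the number of expired tokens."""
        if self.rollover:
            return 0
        expired, self.balance = self.balance, 0
        return expired


wallet = CreditBalance(rollover=False)
wallet.top_up(10_000_000)      # e.g., the 10M-token block from the example above
wallet.draw_down(2_500_000)
remaining = wallet.balance
expired = wallet.close_period()
```

Rejecting an overdraw outright is only one possible policy; some products let the balance go negative and bill the difference instead.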
Variant — dollar credits:
- Use an AMOUNT type meter value instead of NUMBER
- Charge the dollar value of each request (your cost + margin) rather than raw tokens
- Useful if your per-token rate varies significantly by model and you want a unified credit currency
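A minimal sketch of the dollar-credit variant: each request is converted to a dollar amount (cost plus margin) before it is reported and deducted. The model names, per-model costs, and the 50% margin are invented for illustration.

```python
from decimal import Decimal

# Hypothetical provider cost per 1,000 tokens, by model.
COST_PER_1K = {
    "large-model": Decimal("0.010"),
    "small-model": Decimal("0.0004"),
}
MARGIN = Decimal("0.50")  # hypothetical 50% markup on cost


def request_amount(model_id: str, tokens: int) -> Decimal:
    """Dollar value to report for one request: cost plus margin."""
    cost = COST_PER_1K[model_id] * Decimal(tokens) / Decimal(1000)
    return cost * (Decimal(1) + MARGIN)


balance = Decimal("50.00")                          # prepaid dollar credits
balance -= request_amount("large-model", 20_000)    # cost $0.20 -> charge $0.30
balance -= request_amount("small-model", 100_000)   # cost $0.04 -> charge $0.06
```

Because the conversion happens before metering, customers see one credit currency regardless of which model served the request.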
Trade-offs:
- High customer satisfaction — no surprise bills
- Creates commitment (customers pre-pay)
- Requires handling balance queries in your application
Per-seat + token allowance
A flat monthly fee per user seat that includes a token budget. Usage above the included allowance is billed at an overage rate.
When to use: B2B team products where buyers prefer predictable pricing.
Solvimon setup:
- Per-seat product item (model type: PER_SEAT) for the base charge
- Number feature monthly_token_budget set as an entitlement per plan tier
- Usage-based product item for tokens, with a pricing rule that activates only above the included threshold. Alternatively, structure it as a separate overage product item that only appears on invoices when the allowance is exceeded
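The invoice math for this model can be sketched as below. All prices and quantities are hypothetical, and the sketch assumes the token budget is granted per seat and pooled across the whole account; if your allowance is a fixed amount per plan tier, drop the multiplication by seats.

```python
from decimal import Decimal


def seat_plan_invoice(seats: int, tokens_used: int, seat_price: Decimal,
                      monthly_token_budget: int, overage_rate_per_1k: Decimal) -> dict:
    """Base seat charge plus overage on tokens above the included allowance."""
    base = seat_price * seats
    included = monthly_token_budget * seats            # assumption: budget is pooled across seats
    overage_tokens = max(0, tokens_used - included)    # overage applies only above the threshold
    overage = overage_rate_per_1k * Decimal(overage_tokens) / Decimal(1000)
    return {
        "base": base,
        "overage_tokens": overage_tokens,
        "overage": overage,
        "total": base + overage,
    }


# Hypothetical plan: $30 per seat, 1M tokens per seat, $0.01 per 1k tokens of overage.
invoice = seat_plan_invoice(
    seats=5,
    tokens_used=6_200_000,
    seat_price=Decimal("30"),
    monthly_token_budget=1_000_000,
    overage_rate_per_1k=Decimal("0.01"),
)
```

Here 5 seats include 5M tokens, so 1.2M tokens are billed as overage on top of the $150 base.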
Example plan:
Trade-offs:
- Familiar model for B2B buyers
- Revenue is partially decoupled from usage (seat revenue is guaranteed)
- More complex to set up and explain to customers
Model-tiered pricing
Customers stay on the same subscription but pay different per-token rates depending on which model they use. This is implemented with pricing rules that evaluate the model_id meter property.
When to use: Platforms that expose multiple LLMs and want to reflect the cost difference to customers.
Solvimon setup:
- Single completion_tokens meter with a required model_id property
- One product item with pricing rules:
  - If model_id = gpt-4o → $0.015/1k tokens
  - If model_id = gpt-4o-mini → $0.0006/1k tokens
  - Default → $0.005/1k tokens (catches any model not explicitly listed)
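The rule evaluation above amounts to a lookup with a fallback. The sketch below uses the three rates from the list; the function names and the "some-new-model" identifier are hypothetical, and the real evaluation would happen inside the billing system's pricing rules rather than in your code.

```python
from decimal import Decimal

# Rates per 1,000 completion tokens, taken from the pricing rules above.
RATES_PER_1K = {
    "gpt-4o": Decimal("0.015"),
    "gpt-4o-mini": Decimal("0.0006"),
}
DEFAULT_RATE_PER_1K = Decimal("0.005")  # catches any model not explicitly listed


def rate_for(model_id: str) -> Decimal:
    """Pick the rate for a model, falling back to the default rule."""
    return RATES_PER_1K.get(model_id, DEFAULT_RATE_PER_1K)


def tiered_charge(usage_by_model: dict) -> Decimal:
    """Total charge for completion tokens summed per model_id."""
    return sum(
        (rate_for(model_id) * Decimal(tokens) / Decimal(1000)
         for model_id, tokens in usage_by_model.items()),
        Decimal(0),
    )


charge = tiered_charge({
    "gpt-4o": 200_000,          # 200 blocks at $0.015
    "gpt-4o-mini": 1_000_000,   # 1,000 blocks at $0.0006
    "some-new-model": 100_000,  # unlisted model, billed at the default rate
})
```

The default rule matters operationally: a newly launched model is billed at a known rate from its first request, even before anyone adds a dedicated rule for it.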
Trade-offs:
- Lets you pass through model cost differences to customers
- Customers may optimize their model selection based on price
- Adding a new model doesn’t require a new product item — just a new pricing rule
Outcome-based pricing
Charge per completed task rather than per token. The customer pays for a “translation”, a “code review”, a “document summary” — not for the underlying tokens consumed.
When to use: Vertical AI agents where customers think in terms of tasks, not tokens. Particularly effective when you can control and optimize the model usage on the backend.
Solvimon setup:
- Single COUNT meter: tasks_completed
- Meter value: NUMBER, calculation: SUM
- Optional task_type property if you have multiple task types at different prices
- Product item with model type USAGE_BASED, flat rate per task
Example:
Implement using a task_type property on the meter and a pricing rule per task type.
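As a rough sketch of per-task pricing with a task_type property, the code below counts completed tasks by type and applies a flat price to each. The task types echo the examples in this section, but the prices and the task_charges helper are hypothetical.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical flat prices per completed task, keyed by task_type.
PRICE_PER_TASK = {
    "translation": Decimal("0.50"),
    "code_review": Decimal("2.00"),
    "document_summary": Decimal("0.25"),
}

# One entry per completed task, as the tasks_completed meter would receive them.
completed = ["translation", "translation", "code_review", "document_summary"]


def task_charges(task_types) -> dict:
    """Count completed tasks per type and apply the flat per-task price."""
    counts = Counter(task_types)
    return {task_type: PRICE_PER_TASK[task_type] * n for task_type, n in counts.items()}


charges = task_charges(completed)
total = sum(charges.values(), Decimal(0))
```

Note that token consumption appears nowhere in this calculation, which is exactly why the provider absorbs any variability in token cost per task.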
Trade-offs:
- Most customer-friendly — aligns price with value delivered
- Requires you to absorb token cost variability
- Higher margin potential if you optimize model selection per task
Mixing models
Real products often combine these. A common pattern:
- Per-seat base charge (predictable revenue)
- Included token allowance per seat (perceived value)
- Per-token overage at model-tiered rates (scales with power users)
- Optional premium add-on: priority queue, access to frontier models
Solvimon supports multiple product items per pricing plan, so you can combine all of these on a single subscription.