Pricing models for AI products

A reference guide for choosing and configuring the right pricing structure for your AI product. Each model has different trade-offs for you and your customers.


Comparison

Model                        Best for                                         Complexity   Customer predictability
Per-token billing            API products, developer tools                    Low          Low
Prepaid credits              Self-serve B2C, developer platforms              Medium       High
Per-seat + token allowance   B2B team products                                Medium       High
Model-tiered pricing         Platforms exposing multiple LLMs                 Low–Medium   Low
Outcome-based                Vertical agents (coding, document processing)    Medium       High

Per-token billing

Charge a fixed rate per token consumed. Simple to understand and easy to implement.

When to use: Developer API products where customers want to pay exactly for what they use.

Meter setup:

  • Two meters: prompt_tokens and completion_tokens
  • Calculation type: SUM for both
  • Optional: model_id property if you have multiple models

Pricing plan setup:

  • Usage-based product item linked to the completion tokens meter
  • Flat rate per unit — e.g., $0.015 per 1,000 tokens (block_size: 1000)
  • Separate product item for prompt tokens at a lower rate

Example rates (fictional):

Model         Prompt        Completion
gpt-4o        $0.005/1k     $0.015/1k
gpt-4o-mini   $0.00015/1k   $0.0006/1k
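The rate schedule above reduces to simple arithmetic. A minimal sketch in plain Python — illustrative only, not a Solvimon API call; it assumes partial blocks are pro-rated, though rounding up to the next full block is another common choice:

```python
def token_charge(tokens: int, rate_per_block: float, block_size: int = 1000) -> float:
    """Flat rate per block of tokens; partial blocks pro-rated (an assumption)."""
    return tokens / block_size * rate_per_block

# Fictional gpt-4o rates from the table above: 12k prompt + 3k completion tokens
prompt_cost = token_charge(12_000, rate_per_block=0.005)
completion_cost = token_charge(3_000, rate_per_block=0.015)
total = prompt_cost + completion_cost  # $0.06 + $0.045
```

Keeping prompt and completion tokens on separate meters, as described above, is what lets the two rates diverge.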

Trade-offs:

  • Customers can’t predict their monthly bill
  • Encourages efficient prompt engineering
  • Revenue scales directly with usage

Prepaid credits

Customers purchase a block of tokens or a dollar amount upfront. Usage draws down the balance. This model is common for developer platforms and self-serve products.

When to use: Products where customers want spending predictability and control.

Solvimon setup:

  • Use staircase or top-up pricing on the product item
  • Customer buys a block (e.g., 10M tokens for $50); each usage event draws from that block
  • At period end, unused balance rolls over or expires depending on your configuration

Variant — dollar credits:

  • Use an AMOUNT type meter value instead of NUMBER
  • Charge the dollar value of each request (your cost + margin) rather than raw tokens
  • Useful if your per-token rate varies significantly by model and you want a unified credit currency
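The draw-down mechanics your application needs for balance queries can be sketched in a few lines. Names (`CreditBalance`, `draw`) are hypothetical and not part of any billing API:

```python
class CreditBalance:
    """Tracks a prepaid token block; the billing system remains the source of truth."""

    def __init__(self, tokens: int):
        self.remaining = tokens

    def draw(self, tokens: int) -> bool:
        """Deduct usage; refuse (return False) if the balance can't cover it."""
        if tokens > self.remaining:
            return False
        self.remaining -= tokens
        return True

balance = CreditBalance(10_000_000)  # the 10M-token block from the example
balance.draw(250_000)                # remaining: 9_750_000
```

For the dollar-credit variant, `remaining` would hold an amount in cents and `draw` would take the per-request dollar value instead of raw tokens.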

Trade-offs:

  • High customer satisfaction — no surprise bills
  • Creates commitment (customers pre-pay)
  • Requires handling balance queries in your application

Per-seat + token allowance

A flat monthly fee per user seat that includes a token budget. Usage above the included allowance is billed at an overage rate.

When to use: B2B team products where buyers prefer predictable pricing.

Solvimon setup:

  • Per-seat product item (model type: PER_SEAT) for the base charge
  • Number feature monthly_token_budget set as an entitlement per plan tier
  • Usage-based product item for tokens, with a pricing rule that activates only above the included threshold
  • Alternatively, a separate overage product item that appears on invoices only when the allowance is exceeded

Example plan:

Plan       Price per seat   Included tokens   Overage
Team       $50/seat/month   2M tokens/seat    $0.01/1k
Business   $80/seat/month   5M tokens/seat    $0.008/1k
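The base-plus-overage math for a plan like Team works out as follows. A hypothetical sketch, not part of any billing API:

```python
def monthly_charge(seats: int, tokens_used: int, seat_price: float,
                   included_per_seat: int, overage_per_1k: float) -> float:
    """Seat revenue is guaranteed; only tokens above the pooled allowance bill as overage."""
    included = seats * included_per_seat
    overage_tokens = max(0, tokens_used - included)
    return seats * seat_price + overage_tokens / 1000 * overage_per_1k

# Team plan, 5 seats, 12M tokens used:
# allowance = 5 * 2M = 10M, overage = 2M -> $250 base + $20 overage = $270
charge = monthly_charge(5, 12_000_000, 50, 2_000_000, 0.01)
```

This sketch pools the per-seat allowances across the team; enforcing the allowance per individual seat is a stricter variant of the same idea.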

Trade-offs:

  • Familiar model for B2B buyers
  • Revenue is partially decoupled from usage (seat revenue is guaranteed)
  • More complex to set up and explain to customers

Model-tiered pricing

Same subscription, but different per-token rates depending on which model the customer uses. Implemented using pricing rules that evaluate the model_id meter property.

When to use: Platforms that expose multiple LLMs and want to reflect the cost difference to customers.

Solvimon setup:

  • Single completion_tokens meter with a required model_id property
  • One product item with pricing rules:
    • If model_id = gpt-4o → $0.015/1k tokens
    • If model_id = gpt-4o-mini → $0.0006/1k tokens
    • Default → $0.005/1k tokens (catches any model not explicitly listed)
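The rule set above is a lookup keyed on `model_id` with a fallback. In plain Python — this only mirrors the logic, not Solvimon's rule syntax:

```python
# Fictional per-1k-token rates from the rules above
RATES_PER_1K = {
    "gpt-4o": 0.015,
    "gpt-4o-mini": 0.0006,
}
DEFAULT_RATE_PER_1K = 0.005  # catches any model not explicitly listed

def rate_for(model_id: str) -> float:
    """Return the matching rate, falling back to the default rule."""
    return RATES_PER_1K.get(model_id, DEFAULT_RATE_PER_1K)
```

The default rule is what makes new models safe to launch before pricing is decided: events for an unlisted `model_id` still bill, just at the fallback rate.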

Trade-offs:

  • Lets you pass through model cost differences to customers
  • Customers may optimize their model selection based on price
  • Adding a new model doesn’t require a new product item — just a new pricing rule

Outcome-based pricing

Charge per completed task rather than per token. The customer pays for a “translation”, a “code review”, a “document summary” — not for the underlying tokens consumed.

When to use: Vertical AI agents where customers think in terms of tasks, not tokens. Particularly effective when you can control and optimize the model usage on the backend.

Solvimon setup:

  • Single COUNT meter: tasks_completed
  • Meter value: NUMBER, calculation: SUM
  • Optional task_type property if you have multiple task types at different prices
  • Product item with model type USAGE_BASED, flat rate per task

Example:

Task                   Price
Document translation   $0.25
Code review            $1.00
Email summarization    $0.05

Implement using a task_type property on the meter and a pricing rule per task type.
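Invoicing then becomes a sum of flat prices over completed-task events. A sketch assuming a minimal event shape (a dict with a `task_type` key — an illustrative assumption, not a prescribed schema):

```python
# Fictional per-task prices from the table above
TASK_PRICES = {
    "document_translation": 0.25,
    "code_review": 1.00,
    "email_summarization": 0.05,
}

def invoice(events: list[dict]) -> float:
    """Sum the flat per-task price over completed-task usage events."""
    return sum(TASK_PRICES[e["task_type"]] for e in events)

events = [
    {"task_type": "code_review"},
    {"task_type": "document_translation"},
    {"task_type": "document_translation"},
]
# invoice(events) -> $1.50
```

Note that token consumption appears nowhere in this calculation — that is the point of the model, and also why you bear the token-cost variability listed below.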

Trade-offs:

  • Most customer-friendly — aligns price with value delivered
  • Requires you to absorb token cost variability
  • Higher margin potential if you optimize model selection per task

Mixing models

Real products often combine these. A common pattern:

  • Per-seat base charge (predictable revenue)
  • Included token allowance per seat (perceived value)
  • Per-token overage at model-tiered rates (scales with power users)
  • Optional premium add-on: priority queue, access to frontier models

Solvimon supports multiple product items per pricing plan, so you can combine all of these on a single subscription.
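The combined pattern above can be sketched end to end. Everything here reuses the fictional rates from earlier sections; the structure (and the choice to pool the allowance before applying model-tiered overage rates) is an illustrative assumption, not a prescribed configuration:

```python
SEAT_PRICE = 50
INCLUDED_PER_SEAT = 2_000_000
OVERAGE_PER_1K = {"gpt-4o": 0.015, "gpt-4o-mini": 0.0006}

def mixed_invoice(seats: int, usage: dict[str, int]) -> float:
    """usage maps model_id -> completion tokens consumed this period."""
    allowance = seats * INCLUDED_PER_SEAT
    total = seats * SEAT_PRICE  # predictable seat revenue
    for model_id, tokens in usage.items():
        # Draw the pooled allowance down first, then bill the rest at tiered rates
        covered = min(tokens, allowance)
        allowance -= covered
        total += (tokens - covered) / 1000 * OVERAGE_PER_1K[model_id]
    return total

# 3 seats (6M included), 7M gpt-4o tokens:
# $150 base + 1M overage * $0.015/1k = $165
charge = mixed_invoice(3, {"gpt-4o": 7_000_000})
```

One subtlety this sketch glosses over: when usage spans multiple models, the order in which the allowance is drawn down changes the bill, since overage rates differ per model. That policy is worth deciding explicitly before you configure it.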