Pricing models for AI products

A reference guide for choosing and configuring the right pricing structure for your AI product. Each model has different trade-offs for you and your customers.


Comparison

Model                        Best for                                         Complexity   Customer predictability
Per-token billing            API products, developer tools                    Low          Low
Prepaid credits              Self-serve B2C, developer platforms              Medium       High
Per-seat + token allowance   B2B team products                                Medium       High
Model-tiered pricing         Platforms exposing multiple LLMs                 Low–Medium   Low
Outcome-based                Vertical agents (coding, document processing)    Medium       High

Per-token billing

Charge a fixed rate per token consumed. Simple to understand and easy to implement.

When to use: Developer API products where customers want to pay exactly for what they use.

Meter setup:

  • Two meters: prompt_tokens and completion_tokens
  • Calculation type: SUM for both
  • Optional: model_id property if you have multiple models

Pricing plan setup:

  • Usage-based product item linked to the completion tokens meter
  • Flat rate per unit — e.g., $0.015 per 1,000 tokens (block_size: 1000)
  • Separate product item for prompt tokens at a lower rate

Example rates (fictional):

Model         Prompt        Completion
gpt-4o        $0.005/1k     $0.015/1k
gpt-4o-mini   $0.00015/1k   $0.0006/1k
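The rate schedule above reduces to simple arithmetic. A minimal sketch in plain Python — illustrative only, not a Solvimon API call; it assumes partial blocks are pro-rated, though rounding up to the next full block is another common choice:

```python
def token_charge(tokens: int, rate_per_block: float, block_size: int = 1000) -> float:
    """Flat rate per block of tokens; partial blocks pro-rated (an assumption)."""
    return tokens / block_size * rate_per_block

# Fictional gpt-4o rates from the table above: 12k prompt + 3k completion tokens
prompt_cost = token_charge(12_000, rate_per_block=0.005)
completion_cost = token_charge(3_000, rate_per_block=0.015)
total = prompt_cost + completion_cost  # $0.06 + $0.045
```

Keeping prompt and completion tokens on separate meters, as described above, is what lets the two rates diverge.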

Trade-offs:

  • Customers can’t predict their monthly bill
  • Encourages efficient prompt engineering
  • Revenue scales directly with usage

Prepaid credits

Customers purchase a block of tokens or a dollar amount upfront. Usage draws down the balance. This model is common for developer platforms and self-serve products.

When to use: Products where customers want spending predictability and control.

Solvimon setup:

  • Use staircase or top-up pricing on the product item
  • Customer buys a block (e.g., 10M tokens for $50); each usage event draws from that block
  • At period end, unused balance rolls over or expires depending on your configuration

Variant — dollar credits:

  • Use an AMOUNT type meter value instead of NUMBER
  • Charge the dollar value of each request (your cost + margin) rather than raw tokens
  • Useful if your per-token rate varies significantly by model and you want a unified credit currency
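The draw-down mechanics your application needs for balance queries can be sketched in a few lines. Names (`CreditBalance`, `draw`) are hypothetical and not part of any billing API:

```python
class CreditBalance:
    """Tracks a prepaid token block; the billing system remains the source of truth."""

    def __init__(self, tokens: int):
        self.remaining = tokens

    def draw(self, tokens: int) -> bool:
        """Deduct usage; refuse (return False) if the balance can't cover it."""
        if tokens > self.remaining:
            return False
        self.remaining -= tokens
        return True

balance = CreditBalance(10_000_000)  # the 10M-token block from the example
balance.draw(250_000)                # remaining: 9_750_000
```

For the dollar-credit variant, `remaining` would hold an amount in cents and `draw` would take the per-request dollar value instead of raw tokens.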

Trade-offs:

  • High customer satisfaction — no surprise bills
  • Creates commitment (customers pre-pay)
  • Requires handling balance queries in your application

Per-seat + token allowance

A flat monthly fee per user seat that includes a token budget. Usage above the included allowance is billed at an overage rate.

When to use: B2B team products where buyers prefer predictable pricing.

Solvimon setup:

  • Per-seat product item (model type: PER_SEAT) for the base charge
  • Number feature monthly_token_budget set as an entitlement per plan tier
  • Usage-based product item for tokens, with a pricing rule that activates only above the included threshold
  • Alternatively, a separate overage product item that appears on invoices only when the allowance is exceeded

Example plan:

Plan       Price per seat   Included tokens   Overage
Team       $50/seat/month   2M tokens/seat    $0.01/1k
Business   $80/seat/month   5M tokens/seat    $0.008/1k
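The base-plus-overage math for a plan like Team works out as follows. A hypothetical sketch, not part of any billing API:

```python
def monthly_charge(seats: int, tokens_used: int, seat_price: float,
                   included_per_seat: int, overage_per_1k: float) -> float:
    """Seat revenue is guaranteed; only tokens above the pooled allowance bill as overage."""
    included = seats * included_per_seat
    overage_tokens = max(0, tokens_used - included)
    return seats * seat_price + overage_tokens / 1000 * overage_per_1k

# Team plan, 5 seats, 12M tokens used:
# allowance = 5 * 2M = 10M, overage = 2M -> $250 base + $20 overage = $270
charge = monthly_charge(5, 12_000_000, 50, 2_000_000, 0.01)
```

This sketch pools the per-seat allowances across the team; enforcing the allowance per individual seat is a stricter variant of the same idea.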

Trade-offs:

  • Familiar model for B2B buyers
  • Revenue is partially decoupled from usage (seat revenue is guaranteed)
  • More complex to set up and explain to customers

Model-tiered pricing

Same subscription, but different per-token rates depending on which model the customer uses. Implemented using pricing rules that evaluate the model_id meter property.

When to use: Platforms that expose multiple LLMs and want to reflect the cost difference to customers.

Solvimon setup:

  • Single completion_tokens meter with a required model_id property
  • One product item with pricing rules:
    • If model_id = gpt-4o → $0.015/1k tokens
    • If model_id = gpt-4o-mini → $0.0006/1k tokens
    • Default → $0.005/1k tokens (catches any model not explicitly listed)
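The rule set above is a lookup keyed on `model_id` with a fallback. In plain Python — this only mirrors the logic, not Solvimon's rule syntax:

```python
# Fictional per-1k-token rates from the rules above
RATES_PER_1K = {
    "gpt-4o": 0.015,
    "gpt-4o-mini": 0.0006,
}
DEFAULT_RATE_PER_1K = 0.005  # catches any model not explicitly listed

def rate_for(model_id: str) -> float:
    """Return the matching rate, falling back to the default rule."""
    return RATES_PER_1K.get(model_id, DEFAULT_RATE_PER_1K)
```

The default rule is what makes new models safe to launch before pricing is decided: events for an unlisted `model_id` still bill, just at the fallback rate.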

Trade-offs:

  • Lets you pass through model cost differences to customers
  • Customers may optimize their model selection based on price
  • Adding a new model doesn’t require a new product item — just a new pricing rule

Outcome-based pricing

Charge per completed task rather than per token. The customer pays for a “translation”, a “code review”, a “document summary” — not for the underlying tokens consumed.

When to use: Vertical AI agents where customers think in terms of tasks, not tokens. Particularly effective when you can control and optimize the model usage on the backend.

Solvimon setup:

  • Single COUNT meter: tasks_completed
  • Meter value: NUMBER, calculation: SUM
  • Optional task_type property if you have multiple task types at different prices
  • Product item with model type USAGE_BASED, flat rate per task

Example:

Task                   Price
Document translation   $0.25
Code review            $1.00
Email summarization    $0.05

Implement using a task_type property on the meter and a pricing rule per task type.
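Invoicing then becomes a sum of flat prices over completed-task events. A sketch assuming a minimal event shape (a dict with a `task_type` key — an illustrative assumption, not a prescribed schema):

```python
# Fictional per-task prices from the table above
TASK_PRICES = {
    "document_translation": 0.25,
    "code_review": 1.00,
    "email_summarization": 0.05,
}

def invoice(events: list[dict]) -> float:
    """Sum the flat per-task price over completed-task usage events."""
    return sum(TASK_PRICES[e["task_type"]] for e in events)

events = [
    {"task_type": "code_review"},
    {"task_type": "document_translation"},
    {"task_type": "document_translation"},
]
# invoice(events) -> $1.50
```

Note that token consumption appears nowhere in this calculation — that is the point of the model, and also why you bear the token-cost variability listed below.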

Trade-offs:

  • Most customer-friendly — aligns price with value delivered
  • Requires you to absorb token cost variability
  • Higher margin potential if you optimize model selection per task

Mixing models

Real products often combine these. A common pattern:

  • Per-seat base charge (predictable revenue)
  • Included token allowance per seat (perceived value)
  • Per-token overage at model-tiered rates (scales with power users)
  • Optional premium add-on: priority queue, access to frontier models

Solvimon supports multiple product items per pricing plan, so you can combine all of these on a single subscription.
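The combined pattern above can be sketched end to end. Everything here reuses the fictional rates from earlier sections; the structure (and the choice to pool the allowance before applying model-tiered overage rates) is an illustrative assumption, not a prescribed configuration:

```python
SEAT_PRICE = 50
INCLUDED_PER_SEAT = 2_000_000
OVERAGE_PER_1K = {"gpt-4o": 0.015, "gpt-4o-mini": 0.0006}

def mixed_invoice(seats: int, usage: dict[str, int]) -> float:
    """usage maps model_id -> completion tokens consumed this period."""
    allowance = seats * INCLUDED_PER_SEAT
    total = seats * SEAT_PRICE  # predictable seat revenue
    for model_id, tokens in usage.items():
        # Draw the pooled allowance down first, then bill the rest at tiered rates
        covered = min(tokens, allowance)
        allowance -= covered
        total += (tokens - covered) / 1000 * OVERAGE_PER_1K[model_id]
    return total

# 3 seats (6M included), 7M gpt-4o tokens:
# $150 base + 1M overage * $0.015/1k = $165
charge = mixed_invoice(3, {"gpt-4o": 7_000_000})
```

One subtlety this sketch glosses over: when usage spans multiple models, the order in which the allowance is drawn down changes the bill, since overage rates differ per model. That policy is worth deciding explicitly before you configure it.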