This guide walks through setting up token-based billing for an AI agent or LLM-powered product using Solvimon — from meter design to the first invoice.
AI products have usage patterns that don’t fit traditional SaaS billing:
Solvimon handles all of this natively. Here’s how to set it up.
A well-designed meter schema is the foundation of accurate AI billing. Send granular per-request events rather than pre-aggregated totals — this lets you change pricing rules without re-engineering your event ingestion pipeline.
For a multi-model AI product, create two meters: one for prompt tokens and one for completion tokens. Both meters need a model_id property so you can price different models at different rates.
Use POST /v1/meters:
Each meter needs a meter value of type NUMBER to track token counts. Use POST /v1/meter-values:
Both meters can share one meter value reference (token_count) when their aggregation logic is identical; otherwise, create a separate meter value for each.
A model_id property on each meter lets you price GPT-4o differently from GPT-4o mini. Set status: "ACTIVE" — properties must be active to be used in pricing rules. Use POST /v1/meter-properties:
Add the model_id property to both meters.
The calculation defines how token counts are aggregated across a billing period. Use SUM, and create the calculation with POST /v1/meter-value-calculations:
Create a corresponding calculation for prompt tokens.
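To make the SUM aggregation concrete, here is a minimal local sketch of what summing per-request events over one billing period looks like. It only illustrates the arithmetic, not Solvimon's implementation; the event field names (meter, model_id, token_count) and the sample numbers are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical per-request events for one customer in one billing period.
# The field names are illustrative, not the Solvimon event schema.
events = [
    {"meter": "completion_tokens", "model_id": "gpt-4o", "token_count": 120},
    {"meter": "completion_tokens", "model_id": "gpt-4o-mini", "token_count": 300},
    {"meter": "completion_tokens", "model_id": "gpt-4o", "token_count": 80},
    {"meter": "prompt_tokens", "model_id": "gpt-4o", "token_count": 900},
]


def sum_tokens(events):
    """SUM token_count per (meter, model_id) pair."""
    totals = defaultdict(int)
    for event in events:
        totals[(event["meter"], event["model_id"])] += event["token_count"]
    return dict(totals)


totals = sum_tokens(events)
```

Because each event keeps its own model_id, the same raw events can be regrouped later under a different pricing rule, which is the practical benefit of sending granular events instead of pre-aggregated totals.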
Create a product item for completion tokens and link it to the meter value calculation. This is what appears as a line item on invoices.
For per-model pricing, you’ll set pricing rules on the product item so that the rate changes based on the model_id property of the events.
Recommended pricing plan structure for a two-tier AI product:
Pricing rules on the product item use the model_id property to select the right rate. Set a default rate for any model not explicitly listed.
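The rate selection described above can be sketched as a lookup with a fallback. This is an illustration of the logic only: the rates are made-up placeholder numbers (not Solvimon or model-provider prices), and the names RATES_PER_MILLION, DEFAULT_RATE_PER_MILLION, and charge_for are hypothetical.

```python
from decimal import Decimal

# Placeholder rates in currency units per 1,000,000 completion tokens.
# These numbers are invented for the example.
RATES_PER_MILLION = {
    "gpt-4o": Decimal("10.00"),
    "gpt-4o-mini": Decimal("0.60"),
}
# Applied to any model_id that has no explicit rule.
DEFAULT_RATE_PER_MILLION = Decimal("5.00")


def charge_for(model_id: str, tokens: int) -> Decimal:
    """Pick the rate by model_id, falling back to the default rate."""
    rate = RATES_PER_MILLION.get(model_id, DEFAULT_RATE_PER_MILLION)
    return (rate * tokens / Decimal(1_000_000)).quantize(Decimal("0.0001"))
```

Decimal is used instead of float so that small per-token amounts do not pick up binary rounding noise when they are multiplied by large token counts.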
Entitlements define what a customer is allowed to do on their plan — model access, token budgets, and feature flags. They’re not billed directly; they’re enforced by your application at request time.
Create these features in Solvimon using POST /v1/features:
Attach these features to your pricing plan versions with the appropriate values per plan tier. The Starter plan gets monthly_token_budget: 1000000 and available_models: ["gpt-4o-mini"]. The Pro plan gets monthly_token_budget: 10000000 and all models.
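As a reference for application code, the two tiers above can be mirrored in a small lookup table. This is a sketch, not the Solvimon feature payload: the plan keys are hypothetical, and "all models" for Pro is represented here as the two models named in this guide, which is an assumption.

```python
# Feature values per plan tier, taken from the text above.
# Representing "all models" as an explicit list is an assumption.
PLAN_FEATURES = {
    "starter": {
        "monthly_token_budget": 1_000_000,
        "available_models": ["gpt-4o-mini"],
    },
    "pro": {
        "monthly_token_budget": 10_000_000,
        "available_models": ["gpt-4o", "gpt-4o-mini"],
    },
}


def can_use_model(plan: str, model_id: str) -> bool:
    """Return True if the plan's available_models includes model_id."""
    return model_id in PLAN_FEATURES[plan]["available_models"]
```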
Follow the same pattern as the "Get to your first invoice" tutorial. The only difference is that your subscription references your AI pricing plan.
Before executing an LLM request on behalf of a customer, check their entitlements via GET /v1/customers/{ref}/entitlements to determine which models they can access and whether they have budget remaining.
Response (trimmed):
To check current usage against the budget, query GET /v1/ingest/meter-data for this customer:
Your application compares usage against the monthly_token_budget entitlement and blocks requests that would exceed it. Solvimon provides the values — your application enforces the limit.
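A minimal sketch of that request-time check follows. It assumes the entitlement values and the current usage have already been fetched and parsed into plain Python values; the function name check_request, the dictionary keys, and the estimated_tokens input are hypothetical.

```python
def check_request(
    entitlements: dict,
    used_tokens: int,
    model_id: str,
    estimated_tokens: int,
) -> tuple[bool, str]:
    """Decide whether to run an LLM request for a customer.

    entitlements is assumed to look like:
    {"monthly_token_budget": int, "available_models": list[str]}
    """
    if model_id not in entitlements["available_models"]:
        return False, "model_not_allowed"
    # Block the request if it would push usage past the budget.
    if used_tokens + estimated_tokens > entitlements["monthly_token_budget"]:
        return False, "budget_exceeded"
    return True, "ok"


starter = {"monthly_token_budget": 1_000_000, "available_models": ["gpt-4o-mini"]}
decision = check_request(starter, used_tokens=999_500, model_id="gpt-4o-mini", estimated_tokens=800)
```

The token count of a request is not known before it runs, so estimated_tokens is necessarily a guess (for example, prompt length plus the configured maximum output). How conservative to make that guess is an application-level choice.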
Once the LLM responds, send a usage event via POST /v1/ingest/meter-data with the token counts. For streaming responses, wait until the stream completes before sending the event — send one event per request with the total token counts.
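For the streaming case, the idea is to consume the whole stream first and read the usage totals only once it has ended. The sketch below assumes a hypothetical chunk shape (a dict with optional "text" and "usage" keys); real provider SDKs expose streamed chunks and final usage differently.

```python
from typing import Iterable, Optional


def consume_stream(chunks: Iterable[dict]) -> tuple[str, Optional[dict]]:
    """Collect streamed text and keep the last usage block seen."""
    parts = []
    usage = None
    for chunk in chunks:
        parts.append(chunk.get("text", ""))
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return "".join(parts), usage


# Simulated stream: usage totals arrive with the final chunk.
stream = [
    {"text": "Hel"},
    {"text": "lo"},
    {"text": "", "usage": {"prompt_tokens": 12, "completion_tokens": 2}},
]
text, usage = consume_stream(stream)
# Only at this point, after the loop has finished, would the usage event be sent.
```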
Send a separate event for prompt tokens:
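A rough sketch of building both events for one request is shown below. Only reference, meter_properties[].value, and meter_values[].number are field names taken from this guide; every other key (customer_reference, meter_reference, and the inner reference keys) and the meter names are hypothetical placeholders, so check the API reference for the actual request body.

```python
def build_usage_events(request_id: str, customer_ref: str, model_id: str, usage: dict) -> list[dict]:
    """Build one completion-token event and one prompt-token event."""

    def event(meter_ref: str, count: int) -> dict:
        return {
            # Suffix keeps the two events distinct (assumption, see note below).
            "reference": f"{request_id}:{meter_ref}",
            "customer_reference": customer_ref,  # hypothetical key
            "meter_reference": meter_ref,  # hypothetical key
            "meter_properties": [{"reference": "model_id", "value": model_id}],
            "meter_values": [{"reference": "token_count", "number": count}],
        }

    return [
        event("completion_tokens", usage["completion_tokens"]),
        event("prompt_tokens", usage["prompt_tokens"]),
    ]


events = build_usage_events(
    "req-123", "cust-1", "gpt-4o", {"prompt_tokens": 900, "completion_tokens": 120}
)
```

Suffixing the request ID with the meter name is a precaution in case references must be unique across meters. If deduplication is scoped per meter, the bare request ID would work for both events.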
Key fields:
reference: use a unique ID per request (your internal request ID works well). Duplicate references are deduplicated automatically.
meter_properties[].value: the model used. This is what the pricing rule evaluates to determine the per-token rate.
meter_values[].number: the actual token count as reported by the model provider’s API.

Token counting: use the token count returned by the model provider’s API response (usage.prompt_tokens, usage.completion_tokens), not your own tokenizer estimate. Counts vary by model.
Streaming responses: send one event after the stream completes with the total token counts. Do not send incremental events mid-stream.
Request deduplication: if your event ingestion fails and you retry, use the same reference value. Solvimon deduplicates on reference, so the retry won’t double-count.
Model fallbacks: if your application retries a request with a cheaper model after a failure, send separate events for each attempt with the correct model_id for each.
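The retry and fallback behavior can be sketched with an in-memory stand-in for the ingestion endpoint. FakeIngest simulates deduplication on reference and a lost response; it is not the Solvimon client, and the event fields and reference format are hypothetical.

```python
class FakeIngest:
    """In-memory stand-in for an endpoint that deduplicates on reference."""

    def __init__(self, fail_first: int = 0):
        self.stored = {}
        self.fail_remaining = fail_first

    def send(self, event: dict) -> None:
        # setdefault ignores a second event with the same reference.
        self.stored.setdefault(event["reference"], event)
        if self.fail_remaining > 0:
            self.fail_remaining -= 1
            # The event was stored, but the caller never sees the response.
            raise TimeoutError("response lost")


def send_with_retry(ingest: FakeIngest, event: dict, attempts: int = 3) -> bool:
    for _ in range(attempts):
        try:
            ingest.send(event)
            return True
        except TimeoutError:
            continue  # retry with the SAME reference
    return False


ingest = FakeIngest(fail_first=1)
# One user request, two model attempts: each attempt gets its own event.
attempt_events = [
    {"reference": "req-42:attempt-1", "model_id": "gpt-4o", "tokens": 150},
    {"reference": "req-42:attempt-2", "model_id": "gpt-4o-mini", "tokens": 140},
]
results = [send_with_retry(ingest, e) for e in attempt_events]
```

The first event is delivered twice because of the simulated timeout, yet it is stored once. The fallback attempt uses a different reference, so it is recorded as a separate event with its own model_id.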
Listen for invoice.finalized events to trigger billing notifications.