Monetizing an AI agent
This guide walks through setting up token-based billing for an AI agent or LLM-powered product using Solvimon — from meter design to the first invoice.
Why AI billing is different
AI products have usage patterns that don’t fit traditional SaaS billing:
- Cost varies by model (GPT-4o vs GPT-4o mini vs Claude 3.5 Sonnet)
- Each request has two billable units (prompt tokens consumed, completion tokens produced)
- You need per-customer entitlements to enforce plan limits (token budgets, model access tiers)
- Usage happens in real time — you need to check a customer’s entitlements before executing a request
Solvimon handles all of this natively. Here’s how to set it up.
Step 1: Design your meters
A well-designed meter schema is the foundation of accurate AI billing. Send granular per-request events rather than pre-aggregated totals — this lets you change pricing rules without re-engineering your event ingestion pipeline.
For a multi-model AI product, create two meters: one for prompt tokens and one for completion tokens. Both meters need a model_id property so you can price different models at different rates.
Create the prompt tokens meter
Use POST /v1/meters:
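The exact request schema lives in the Solvimon API reference; as a sketch, the body for creating the prompt tokens meter might look like the Python below. Every field name (name, reference) is an illustrative assumption, not confirmed schema.

```python
import json

# Illustrative body for POST /v1/meters -- field names are assumptions;
# check the Solvimon API reference for the real schema.
prompt_meter = {
    "name": "Prompt tokens",
    "reference": "prompt_tokens",  # stable reference used later when ingesting events
}

# The request itself would be something like:
#   requests.post(f"{BASE_URL}/v1/meters",
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 json=prompt_meter)
print(json.dumps(prompt_meter, indent=2))
```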
Create the completion tokens meter
Create meter values
Each meter needs a NUMBER type meter value to track token counts. Use POST /v1/meter-values:
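A sketch of the request body, assuming hypothetical field names (type, reference, meter_id); only the NUMBER type is confirmed by this guide:

```python
# Illustrative body for POST /v1/meter-values -- field names are assumptions.
token_count_value = {
    "reference": "token_count",            # can be reused by both meters
    "type": "NUMBER",                      # NUMBER type tracks token counts
    "meter_id": "<prompt-tokens-meter-id>",  # placeholder for the meter created above
}
```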
You can reuse the same meter value reference (token_count) for both meters, or create separate ones. Use the same reference for both if the aggregation logic is identical.
Create meter properties for model segmentation
A model_id property on each meter lets you price GPT-4o differently from GPT-4o mini. Set status: "ACTIVE" — properties must be active to be used in pricing rules. Use POST /v1/meter-properties:
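A sketch of the property body; the reference and status values come from this guide, while the surrounding field names are assumptions:

```python
# Illustrative body for POST /v1/meter-properties.
model_id_property = {
    "reference": "model_id",
    "status": "ACTIVE",        # properties must be ACTIVE to be used in pricing rules
    "meter_id": "<meter-id>",  # placeholder; create one property per meter
}
```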
Add the model_id property to both meters.
Create meter value calculations
The calculation defines how to aggregate token counts across a billing period. Use SUM. See POST /v1/meter-value-calculations:
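A sketch of the calculation body for completion tokens, assuming hypothetical field names; the SUM aggregation is the part confirmed by this guide:

```python
# Illustrative body for POST /v1/meter-value-calculations.
completion_calc = {
    "meter_id": "<completion-tokens-meter-id>",  # placeholder
    "meter_value": "token_count",                # the NUMBER value created earlier
    "calculation": "SUM",                        # sum token counts across the billing period
}
```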
Create a corresponding calculation for prompt tokens.
📘 Via Desk: Usage metering → Meters → New meter. You can configure values, properties, and calculations inline from the Desk editor.
Step 2: Set up your product and pricing plan
Create a product item for completion tokens and link it to the meter value calculation. This is what appears as a line item on invoices.
For per-model pricing, you’ll set pricing rules on the product item so that the rate changes based on the model_id property of the events.
📘 Via Desk: Products & plans → Product catalog → New product. Desk’s pricing plan editor lets you set per-model pricing rules visually. This is recommended for initial setup. See the Configuration API reference for full API details.
Recommended pricing plan structure for a two-tier AI product:
Pricing rules on the product item use the model_id property to select the right rate. Set a default rate for any model not explicitly listed.
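To make the structure concrete, here is a sketch of per-model rates with a default fallback. The rates and the dict shape are invented for illustration and are not Solvimon's pricing-rule schema; configure real rules in Desk or via the Configuration API.

```python
# Illustrative per-model rate table -- example numbers only, not real pricing.
pricing_rules = {
    "completion_tokens": {
        "gpt-4o": 0.00001,        # per-token rate when model_id == "gpt-4o"
        "gpt-4o-mini": 0.0000006, # cheaper rate for the smaller model
        "default": 0.00001,       # fallback for any model not explicitly listed
    },
}

def rate_for(meter: str, model_id: str) -> float:
    """Select the rate for a model, falling back to the default."""
    rules = pricing_rules[meter]
    return rules.get(model_id, rules["default"])
```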
Step 3: Configure entitlements
Entitlements define what a customer is allowed to do on their plan — model access, token budgets, and feature flags. They’re not billed directly; they’re enforced by your application at request time.
Create these features in Solvimon using POST /v1/features:
Monthly token budget
Available models
Priority queue
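The three features above can each be created with a POST /v1/features call. A sketch, assuming hypothetical field names (reference, name):

```python
# Illustrative bodies for POST /v1/features -- one request per feature.
features = [
    {"reference": "monthly_token_budget", "name": "Monthly token budget"},
    {"reference": "available_models", "name": "Available models"},
    {"reference": "priority_queue", "name": "Priority queue"},
]

# e.g. for feature in features:
#          requests.post(f"{BASE_URL}/v1/features", json=feature)
```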
Attach these features to your pricing plan versions with the appropriate values per plan tier. The Starter plan gets monthly_token_budget: 1000000 and available_models: ["gpt-4o-mini"]. The Pro plan gets monthly_token_budget: 10000000 and all models.
📘 Via Desk: Products & plans → Features. You can create features and assign entitlement values per pricing plan from Desk.
Step 4: Create a customer and subscription
Follow the same pattern as the Get to your first invoice tutorial. The only difference is that your subscription references your AI pricing plan.
Step 5: Check entitlements before each request
Before executing an LLM request on behalf of a customer, check their entitlements via GET /v1/customers/{ref}/entitlements to determine which models they can access and whether they have budget remaining.
Response (trimmed):
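As a stand-in for the trimmed response, here is an illustrative shape for a Starter-plan customer, assuming the feature references from Step 3 come back as keys. The real response schema is in the API reference; the priority_queue value is an invented example.

```python
# Illustrative entitlements for a Starter-plan customer (shape is assumed).
entitlements = {
    "monthly_token_budget": 1_000_000,      # Starter plan budget
    "available_models": ["gpt-4o-mini"],    # Starter plan model access
    "priority_queue": False,                # example value, not from the guide
}

# Your application reads these values at request time:
allowed = "gpt-4o-mini" in entitlements["available_models"]
```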
To check current usage against the budget, query GET /v1/ingest/meter-data for this customer:
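A minimal sketch of the enforcement check. The function name within_budget and the numbers are hypothetical; the used-token total would come from the GET /v1/ingest/meter-data query and the budget from the customer's entitlements.

```python
def within_budget(used_tokens: int, requested_tokens: int, monthly_token_budget: int) -> bool:
    """Return True if the request fits in the remaining budget.

    Solvimon supplies the usage and entitlement values;
    this enforcement logic lives in your application.
    """
    return used_tokens + requested_tokens <= monthly_token_budget

print(within_budget(990_000, 8_000, 1_000_000))   # True: request allowed
print(within_budget(990_000, 20_000, 1_000_000))  # False: would exceed the budget
```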
Your application compares usage against the monthly_token_budget entitlement and blocks requests that would exceed it. Solvimon provides the values — your application enforces the limit.
Step 6: Report usage after each request
Once the LLM responds, send a usage event via POST /v1/ingest/meter-data with the token counts. For streaming responses, wait until the stream completes before sending the event — send one event per request with the total token counts.
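A sketch of the completion tokens event. The reference, meter_properties[].value, and meter_values[].number fields follow the key-field descriptions below; the outer wrapper keys (meter, and the value shown) are illustrative assumptions.

```python
# Illustrative body for POST /v1/ingest/meter-data (completion tokens).
completion_event = {
    "meter": "completion_tokens",                      # assumed wrapper field
    "reference": "req-20240601-0001-completion",       # unique per request; reuse on retry
    "meter_properties": [
        {"reference": "model_id", "value": "gpt-4o"},  # model used for this request
    ],
    "meter_values": [
        # count as reported by the provider's usage.completion_tokens
        {"reference": "token_count", "number": 512},
    ],
}
```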
Send a separate event for prompt tokens:
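The prompt tokens event mirrors the completion one, pulling the count from the provider's usage object. Field names outside reference, meter_properties, and meter_values are assumptions:

```python
# Token counts as returned by the model provider's API response.
usage = {"prompt_tokens": 1024, "completion_tokens": 512}

# Illustrative body for POST /v1/ingest/meter-data (prompt tokens).
prompt_event = {
    "meter": "prompt_tokens",                          # assumed wrapper field
    "reference": "req-20240601-0001-prompt",           # distinct from the completion event
    "meter_properties": [
        {"reference": "model_id", "value": "gpt-4o"},
    ],
    "meter_values": [
        {"reference": "token_count", "number": usage["prompt_tokens"]},
    ],
}
```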
Key fields:
- reference — use a unique ID per request (your internal request ID works well). Duplicate references are deduplicated automatically.
- meter_properties[].value — the model used. This is what the pricing rule evaluates to determine the per-token rate.
- meter_values[].number — the actual token count as reported by the model provider’s API.
Edge cases
Token counting — use the token count returned by the model provider’s API response (usage.prompt_tokens, usage.completion_tokens), not your own tokenizer estimate. Counts vary by model.
Streaming responses — send one event after the stream completes with the total token counts. Do not send incremental events mid-stream.
Request deduplication — if your event ingestion fails and you retry, use the same reference value. Solvimon deduplicates on reference, so the retry won’t double-count.
Model fallbacks — if your application retries a request with a cheaper model after a failure, send separate events for each attempt with the correct model_id for each.
What to set up next
- Pricing models for AI products — compare per-token, prepaid credits, per-seat, and outcome-based pricing
- Configuring entitlements for AI products — detailed guide to rate limits, model access tiers, and free tier gating
- Webhooks — receive invoice.finalized events to trigger billing notifications