Monetizing an AI agent

This guide walks through setting up token-based billing for an AI agent or LLM-powered product using Solvimon — from meter design to the first invoice.


Why AI billing is different

AI products have usage patterns that don’t fit traditional SaaS billing:

  • Cost varies by model (GPT-4o vs GPT-4o mini vs Claude 3.5 Sonnet)
  • Each request has two billable units (prompt tokens consumed, completion tokens produced)
  • You need per-customer entitlements to enforce plan limits (token budgets, model access tiers)
  • Usage happens in real time — you need to check a customer’s entitlements before executing a request

Solvimon handles all of this natively. Here’s how to set it up.


Step 1: Design your meters

A well-designed meter schema is the foundation of accurate AI billing. Send granular per-request events rather than pre-aggregated totals — this lets you change pricing rules without re-engineering your event ingestion pipeline.

For a multi-model AI product, create two meters: one for prompt tokens and one for completion tokens. Both meters need a model_id property so you can price different models at different rates.

Create the prompt tokens meter

Use POST /v1/meters:

$curl -X POST https://test.api.solvimon.com/v1/meters \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "prompt_tokens",
> "name": "Prompt Tokens"
> }'

Create the completion tokens meter

$curl -X POST https://test.api.solvimon.com/v1/meters \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "completion_tokens",
> "name": "Completion Tokens"
> }'

Create meter values

Each meter needs a NUMBER type meter value to track token counts. Use POST /v1/meter-values:

$curl -X POST https://test.api.solvimon.com/v1/meter-values \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "token_count",
> "name": "Token Count",
> "type": "NUMBER",
> "status": "ACTIVE"
> }'

You can reuse the same meter value reference (token_count) for both meters or create separate ones; since the aggregation logic is identical here, reusing one reference keeps the schema simpler.

Create meter properties for model segmentation

A model_id property on each meter lets you price GPT-4o differently from GPT-4o mini. Set status: "ACTIVE" — properties must be active to be used in pricing rules. Use POST /v1/meter-properties:

$curl -X POST https://test.api.solvimon.com/v1/meter-properties \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "model_id",
> "name": "Model ID",
> "type": "ENUM",
> "status": "ACTIVE",
> "enum_values": [
> "gpt-4o",
> "gpt-4o-mini",
> "claude-3-5-sonnet",
> "claude-3-5-haiku"
> ]
> }'

Add the model_id property to both meters.

Create meter value calculations

The calculation defines how to aggregate token counts across a billing period. Use SUM. See POST /v1/meter-value-calculations:

$curl -X POST https://test.api.solvimon.com/v1/meter-value-calculations \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "completion_tokens_sum",
> "name": "Completion Tokens Sum",
> "meter_id": "<completion_tokens_meter_id>",
> "meter_value_id": "<token_count_meter_value_id>",
> "calculation_type": "SUM"
> }'

Create a corresponding calculation for prompt tokens.

📘 Via Desk: Usage metering → Meters → New meter. You can configure values, properties, and calculations inline from the Desk editor.


Step 2: Set up your product and pricing plan

Create a product item for completion tokens and link it to the meter value calculation. This is what appears as a line item on invoices.

For per-model pricing, you’ll set pricing rules on the product item so that the rate changes based on the model_id property of the events.

📘 Via Desk: Products & plans → Product catalog → New product. Desk’s pricing plan editor lets you set per-model pricing rules visually. This is recommended for initial setup. See the Configuration API reference for full API details.

Recommended pricing plan structure for a two-tier AI product:

Plan    | Completion tokens                            | Available models | Monthly token budget
Starter | $0.002/1k tokens (gpt-4o-mini only)          | gpt-4o-mini      | 1,000,000
Pro     | $0.015/1k (gpt-4o), $0.0006/1k (gpt-4o-mini) | all models       | 10,000,000

Pricing rules on the product item use the model_id property to select the right rate. Set a default rate for any model not explicitly listed.
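To make the rate selection concrete, here is a sketch of what those rules compute for the Pro plan's completion tokens. The rate table mirrors the plan above; the default rate constant and the function are illustrative application-side code, not Solvimon API calls (Solvimon evaluates the rules server-side at invoicing time).

```python
# Per-model completion-token rates for the Pro plan (USD per 1,000 tokens),
# mirroring the pricing table above.
PRO_COMPLETION_RATES_PER_1K = {
    "gpt-4o": 0.015,
    "gpt-4o-mini": 0.0006,
}
# Hypothetical default rate for models not explicitly listed in the rules.
DEFAULT_RATE_PER_1K = 0.002

def completion_cost(model_id: str, completion_tokens: int) -> float:
    """Estimate the completion-token charge for one request under the Pro plan."""
    rate = PRO_COMPLETION_RATES_PER_1K.get(model_id, DEFAULT_RATE_PER_1K)
    return completion_tokens / 1000 * rate
```

This is the same selection logic the model_id property drives in the pricing rules: look up the model's rate, fall back to the default when the model is not listed.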


Step 3: Configure entitlements

Entitlements define what a customer is allowed to do on their plan — model access, token budgets, and feature flags. They’re not billed directly; they’re enforced by your application at request time.

Create these features in Solvimon using POST /v1/features:

Monthly token budget

$curl -X POST https://test.api.solvimon.com/v1/features \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "monthly_token_budget",
> "name": "Monthly Token Budget",
> "type": "NUMBER"
> }'

Available models

$curl -X POST https://test.api.solvimon.com/v1/features \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "available_models",
> "name": "Available Models",
> "type": "ENUM",
> "enum_values": ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"]
> }'

Priority queue

$curl -X POST https://test.api.solvimon.com/v1/features \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "reference": "priority_queue",
> "name": "Priority Queue",
> "type": "SWITCH"
> }'

Attach these features to your pricing plan versions with the appropriate values per plan tier. The Starter plan gets monthly_token_budget: 1000000 and available_models: ["gpt-4o-mini"]. The Pro plan gets monthly_token_budget: 10000000 and all models.
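For reference, the per-tier entitlement values can be summarized as a plain mapping. The feature references match those created above; the priority_queue assignments per tier are an assumption, since the guide does not state them.

```python
# Entitlement values per pricing plan tier, as assigned on each pricing
# plan version. priority_queue values are assumed (True for Pro).
PLAN_ENTITLEMENTS = {
    "starter": {
        "monthly_token_budget": 1_000_000,
        "available_models": ["gpt-4o-mini"],
        "priority_queue": False,
    },
    "pro": {
        "monthly_token_budget": 10_000_000,
        "available_models": [
            "gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku",
        ],
        "priority_queue": True,
    },
}
```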

📘 Via Desk: Products & plans → Features. You can create features and assign entitlement values per pricing plan from Desk.


Step 4: Create a customer and subscription

Follow the same pattern as the Get to your first invoice tutorial. The only difference is that your subscription references your AI pricing plan.

$curl -X POST https://test.api.solvimon.com/v1/pricing-plan-subscriptions/init \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "pricing_plan_subscription": {
> "reference": "acme-ai-pro-2024",
> "customer_reference": "acme-corp",
> "billing_entity_reference": "<your_billing_entity_reference>",
> "billing_currency": "USD",
> "billing_time": "EXACT"
> },
> "pricing_plan_schedules": [
> {
> "pricing_plan_version_selector": {
> "pricing_plan_reference": "ai_pro_plan"
> },
> "start_at": "2024-01-01T00:00:00Z"
> }
> ]
> }'

Step 5: Check entitlements before each request

Before executing an LLM request on behalf of a customer, check their entitlements via GET /v1/customers/{ref}/entitlements to determine which models they can access and whether they have budget remaining.

$curl "https://test.api.solvimon.com/v1/customers/acme-corp/entitlements" \
> -H "X-API-KEY: <apiKey>"

Response (trimmed):

{
  "entitlements": [
    {
      "feature_reference": "available_models",
      "enums": ["gpt-4o", "gpt-4o-mini"]
    },
    {
      "feature_reference": "monthly_token_budget",
      "number": "10000000"
    },
    {
      "feature_reference": "priority_queue",
      "switch": true
    }
  ]
}

To check current usage against the budget, query GET /v1/ingest/meter-data for this customer:

$curl "https://test.api.solvimon.com/v1/ingest/meter-data?customer_reference=acme-corp&meter_reference=completion_tokens" \
> -H "X-API-KEY: <apiKey>"

Your application compares usage against the monthly_token_budget entitlement and blocks requests that would exceed it. Solvimon provides the values — your application enforces the limit.
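A minimal sketch of that enforcement logic, assuming you have already fetched the entitlements response and the customer's month-to-date token usage into plain Python values (the function is hypothetical application code, not part of any Solvimon SDK):

```python
def can_execute(entitlements: list[dict], tokens_used: int,
                model_id: str, estimated_tokens: int) -> bool:
    """Gate one LLM request: model must be entitled and the estimated
    tokens must fit within the remaining monthly budget."""
    allowed_models: list[str] = []
    budget = 0
    # The entitlements list has the shape of the response above.
    for ent in entitlements:
        if ent["feature_reference"] == "available_models":
            allowed_models = ent["enums"]
        elif ent["feature_reference"] == "monthly_token_budget":
            budget = int(ent["number"])  # numbers arrive as strings
    if model_id not in allowed_models:
        return False
    return tokens_used + estimated_tokens <= budget
```

How you estimate tokens before the request (tokenizer estimate, fixed headroom) is up to your application; Solvimon only supplies the entitlement values.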


Step 6: Report usage after each request

Once the LLM responds, send a usage event via POST /v1/ingest/meter-data with the token counts. For streaming responses, wait until the stream completes before sending the event — send one event per request with the total token counts.

$curl -X POST https://test.api.solvimon.com/v1/ingest/meter-data \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "meter_reference": "completion_tokens",
> "customer_reference": "acme-corp",
> "reference": "req_01J8K3M7P2Q9R4S6T0",
> "timestamp": "2024-01-15T14:30:00Z",
> "meter_properties": [
> {
> "reference": "model_id",
> "value": "gpt-4o"
> }
> ],
> "meter_values": [
> {
> "reference": "token_count",
> "number": "342"
> }
> ]
> }'

Send a separate event for prompt tokens:

$curl -X POST https://test.api.solvimon.com/v1/ingest/meter-data \
> -H "X-API-KEY: <apiKey>" \
> -H "Content-Type: application/json" \
> -d '{
> "meter_reference": "prompt_tokens",
> "customer_reference": "acme-corp",
> "reference": "req_01J8K3M7P2Q9R4S6T0_prompt",
> "timestamp": "2024-01-15T14:30:00Z",
> "meter_properties": [
> {
> "reference": "model_id",
> "value": "gpt-4o"
> }
> ],
> "meter_values": [
> {
> "reference": "token_count",
> "number": "156"
> }
> ]
> }'

Key fields:

  • reference — use a unique ID per request (your internal request ID works well). Duplicate references are deduplicated automatically.
  • meter_properties[].value — the model used. This is what the pricing rule evaluates to determine the per-token rate.
  • meter_values[].number — the actual token count as reported by the model provider’s API.
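Putting the two calls together, here is a sketch that builds both ingest payloads from one request. The function and its parameters are hypothetical; posting the payloads with your HTTP client is omitted. The reference scheme (bare request ID for completion tokens, a _prompt suffix for prompt tokens) follows the curl examples above.

```python
def build_usage_events(request_id: str, customer_ref: str, model_id: str,
                       prompt_tokens: int, completion_tokens: int,
                       timestamp: str) -> list[dict]:
    """Build the two POST /v1/ingest/meter-data payloads for one LLM request."""
    def event(meter_ref: str, suffix: str, count: int) -> dict:
        return {
            "meter_reference": meter_ref,
            "customer_reference": customer_ref,
            # Unique per request; a retry must reuse the same reference
            # so Solvimon deduplicates it.
            "reference": request_id + suffix,
            "timestamp": timestamp,
            "meter_properties": [{"reference": "model_id", "value": model_id}],
            # Token counts are sent as strings in the number field.
            "meter_values": [{"reference": "token_count", "number": str(count)}],
        }
    return [
        event("prompt_tokens", "_prompt", prompt_tokens),
        event("completion_tokens", "", completion_tokens),
    ]
```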

Edge cases

Token counting — use the token count returned by the model provider’s API response (usage.prompt_tokens, usage.completion_tokens), not your own tokenizer estimate. Counts vary by model.

Streaming responses — send one event after the stream completes with the total token counts. Do not send incremental events mid-stream.

Request deduplication — if your event ingestion fails and you retry, use the same reference value. Solvimon deduplicates on reference, so the retry won’t double-count.

Model fallbacks — if your application retries a request with a cheaper model after a failure, send separate events for each attempt with the correct model_id for each.
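The fallback case can be sketched as follows, assuming a per-attempt reference suffix (the suffix scheme and helper are hypothetical, and the payloads are trimmed to the fields that differ per attempt):

```python
def fallback_events(request_id: str, attempts: list[tuple[str, int]]) -> list[dict]:
    """One completion-tokens event per attempt; attempts is a list of
    (model_id, completion_tokens) pairs in the order they ran."""
    return [
        {
            "meter_reference": "completion_tokens",
            # Distinct reference per attempt, so deduplication keeps both.
            "reference": f"{request_id}_attempt{i}",
            "meter_properties": [{"reference": "model_id", "value": model_id}],
            "meter_values": [{"reference": "token_count", "number": str(tokens)}],
        }
        for i, (model_id, tokens) in enumerate(attempts, start=1)
    ]
```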


What to set up next