Monetizing an AI agent
This guide walks through setting up token-based billing for an AI agent or LLM-powered product using Solvimon — from meter design to the first invoice.
Why AI billing is different
AI products have usage patterns that don’t fit traditional SaaS billing:
- Cost varies by model (GPT-4o vs GPT-4o mini vs Claude 3.5 Sonnet)
- Each request has two billable units (prompt tokens consumed, completion tokens produced)
- You need per-customer entitlements to enforce plan limits (token budgets, model access tiers)
- Usage happens in real time — you need to check a customer’s entitlements before executing a request
Solvimon handles all of this natively. Here’s how to set it up.
Step 1: Design your meters
A well-designed meter schema is the foundation of accurate AI billing. Send granular per-request events rather than pre-aggregated totals — this lets you change pricing rules without re-engineering your event ingestion pipeline.
For a multi-model AI product, create two meters: one for prompt tokens and one for completion tokens. Both meters need a model_id property so you can price different models at different rates.
Create the prompt tokens meter
Use POST /v1/meters:
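The exact request schema lives in the Solvimon API reference; as a sketch, the body for creating the prompt tokens meter might look like the Python below. Every field name (name, reference) is an illustrative assumption, not confirmed schema.

```python
import json

# Illustrative body for POST /v1/meters -- field names are assumptions;
# check the Solvimon API reference for the real schema.
prompt_meter = {
    "name": "Prompt tokens",
    "reference": "prompt_tokens",  # stable reference used later when ingesting events
}

# The request itself would be something like:
#   requests.post(f"{BASE_URL}/v1/meters",
#                 headers={"Authorization": f"Bearer {API_KEY}"},
#                 json=prompt_meter)
print(json.dumps(prompt_meter, indent=2))
```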
Create the completion tokens meter
Create meter values
Each meter needs a NUMBER type meter value to track token counts. Use POST /v1/meter-values:
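A sketch of the request body, assuming hypothetical field names (type, reference, meter_id); only the NUMBER type is confirmed by this guide:

```python
# Illustrative body for POST /v1/meter-values -- field names are assumptions.
token_count_value = {
    "reference": "token_count",            # can be reused by both meters
    "type": "NUMBER",                      # NUMBER type tracks token counts
    "meter_id": "<prompt-tokens-meter-id>",  # placeholder for the meter created above
}
```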
You can reuse the same meter value reference (token_count) for both meters, or create separate ones. Use the same reference for both if the aggregation logic is identical.
Create meter properties for model segmentation
A model_id property on each meter lets you price GPT-4o differently from GPT-4o mini. Set status: "ACTIVE" — properties must be active to be used in pricing rules. Use POST /v1/meter-properties:
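A sketch of the property body; the reference and status values come from this guide, while the surrounding field names are assumptions:

```python
# Illustrative body for POST /v1/meter-properties.
model_id_property = {
    "reference": "model_id",
    "status": "ACTIVE",        # properties must be ACTIVE to be used in pricing rules
    "meter_id": "<meter-id>",  # placeholder; create one property per meter
}
```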
Add the model_id property to both meters.
Create meter value calculations
The calculation defines how to aggregate token counts across a billing period. Use SUM. See POST /v1/meter-value-calculations:
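A sketch of the calculation body for completion tokens, assuming hypothetical field names; the SUM aggregation is the part confirmed by this guide:

```python
# Illustrative body for POST /v1/meter-value-calculations.
completion_calc = {
    "meter_id": "<completion-tokens-meter-id>",  # placeholder
    "meter_value": "token_count",                # the NUMBER value created earlier
    "calculation": "SUM",                        # sum token counts across the billing period
}
```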
Create a corresponding calculation for prompt tokens.
📘 Via Desk: Usage metering → Meters → New meter. You can configure values, properties, and calculations inline from the Desk editor.
Step 2: Set up your product and pricing plan
Create a product item for completion tokens and link it to the meter value calculation. This is what appears as a line item on invoices.
For per-model pricing, you’ll set pricing rules on the product item so that the rate changes based on the model_id property of the events.
📘 Via Desk: Products & plans → Product catalog → New product. Desk’s pricing plan editor lets you set per-model pricing rules visually. This is recommended for initial setup. See the Configuration API reference for full API details.
Recommended pricing plan structure for a two-tier AI product:
Pricing rules on the product item use the model_id property to select the right rate. Set a default rate for any model not explicitly listed.
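To make the structure concrete, here is a sketch of per-model rates with a default fallback. The rates and the dict shape are invented for illustration and are not Solvimon's pricing-rule schema; configure real rules in Desk or via the Configuration API.

```python
# Illustrative per-model rate table -- example numbers only, not real pricing.
pricing_rules = {
    "completion_tokens": {
        "gpt-4o": 0.00001,        # per-token rate when model_id == "gpt-4o"
        "gpt-4o-mini": 0.0000006, # cheaper rate for the smaller model
        "default": 0.00001,       # fallback for any model not explicitly listed
    },
}

def rate_for(meter: str, model_id: str) -> float:
    """Select the rate for a model, falling back to the default."""
    rules = pricing_rules[meter]
    return rules.get(model_id, rules["default"])
```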
Step 3: Configure entitlements
Entitlements define what a customer is allowed to do on their plan — model access, token budgets, and feature flags. They’re not billed directly; they’re enforced by your application at request time.
Create these features in Solvimon using POST /v1/features:
Monthly token budget
Available models
Priority queue
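The three features above can each be created with a POST /v1/features call. A sketch, assuming hypothetical field names (reference, name):

```python
# Illustrative bodies for POST /v1/features -- one request per feature.
features = [
    {"reference": "monthly_token_budget", "name": "Monthly token budget"},
    {"reference": "available_models", "name": "Available models"},
    {"reference": "priority_queue", "name": "Priority queue"},
]

# e.g. for feature in features:
#          requests.post(f"{BASE_URL}/v1/features", json=feature)
```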
Attach these features to your pricing plan versions with the appropriate values per plan tier. The Starter plan gets monthly_token_budget: 1000000 and available_models: ["gpt-4o-mini"]. The Pro plan gets monthly_token_budget: 10000000 and all models.
📘 Via Desk: Products & plans → Features. You can create features and assign entitlement values per pricing plan from Desk.
Step 4: Create a customer and subscription
Follow the same pattern as the Get to your first invoice tutorial. The only difference is that your subscription references your AI pricing plan.
Step 5: Check entitlements before each request
Before executing an LLM request on behalf of a customer, check their entitlements via GET /v1/customers/{ref}/entitlements to determine which models they can access and whether they have budget remaining.
Response (trimmed):
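As a stand-in for the trimmed response, here is an illustrative shape for a Starter-plan customer, assuming the feature references from Step 3 come back as keys. The real response schema is in the API reference; the priority_queue value is an invented example.

```python
# Illustrative entitlements for a Starter-plan customer (shape is assumed).
entitlements = {
    "monthly_token_budget": 1_000_000,      # Starter plan budget
    "available_models": ["gpt-4o-mini"],    # Starter plan model access
    "priority_queue": False,                # example value, not from the guide
}

# Your application reads these values at request time:
allowed = "gpt-4o-mini" in entitlements["available_models"]
```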
To check current usage against the budget, query GET /v1/ingest/meter-data for this customer:
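A minimal sketch of the enforcement check. The function name within_budget and the numbers are hypothetical; the used-token total would come from the GET /v1/ingest/meter-data query and the budget from the customer's entitlements.

```python
def within_budget(used_tokens: int, requested_tokens: int, monthly_token_budget: int) -> bool:
    """Return True if the request fits in the remaining budget.

    Solvimon supplies the usage and entitlement values;
    this enforcement logic lives in your application.
    """
    return used_tokens + requested_tokens <= monthly_token_budget

print(within_budget(990_000, 8_000, 1_000_000))   # True: request allowed
print(within_budget(990_000, 20_000, 1_000_000))  # False: would exceed the budget
```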
Your application compares usage against the monthly_token_budget entitlement and blocks requests that would exceed it. Solvimon provides the values — your application enforces the limit.
Step 6: Report usage after each request
Once the LLM responds, send a usage event via POST /v1/ingest/meter-data with the token counts. For streaming responses, wait until the stream completes before sending the event — send one event per request with the total token counts.
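A sketch of the completion tokens event. The reference, meter_properties[].value, and meter_values[].number fields follow the key-field descriptions below; the outer wrapper keys (meter, and the value shown) are illustrative assumptions.

```python
# Illustrative body for POST /v1/ingest/meter-data (completion tokens).
completion_event = {
    "meter": "completion_tokens",                      # assumed wrapper field
    "reference": "req-20240601-0001-completion",       # unique per request; reuse on retry
    "meter_properties": [
        {"reference": "model_id", "value": "gpt-4o"},  # model used for this request
    ],
    "meter_values": [
        # count as reported by the provider's usage.completion_tokens
        {"reference": "token_count", "number": 512},
    ],
}
```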
Send a separate event for prompt tokens:
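The prompt tokens event mirrors the completion one, pulling the count from the provider's usage object. Field names outside reference, meter_properties, and meter_values are assumptions:

```python
# Token counts as returned by the model provider's API response.
usage = {"prompt_tokens": 1024, "completion_tokens": 512}

# Illustrative body for POST /v1/ingest/meter-data (prompt tokens).
prompt_event = {
    "meter": "prompt_tokens",                          # assumed wrapper field
    "reference": "req-20240601-0001-prompt",           # distinct from the completion event
    "meter_properties": [
        {"reference": "model_id", "value": "gpt-4o"},
    ],
    "meter_values": [
        {"reference": "token_count", "number": usage["prompt_tokens"]},
    ],
}
```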
Key fields:
- reference — use a unique ID per request (your internal request ID works well). Duplicate references are deduplicated automatically.
- meter_properties[].value — the model used. This is what the pricing rule evaluates to determine the per-token rate.
- meter_values[].number — the actual token count as reported by the model provider’s API.
Edge cases
Token counting — use the token count returned by the model provider’s API response (usage.prompt_tokens, usage.completion_tokens), not your own tokenizer estimate. Counts vary by model.
Streaming responses — send one event after the stream completes with the total token counts. Do not send incremental events mid-stream.
Request deduplication — if your event ingestion fails and you retry, use the same reference value. Solvimon deduplicates on reference, so the retry won’t double-count.
Model fallbacks — if your application retries a request with a cheaper model after a failure, send separate events for each attempt with the correct model_id for each.
What to set up next
- Pricing models for AI products — compare per-token, prepaid credits, per-seat, and outcome-based pricing
- Configuring entitlements for AI products — detailed guide to rate limits, model access tiers, and free tier gating
- Webhooks — receive invoice.finalized events to trigger billing notifications