Monetizing an AI agent


This guide walks through setting up token-based billing for an AI agent or LLM-powered product using Solvimon — from meter design to the first invoice.


Why AI billing is different

AI products have usage patterns that don’t fit traditional SaaS billing:

  • Cost varies by model (GPT-4o vs GPT-4o mini vs Claude 3.5 Sonnet)
  • Each request has two billable units (prompt tokens consumed, completion tokens produced)
  • You need per-customer entitlements to enforce plan limits (token budgets, model access tiers)
  • Usage happens in real time — you need to check a customer’s entitlements before executing a request

Solvimon covers each of these with meters, meter properties, and entitlements. The steps below set them up in order.


Step 1: Design your meters

A well-designed meter schema is the foundation of accurate AI billing. Send granular per-request events rather than pre-aggregated totals — this lets you change pricing rules without re-engineering your event ingestion pipeline.

For a multi-model AI product, create two meters: one for prompt tokens and one for completion tokens. Both meters need a model_id property so you can price different models at different rates.

Create the prompt tokens meter

Use POST /v1/meters:

curl -X POST https://test.api.solvimon.com/v1/meters \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "prompt_tokens",
    "name": "Prompt Tokens"
  }'

Create the completion tokens meter

curl -X POST https://test.api.solvimon.com/v1/meters \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "completion_tokens",
    "name": "Completion Tokens"
  }'

Create meter values

Each meter needs a NUMBER type meter value to track token counts. Use POST /v1/meter-values:

curl -X POST https://test.api.solvimon.com/v1/meter-values \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "token_count",
    "name": "Token Count",
    "type": "NUMBER",
    "status": "ACTIVE"
  }'

You can reuse the same meter value reference (token_count) for both meters, or create separate ones. Reusing one reference is the simpler choice when both meters aggregate the same way, as they do here with SUM.

Create meter properties for model segmentation

A model_id property on each meter lets you price GPT-4o differently from GPT-4o mini. Set status: "ACTIVE" — properties must be active to be used in pricing rules. Use POST /v1/meter-properties:

curl -X POST https://test.api.solvimon.com/v1/meter-properties \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "model_id",
    "name": "Model ID",
    "type": "ENUM",
    "status": "ACTIVE",
    "enum_values": [
      "gpt-4o",
      "gpt-4o-mini",
      "claude-3-5-sonnet",
      "claude-3-5-haiku"
    ]
  }'

Add the model_id property to both meters.

Create meter value calculations

The calculation defines how to aggregate token counts across a billing period. Use SUM. See POST /v1/meter-value-calculations:

curl -X POST https://test.api.solvimon.com/v1/meter-value-calculations \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "completion_tokens_sum",
    "name": "Completion Tokens Sum",
    "meter_id": "<completion_tokens_meter_id>",
    "meter_value_id": "<token_count_meter_value_id>",
    "calculation_type": "SUM"
  }'

Create a corresponding calculation for prompt tokens.
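To make the SUM calculation concrete, here is a minimal local sketch of what summing token_count over a billing period amounts to. It is an illustration only, not Solvimon's implementation: the event dictionaries mirror the ingest payload used in Step 6, while the helper names (parse_ts, sum_token_counts, make_event) and the half-open period bounds are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    # Parses UTC timestamps of the form "2024-01-15T14:30:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def sum_token_counts(events, period_start, period_end):
    """Sum token_count per model_id for events in [period_start, period_end)."""
    totals = defaultdict(int)
    for event in events:
        if not (period_start <= parse_ts(event["timestamp"]) < period_end):
            continue  # event falls outside the billing period
        model = next(p["value"] for p in event["meter_properties"]
                     if p["reference"] == "model_id")
        count = next(v["number"] for v in event["meter_values"]
                     if v["reference"] == "token_count")
        totals[model] += int(count)
    return dict(totals)


def make_event(ts, model, count):
    # Hypothetical helper that builds an event shaped like the ingest payload.
    return {
        "timestamp": ts,
        "meter_properties": [{"reference": "model_id", "value": model}],
        "meter_values": [{"reference": "token_count", "number": str(count)}],
    }


events = [
    make_event("2024-01-15T14:30:00Z", "gpt-4o", 342),
    make_event("2024-01-20T09:00:00Z", "gpt-4o", 100),
    make_event("2024-01-21T10:00:00Z", "gpt-4o-mini", 50),
    make_event("2024-02-01T00:00:00Z", "gpt-4o", 999),  # next period, excluded
]
period_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
period_end = datetime(2024, 2, 1, tzinfo=timezone.utc)
totals = sum_token_counts(events, period_start, period_end)
print(totals)  # {'gpt-4o': 442, 'gpt-4o-mini': 50}
```

Grouping by model_id here corresponds to the segmentation that the meter property enables for pricing.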


Step 2: Set up your product and pricing plan

Create a product item for completion tokens and link it to the meter value calculation. This is what appears as a line item on invoices.

For per-model pricing, you’ll set pricing rules on the product item so that the rate changes based on the model_id property of the events.

Recommended pricing plan structure for a two-tier AI product:

| Plan | Completion tokens | Available models | Monthly token budget |
|---|---|---|---|
| Starter | $0.002/1k tokens (gpt-4o-mini only) | gpt-4o-mini | 1,000,000 |
| Pro | $0.015/1k (gpt-4o), $0.0006/1k (gpt-4o-mini) | all models | 10,000,000 |

Pricing rules on the product item use the model_id property to select the right rate. Set a default rate for any model not explicitly listed.
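As a worked example of rate selection by model_id, the sketch below estimates a Pro-plan completion-token charge locally. The two listed rates come from the table above; the default rate of $0.02 per 1k tokens, the function name completion_charge, and the usage figures are hypothetical. It assumes simple linear per-token pricing and does not reproduce Solvimon's rating or rounding rules.

```python
from decimal import Decimal

# Per-1k-token rates for the Pro plan, taken from the table above.
PRO_RATES_PER_1K = {
    "gpt-4o": Decimal("0.015"),
    "gpt-4o-mini": Decimal("0.0006"),
}
# Hypothetical default rate for any model not explicitly listed.
DEFAULT_RATE_PER_1K = Decimal("0.02")


def completion_charge(tokens_by_model, rates=PRO_RATES_PER_1K,
                      default_rate=DEFAULT_RATE_PER_1K):
    """Return the estimated charge in USD for summed completion tokens per model."""
    total = Decimal("0")
    for model, tokens in tokens_by_model.items():
        rate = rates.get(model, default_rate)  # fall back to the default rate
        total += Decimal(tokens) / Decimal(1000) * rate
    return total


usage = {
    "gpt-4o": 2_000_000,          # 2,000 x 0.015  = 30.00
    "gpt-4o-mini": 5_000_000,     # 5,000 x 0.0006 =  3.00
    "claude-3-5-sonnet": 100_000  #   100 x 0.02   =  2.00 (default rate)
}
charge = completion_charge(usage)
print(charge.quantize(Decimal("0.01")))  # 35.00
```

Decimal is used instead of float so that small per-token rates do not accumulate binary rounding error.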


Step 3: Configure entitlements

Entitlements define what a customer is allowed to do on their plan — model access, token budgets, and feature flags. They’re not billed directly; they’re enforced by your application at request time.

Create these features in Solvimon using POST /v1/features:

Monthly token budget

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "monthly_token_budget",
    "name": "Monthly Token Budget",
    "type": "NUMBER"
  }'

Available models

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "available_models",
    "name": "Available Models",
    "type": "ENUM",
    "enum_values": ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"]
  }'

Priority queue

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "priority_queue",
    "name": "Priority Queue",
    "type": "SWITCH"
  }'

Attach these features to your pricing plan versions with the appropriate values per plan tier. The Starter plan gets monthly_token_budget: 1000000 and available_models: ["gpt-4o-mini"]. The Pro plan gets monthly_token_budget: 10000000 and all models.


Step 4: Create a customer and subscription

Follow the same pattern as the Get to your first invoice tutorial. The only difference is that your subscription references your AI pricing plan.

curl -X POST https://test.api.solvimon.com/v1/pricing-plan-subscriptions/init \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "pricing_plan_subscription": {
      "reference": "acme-ai-pro-2024",
      "customer_reference": "acme-corp",
      "billing_entity_reference": "<your_billing_entity_reference>",
      "billing_currency": "USD",
      "billing_time": "EXACT"
    },
    "pricing_plan_schedules": [
      {
        "pricing_plan_version_selector": {
          "pricing_plan_reference": "ai_pro_plan"
        },
        "start_at": "2024-01-01T00:00:00Z"
      }
    ]
  }'

Step 5: Check entitlements before each request

Before executing an LLM request on behalf of a customer, check their entitlements via GET /v1/customers/{ref}/entitlements to determine which models they can access and whether they have budget remaining.

curl "https://test.api.solvimon.com/v1/customers/acme-corp/entitlements" \
  -H "X-API-KEY: <apiKey>"

Response (trimmed):

{
  "entitlements": [
    {
      "feature_reference": "available_models",
      "enums": ["gpt-4o", "gpt-4o-mini"]
    },
    {
      "feature_reference": "monthly_token_budget",
      "number": "10000000"
    },
    {
      "feature_reference": "priority_queue",
      "switch": true
    }
  ]
}

To check current usage against the budget, query GET /v1/ingest/meter-data for this customer:

curl "https://test.api.solvimon.com/v1/ingest/meter-data?customer_reference=acme-corp&meter_reference=completion_tokens" \
  -H "X-API-KEY: <apiKey>"

Your application compares usage against the monthly_token_budget entitlement and blocks requests that would exceed it. Solvimon provides the values — your application enforces the limit.
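The enforcement step on the application side could look roughly like the sketch below. It parses the trimmed entitlements response shown above and decides whether a request may run. The function names (index_entitlements, authorize_request) and the reason strings are hypothetical, and because the shape of the meter-data query response is not shown in this guide, the tokens already used are passed in as a plain integer that your application is assumed to have computed.

```python
def index_entitlements(response: dict) -> dict:
    """Map feature_reference -> typed value from the entitlements response."""
    indexed = {}
    for ent in response.get("entitlements", []):
        ref = ent["feature_reference"]
        if "enums" in ent:
            indexed[ref] = list(ent["enums"])
        elif "number" in ent:
            indexed[ref] = int(ent["number"])  # numbers arrive as strings
        elif "switch" in ent:
            indexed[ref] = bool(ent["switch"])
    return indexed


def authorize_request(entitlements, model_id, tokens_used, tokens_requested):
    """Return (allowed, reason) for a pending LLM request.

    tokens_requested is the caller's upper-bound estimate for this request,
    for example the max_tokens value it plans to send to the model provider.
    """
    if model_id not in entitlements.get("available_models", []):
        return False, "model_not_in_plan"
    budget = entitlements.get("monthly_token_budget", 0)
    if tokens_used + tokens_requested > budget:
        return False, "token_budget_exceeded"
    return True, "ok"


# Sample response copied from the trimmed example above.
sample_response = {
    "entitlements": [
        {"feature_reference": "available_models", "enums": ["gpt-4o", "gpt-4o-mini"]},
        {"feature_reference": "monthly_token_budget", "number": "10000000"},
        {"feature_reference": "priority_queue", "switch": True},
    ]
}
entitlements = index_entitlements(sample_response)
print(authorize_request(entitlements, "gpt-4o", 9_999_000, 500))
print(authorize_request(entitlements, "gpt-4o", 9_999_000, 2_000))
print(authorize_request(entitlements, "claude-3-5-sonnet", 0, 500))
```

Missing features default to the most restrictive outcome (no models, zero budget), so a customer with no entitlements is blocked instead of silently allowed.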


Step 6: Report usage after each request

Once the LLM responds, send a usage event via POST /v1/ingest/meter-data with the token counts. For streaming responses, wait until the stream completes before sending the event — send one event per request with the total token counts.

curl -X POST https://test.api.solvimon.com/v1/ingest/meter-data \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "meter_reference": "completion_tokens",
    "customer_reference": "acme-corp",
    "reference": "req_01J8K3M7P2Q9R4S6T0",
    "timestamp": "2024-01-15T14:30:00Z",
    "meter_properties": [
      {
        "reference": "model_id",
        "value": "gpt-4o"
      }
    ],
    "meter_values": [
      {
        "reference": "token_count",
        "number": "342"
      }
    ]
  }'

Send a separate event for prompt tokens:

curl -X POST https://test.api.solvimon.com/v1/ingest/meter-data \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "meter_reference": "prompt_tokens",
    "customer_reference": "acme-corp",
    "reference": "req_01J8K3M7P2Q9R4S6T0_prompt",
    "timestamp": "2024-01-15T14:30:00Z",
    "meter_properties": [
      {
        "reference": "model_id",
        "value": "gpt-4o"
      }
    ],
    "meter_values": [
      {
        "reference": "token_count",
        "number": "156"
      }
    ]
  }'

Key fields:

  • reference — use a unique ID per request (your internal request ID works well). Duplicate references are deduplicated automatically.
  • meter_properties[].value — the model used. This is what the pricing rule evaluates to determine the per-token rate.
  • meter_values[].number — the actual token count as reported by the model provider’s API.
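In application code, the two payloads above can be produced from a single provider response. The following is a minimal sketch: build_usage_events is a hypothetical helper, and the reference scheme (the request ID for completion tokens, the request ID plus a "_prompt" suffix for prompt tokens) simply copies the examples above. It only builds the JSON bodies; sending them to POST /v1/ingest/meter-data is left to your HTTP client.

```python
import json


def build_usage_events(request_id, customer_reference, model_id,
                       prompt_tokens, completion_tokens, timestamp):
    """Build one completion-token event and one prompt-token event for a request."""

    def event(meter_reference, reference, count):
        return {
            "meter_reference": meter_reference,
            "customer_reference": customer_reference,
            "reference": reference,
            "timestamp": timestamp,
            "meter_properties": [{"reference": "model_id", "value": model_id}],
            # Numbers are sent as strings, matching the examples above.
            "meter_values": [{"reference": "token_count", "number": str(count)}],
        }

    return [
        event("completion_tokens", request_id, completion_tokens),
        event("prompt_tokens", f"{request_id}_prompt", prompt_tokens),
    ]


usage_events = build_usage_events(
    request_id="req_01J8K3M7P2Q9R4S6T0",
    customer_reference="acme-corp",
    model_id="gpt-4o",
    prompt_tokens=156,       # e.g. usage.prompt_tokens from the provider
    completion_tokens=342,   # e.g. usage.completion_tokens from the provider
    timestamp="2024-01-15T14:30:00Z",
)
for body in usage_events:
    print(json.dumps(body, indent=2))
```

Deriving both references from the same request ID keeps them stable across retries, which is what the deduplication behavior relies on.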

Edge cases

Token counting — use the token count returned by the model provider’s API response (usage.prompt_tokens, usage.completion_tokens), not your own tokenizer estimate. Counts vary by model.

Streaming responses — send one event after the stream completes with the total token counts. Do not send incremental events mid-stream.

Request deduplication — if your event ingestion fails and you retry, use the same reference value. Solvimon deduplicates on reference, so the retry won’t double-count.
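A retry loop that respects this rule might look like the sketch below. The send callable is a stand-in for whatever function posts the event to the ingest endpoint; send_with_retry, its parameters, and the choice to retry only on ConnectionError are assumptions for illustration. The key point is that the same event dictionary, and therefore the same reference, is resent on every attempt.

```python
import time


def send_with_retry(event, send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(event) until it succeeds, reusing the same event each time."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send(event)  # same payload, same reference on every attempt
        except ConnectionError as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise last_error


# Demonstration with a simulated sender that fails twice, then succeeds.
seen_references = []


def flaky_send(event):
    seen_references.append(event["reference"])
    if len(seen_references) < 3:
        raise ConnectionError("simulated network failure")
    return "accepted"


result = send_with_retry(
    {"reference": "req_01J8K3M7P2Q9R4S6T0"},
    flaky_send,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result, seen_references)
```

Generating a fresh reference inside the loop would defeat deduplication, so the reference is fixed before the first attempt.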

Model fallbacks — if your application retries a request with a cheaper model after a failure, send separate events for each attempt with the correct model_id for each.
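One way to report a fallback sequence is sketched below. Because events are deduplicated on reference, this sketch assumes each attempt needs its own reference; the "_attempt<n>" suffix, the build_attempt_events helper, and the sample token counts are hypothetical and not a convention defined by Solvimon. Only completion-token events are built here, and prompt-token events would follow the same pattern.

```python
def build_attempt_events(request_id, customer_reference, attempts, timestamp):
    """Build one completion-token event per attempt.

    attempts is a list of (model_id, completion_tokens) tuples in the order tried.
    """
    events = []
    for n, (model_id, completion_tokens) in enumerate(attempts, start=1):
        events.append({
            "meter_reference": "completion_tokens",
            "customer_reference": customer_reference,
            # Distinct reference per attempt so no attempt is deduplicated away.
            "reference": f"{request_id}_attempt{n}",
            "timestamp": timestamp,
            "meter_properties": [{"reference": "model_id", "value": model_id}],
            "meter_values": [
                {"reference": "token_count", "number": str(completion_tokens)}
            ],
        })
    return events


attempt_events = build_attempt_events(
    request_id="req_01J8K3M7P2Q9R4S6T0",
    customer_reference="acme-corp",
    attempts=[("gpt-4o", 120), ("gpt-4o-mini", 342)],  # first try, then fallback
    timestamp="2024-01-15T14:30:00Z",
)
for e in attempt_events:
    print(e["reference"], e["meter_properties"][0]["value"],
          e["meter_values"][0]["number"])
```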


What to set up next

  • Pricing models for AI products — compare per-token, prepaid credits, per-seat, and outcome-based pricing
  • Configuring entitlements for AI products — detailed guide to rate limits, model access tiers, and free tier gating
  • Webhooks — receive invoice.finalized events to trigger billing notifications