Configuring entitlements for AI products

Entitlements define what a customer is allowed to do on their plan — which models they can access, how many tokens they can consume per month, and whether premium features are enabled. This guide covers how to configure and enforce entitlements for AI products.


How entitlements work

An entitlement is a feature value assigned to a customer via their pricing plan or subscription. Features are defined once and then assigned per plan tier with different values.

Solvimon stores the entitlement values and makes them queryable. Your application is responsible for reading those values and enforcing them — Solvimon doesn’t block requests at runtime.

The pattern is:

  1. Define features in Solvimon (once)
  2. Assign feature values to pricing plan versions (per tier)
  3. Before executing a customer request, query their entitlements and meter usage
  4. Allow or block the request based on what you find

Rate limits

Rate limits (requests per minute, requests per day) are a common entitlement for AI APIs.

Define the feature

Use POST /v1/features:

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "requests_per_minute",
    "name": "Requests Per Minute",
    "type": "NUMBER"
  }'
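If you create features from a script rather than by hand, you can assemble the same call with Python's standard library. This is a minimal sketch. build_create_feature_request is a hypothetical helper name, and it only builds the request without sending it, so you can inspect the request before passing it to urllib.request.urlopen. The enum_values argument is needed only for ENUM features such as available_models.

```python
import json
import urllib.request

# Sandbox base URL used throughout this guide.
BASE_URL = "https://test.api.solvimon.com"


def build_create_feature_request(api_key, reference, name, feature_type, enum_values=None):
    """Build (but don't send) a POST /v1/features request. Hypothetical helper."""
    body = {"reference": reference, "name": name, "type": feature_type}
    if enum_values is not None:
        # Only ENUM features carry a list of allowed values.
        body["enum_values"] = list(enum_values)
    return urllib.request.Request(
        f"{BASE_URL}/v1/features",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_create_feature_request(
    "<apiKey>", "available_models", "Available Models", "ENUM",
    enum_values=["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"],
)
# Send with urllib.request.urlopen(req) once <apiKey> is replaced with a real key.
```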

Assign different values per plan

When configuring a pricing plan version in Desk or via the API, set the requests_per_minute entitlement to:

  • Starter: 60
  • Pro: 600
  • Enterprise: 6000

Enforce at runtime

Query the customer’s entitlements via GET /v1/customers/{ref}/entitlements before accepting a request:

curl "https://test.api.solvimon.com/v1/customers/acme-corp/entitlements" \
  -H "X-API-KEY: <apiKey>"

{
  "entitlements": [
    {
      "feature_reference": "requests_per_minute",
      "number": "600"
    }
  ]
}
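NUMBER values come back as strings ("600" rather than 600), so it helps to normalize the response once before your middleware reads it. The sketch below indexes entitlements by feature_reference. parse_entitlements is a hypothetical name. The sketch assumes whole-number NUMBER values and skips feature types whose response shape this guide doesn't show, such as AMOUNT.

```python
import json


def parse_entitlements(payload):
    """Index an entitlements response by feature_reference (hypothetical helper).

    NUMBER values arrive as strings (e.g. "600") and become ints; ENUM values
    become a set for fast membership checks; SWITCH values stay booleans.
    Other types (e.g. AMOUNT) are skipped because their shape isn't shown here.
    """
    parsed = {}
    for item in payload.get("entitlements", []):
        ref = item["feature_reference"]
        if "number" in item:
            parsed[ref] = int(item["number"])  # assumes whole-number limits
        elif "enums" in item:
            parsed[ref] = set(item["enums"])
        elif "switch" in item:
            parsed[ref] = bool(item["switch"])
    return parsed


sample = json.loads('{"entitlements": [{"feature_reference": "requests_per_minute", "number": "600"}]}')
print(parse_entitlements(sample))  # {'requests_per_minute': 600}
```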

Cache this value at session start (e.g., when a user authenticates). Enforce it in your rate-limiting middleware using a token bucket or sliding window counter in Redis or similar.

📘 Solvimon provides the entitlement value. Rate limit enforcement — counting requests, maintaining state, returning 429 — happens in your application.
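One way to do the counting is an in-memory sliding-window log. It is exact, but it stores one timestamp per request; the sliding window counter mentioned above trades some precision for less memory. This is a sketch under stated assumptions. State lives in a process-local dict, so a multi-instance deployment would move it to Redis or similar. The class and method names are hypothetical. The per-customer limit is passed in on each call so it can come straight from the cached requests_per_minute entitlement.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Sliding-window log: remembers each accepted request time within the window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.hits = {}  # customer_ref -> deque of request timestamps (oldest first)

    def check(self, customer_ref, limit):
        """Return 200 if the request is allowed, 429 if it exceeds `limit` per window."""
        now = self.clock()
        q = self.hits.setdefault(customer_ref, deque())
        # Evict timestamps that have slid out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= limit:
            return 429  # rejected requests are not recorded
        q.append(now)
        return 200


limiter = SlidingWindowLimiter()
status = limiter.check("acme-corp", limit=600)  # limit from requests_per_minute
```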


Monthly token budgets

A monthly token budget limits how many tokens a customer can consume per billing period before being blocked or charged an overage rate.

Define the feature

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "monthly_token_budget",
    "name": "Monthly Token Budget",
    "type": "NUMBER"
  }'

Assign per plan: Starter → 1000000, Pro → 10000000.

Check usage and budget before each request

Query both the entitlement (the limit) and the current meter usage (what they’ve consumed so far):

# Get the entitlement value: GET /v1/customers/{ref}/entitlements
curl "https://test.api.solvimon.com/v1/customers/acme-corp/entitlements" \
  -H "X-API-KEY: <apiKey>"

# Get current-period usage: GET /v1/ingest/meter-data
curl "https://test.api.solvimon.com/v1/ingest/meter-data?customer_reference=acme-corp&meter_reference=completion_tokens" \
  -H "X-API-KEY: <apiKey>"

Compare the two values. If usage ≥ budget, block the request and prompt the customer to upgrade.
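The comparison itself is small. The sketch below assumes you have already pulled the current-period token total out of the meter-data response as an integer, because the response shape isn't shown in this guide. It also assumes the budget comes from monthly_token_budget. check_token_budget and BudgetDecision are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class BudgetDecision:
    allowed: bool
    remaining: int
    message: str = ""


def check_token_budget(budget, used):
    """Apply the rule from this guide: block once usage >= budget (hypothetical helper)."""
    if used >= budget:
        return BudgetDecision(
            allowed=False,
            remaining=0,
            message="Monthly token budget reached. Upgrade your plan to continue.",
        )
    return BudgetDecision(allowed=True, remaining=budget - used)


decision = check_token_budget(budget=1_000_000, used=999_000)
# decision.allowed is True; decision.remaining is 1000
```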

📘 You don’t need to query on every single request. Cache the values and refresh them periodically (e.g., every 60 seconds), or refresh after every N requests. The right trade-off depends on your acceptable overage tolerance.
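A small cache can implement both refresh strategies by re-fetching when either the TTL expires or a read count is reached. This sketch assumes fetch is any zero-argument function you supply that queries Solvimon, for example the entitlements or meter-data call. CachedValue is a hypothetical name. The clock is injectable so you can test the behavior without waiting.

```python
import time


class CachedValue:
    """Cache a fetched value; refresh after `ttl` seconds or every `max_uses` reads."""

    def __init__(self, fetch, ttl=60.0, max_uses=None, clock=time.monotonic):
        self.fetch = fetch        # zero-arg callable that queries Solvimon (assumed)
        self.ttl = ttl
        self.max_uses = max_uses  # None disables the read-count trigger
        self.clock = clock
        self._value = None
        self._fetched_at = None
        self._uses = 0

    def get(self):
        now = self.clock()
        stale = (
            self._fetched_at is None
            or now - self._fetched_at >= self.ttl
            or (self.max_uses is not None and self._uses >= self.max_uses)
        )
        if stale:
            self._value = self.fetch()
            self._fetched_at = now
            self._uses = 0
        self._uses += 1
        return self._value


# The lambda is a stand-in for a real API call.
budget_cache = CachedValue(fetch=lambda: 10_000_000, ttl=60.0, max_uses=100)
```

A shorter ttl or smaller max_uses narrows the window in which a customer can exceed their budget, at the cost of more API calls.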


Model access tiers

Control which models are available on each plan using an ENUM type feature.

Define the feature

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "available_models",
    "name": "Available Models",
    "type": "ENUM",
    "enum_values": [
      "gpt-4o",
      "gpt-4o-mini",
      "claude-3-5-sonnet",
      "claude-3-5-haiku"
    ]
  }'

Assign per plan

  • Starter: ["gpt-4o-mini", "claude-3-5-haiku"]
  • Pro: ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"]

Enforce at runtime

Query GET /v1/customers/{ref}/entitlements:

curl "https://test.api.solvimon.com/v1/customers/acme-corp/entitlements" \
  -H "X-API-KEY: <apiKey>"

{
  "entitlements": [
    {
      "feature_reference": "available_models",
      "enums": ["gpt-4o-mini", "claude-3-5-haiku"]
    }
  ]
}

If the customer requests a model not in enums, return a 403 with a message indicating which plan tier includes that model.
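A sketch of that check follows. It assumes the customer's allowed models have already been read from the enums field. It hard-codes the plan-to-model mapping from the "Assign per plan" list only to build the upgrade hint. That copy can drift from what is configured in Solvimon, so treat PLAN_MODELS and check_model_access as hypothetical.

```python
# Hypothetical copy of the per-plan lists above, used only for the upgrade hint.
# The customer's actual access always comes from their entitlements.
PLAN_MODELS = {
    "Starter": {"gpt-4o-mini", "claude-3-5-haiku"},
    "Pro": {"gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"},
}


def check_model_access(requested_model, allowed_models):
    """Return (status, message): 200 if allowed, else 403 naming plans that include it."""
    if requested_model in allowed_models:
        return 200, "ok"
    plans = [plan for plan, models in PLAN_MODELS.items() if requested_model in models]
    if plans:
        return 403, f"Model '{requested_model}' is not on your plan. It is available on: {', '.join(plans)}."
    return 403, f"Model '{requested_model}' is not available."
```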


Feature flags (SWITCH type)

Use SWITCH features for binary capabilities: priority queue access, early access to new models, dedicated support.

curl -X POST https://test.api.solvimon.com/v1/features \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "reference": "priority_queue",
    "name": "Priority Queue",
    "type": "SWITCH"
  }'

Assign: Starter → false, Pro → true.

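In your application, read the switch value from the customer's entitlements response. For a Pro customer, the priority_queue entry looks like this:

{
  "feature_reference": "priority_queue",
  "switch": true
}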

Route requests to a high-priority queue if switch: true, otherwise to the standard queue.
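The routing decision can be read directly from the entitlements response. In this sketch the function names and queue names are placeholders. A missing priority_queue entry is treated as false, so customers without the feature fall back to the standard queue.

```python
def is_switch_on(payload, feature_reference):
    """Return True only if the entitlements response has this SWITCH set to true."""
    for item in payload.get("entitlements", []):
        if item.get("feature_reference") == feature_reference:
            return item.get("switch") is True
    return False  # feature not present: treat as off


def choose_queue(payload):
    # "high-priority" and "standard" are placeholder queue names.
    return "high-priority" if is_switch_on(payload, "priority_queue") else "standard"


pro = {"entitlements": [{"feature_reference": "priority_queue", "switch": True}]}
starter = {"entitlements": [{"feature_reference": "priority_queue", "switch": False}]}
```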


Overriding entitlements per customer

Enterprise customers often negotiate custom limits. You can override the plan default for a specific customer directly on their subscription without changing the pricing plan.

Use PATCH /v1/pricing-plan-subscriptions/{id} to set custom entitlement values on the subscription:

curl -X PATCH https://test.api.solvimon.com/v1/pricing-plan-subscriptions/<subscription_id> \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "entitlements": [
      {
        "feature_reference": "monthly_token_budget",
        "number": "100000000",
        "override": true
      }
    ]
  }'

Setting override: true means the subscription value takes precedence over the pricing plan default. The customer stays on their existing plan — only their entitlement values change.

📘 Via Desk: Customers → select customer → Subscriptions → select subscription → Entitlements. Desk shows the plan defaults and lets you set overrides per customer.


Summary

Feature type   Use case                                           Enforced by
NUMBER         Token budget, rate limit, max file size            Your application
ENUM           Model access tier, available regions               Your application
SWITCH         Priority queue, beta features, dedicated support   Your application
AMOUNT         Dollar credit allowance                            Your application

All entitlement enforcement happens in your code. Solvimon provides a single endpoint to query what a customer is entitled to — your application decides what to do with that information.