Configuring entitlements for AI products
Entitlements define what a customer is allowed to do on their plan — which models they can access, how many tokens they can consume per month, and whether premium features are enabled. This guide covers how to configure and enforce entitlements for AI products.
How entitlements work
An entitlement is a feature value assigned to a customer via their pricing plan or subscription. Features are defined once and then assigned per plan tier with different values.
Solvimon stores the entitlement values and makes them queryable. Your application is responsible for reading those values and enforcing them — Solvimon doesn’t block requests at runtime.
The pattern is:
- Define features in Solvimon (once)
- Assign feature values to pricing plan versions (per tier)
- Before executing a customer request, query their entitlements and meter usage
- Allow or block the request based on what you find
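The last two steps can be sketched as a small gate function. This is a minimal illustration, not Solvimon client code: `get_limit` and `get_usage` are hypothetical callables standing in for however your application fetches the entitlement value and the metered usage.

```python
from typing import Callable


def handle_request(
    customer_ref: str,
    get_limit: Callable[[str], int],   # hypothetical: returns the entitlement value
    get_usage: Callable[[str], int],   # hypothetical: returns metered usage so far
) -> bool:
    """Return True if the request may proceed, False if it should be blocked."""
    limit = get_limit(customer_ref)    # step 3: query the entitlement
    usage = get_usage(customer_ref)    # step 3: query metered usage
    return usage < limit               # step 4: allow or block
```

Passing the lookups in as callables keeps the decision logic independent of how (and how often) the values are fetched.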
Rate limits
Rate limits (requests per minute, requests per day) are a common entitlement for AI APIs.
Define the feature
Use POST /v1/features:
Assign different values per plan
When configuring a pricing plan version in Desk or via the API, set the requests_per_minute entitlement to:
- Starter: 60
- Pro: 600
- Enterprise: 6000
Enforce at runtime
Query the customer’s entitlements via GET /v1/customers/{ref}/entitlements before accepting a request:
Cache this value at session start (e.g., when a user authenticates). Enforce it in your rate-limiting middleware using a token bucket or sliding window counter in Redis or similar.
📘 Solvimon provides the entitlement value. Rate limit enforcement (counting requests, maintaining state, returning 429) happens in your application.
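As a rough sketch of the application-side enforcement, the token bucket below is sized from a `requests_per_minute` entitlement value. It keeps state in process memory for simplicity, which is an assumption of this example; a multi-instance deployment would hold the same state in Redis or a similar shared store. The `TokenBucket` class and its `allow` method are illustrative names, not part of any Solvimon SDK.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """In-memory token bucket sized from a requests_per_minute value."""

    requests_per_minute: int
    tokens: float = field(init=False)
    last_refill: float = field(init=False)

    def __post_init__(self) -> None:
        # Start with a full bucket so a new session can burst up to the limit.
        self.tokens = float(self.requests_per_minute)
        self.last_refill = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means respond with 429."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        rate_per_second = self.requests_per_minute / 60.0
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(
            float(self.requests_per_minute),
            self.tokens + elapsed * rate_per_second,
        )
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In middleware, you would create one bucket per customer from the cached entitlement value and return 429 whenever `allow()` is false.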
Monthly token budgets
A monthly token budget limits how many tokens a customer can consume per billing period before being blocked or charged an overage rate.
Define the feature
Assign per plan: Starter → 1000000, Pro → 10000000.
Check usage and budget before each request
Query both the entitlement (the limit) and the current meter usage (what they’ve consumed so far):
Compare the two values. If usage ≥ budget, either block the request and prompt the customer to upgrade, or let it through at the overage rate if the plan allows overages.
📘 You don’t need to query on every single request. Cache the values and refresh them periodically (e.g., every 60 seconds), or refresh after every N requests. The right trade-off depends on your acceptable overage tolerance.
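One way to implement the periodic refresh is a small time-based cache around the two lookups. The sketch below assumes a hypothetical `fetch` callable that returns a `(budget, usage)` pair from wherever your application reads them; the `BudgetGate` name and the 60-second default are illustrative.

```python
import time
from typing import Callable, Optional, Tuple


class BudgetGate:
    """Caches (budget, usage) and refreshes it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[int, int]],  # hypothetical: returns (budget, usage)
        ttl_seconds: float = 60.0,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._cached: Optional[Tuple[int, int]] = None
        self._fetched_at = 0.0

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True while cached usage is below the cached budget."""
        now = time.monotonic() if now is None else now
        # Refresh on first use or when the cached values are older than the TTL.
        if self._cached is None or now - self._fetched_at >= self._ttl:
            self._cached = self._fetch()
            self._fetched_at = now
        budget, usage = self._cached
        return usage < budget
```

Between refreshes the gate decides on stale numbers, so a customer can overshoot the budget by whatever they consume within one TTL window. A shorter TTL narrows that window at the cost of more lookups.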
Model access tiers
Control which models are available on each plan using an ENUM type feature.
Define the feature
Assign per plan
- Starter: ["gpt-4o-mini", "claude-3-5-haiku"]
- Pro: ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"]
Enforce at runtime
Query GET /v1/customers/{ref}/entitlements:
If the customer requests a model not in enums, return a 403 with a message indicating which plan tier includes that model.
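A possible shape for this check is sketched below. It assumes your application has already extracted the customer's allowed model list from the entitlements response, and it uses a hypothetical local `PLAN_MODELS` catalogue (mirroring the example values above) to work out which tier to mention in the error message.

```python
from typing import Dict, List, Tuple

# Hypothetical local catalogue, mirroring the per-plan values in this guide.
PLAN_MODELS: Dict[str, List[str]] = {
    "Starter": ["gpt-4o-mini", "claude-3-5-haiku"],
    "Pro": ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet", "claude-3-5-haiku"],
}


def check_model_access(requested: str, allowed: List[str]) -> Tuple[int, str]:
    """Return an (HTTP status, message) pair for the requested model."""
    if requested in allowed:
        return 200, "ok"
    # Find the plan tiers that do include the requested model.
    tiers = [plan for plan, models in PLAN_MODELS.items() if requested in models]
    if tiers:
        return 403, f"Model '{requested}' is available on: {', '.join(tiers)}."
    return 403, f"Model '{requested}' is not available on any plan."
```

For example, a Starter customer asking for `gpt-4o` gets a 403 whose message names the Pro tier.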
Feature flags (SWITCH type)
Use SWITCH features for binary capabilities: priority queue access, early access to new models, dedicated support.
Assign: Starter → false, Pro → true.
In your application:
Route requests to a high-priority queue if switch: true, otherwise to the standard queue.
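The routing decision might look like the following. The entitlement is modelled as a plain dictionary with a `switch` key, which is an assumption about how your application represents the queried value; the queue names are placeholders.

```python
from typing import Any, Dict


def select_queue(entitlement: Dict[str, Any]) -> str:
    """Pick a queue name from a SWITCH entitlement; default to standard."""
    # Only an explicit True grants priority, so a missing or malformed
    # value fails closed to the standard queue.
    if entitlement.get("switch") is True:
        return "high-priority"
    return "standard"
```

Failing closed means a lookup error never accidentally grants a paid capability.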
Overriding entitlements per customer
Enterprise customers often negotiate custom limits. You can override the plan default for a specific customer directly on their subscription without changing the pricing plan.
Use PATCH /v1/pricing-plan-subscriptions/{id} to set custom entitlement values on the subscription:
Setting override: true means the subscription value takes precedence over the pricing plan default. The customer stays on their existing plan — only their entitlement values change.
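The precedence rule can be illustrated with a few lines of code. This is only a model of the behaviour described above, not something your application necessarily has to implement: the `override` flag comes from this guide, while the `value` key and the function name are hypothetical.

```python
from typing import Any, Dict, Optional


def effective_value(plan_default: Any, subscription_entry: Optional[Dict[str, Any]]) -> Any:
    """Return the subscription value when it is marked as an override."""
    # 'value' is a hypothetical field name used for illustration only.
    if subscription_entry and subscription_entry.get("override") is True:
        return subscription_entry["value"]
    return plan_default
```

With a plan default of 6000 requests per minute and a subscription entry of `{"override": True, "value": 20000}`, the customer's effective limit would be 20000 while they remain on the same plan.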
📘 Via Desk: Customers → select customer → Subscriptions → select subscription → Entitlements. Desk shows the plan defaults and lets you set overrides per customer.
Summary
All entitlement enforcement happens in your code. Solvimon provides a single endpoint to query what a customer is entitled to — your application decides what to do with that information.