Tutorial
Building Daily AI Usage Limits Using the Thrysha API
Follow a real-world journey of a developer adding safe, predictable AI usage controls, without building rate limiters, counters, or concurrency primitives from scratch.
The Problem: “Give each user 100 AI calls per day”
Imagine you're building an AI-powered product. Every user can submit prompts, but generating text or embeddings is not free: each call costs money. You want a simple rule:
“Every user can make up to 100 AI requests per day. After that, we politely stop them until the next day.”
This sounds simple, but implementing it correctly is not. You have to:
- Prevent race conditions
- Make retries idempotent
- Reset limits at midnight consistently
- Track millions of events cheaply
- Expose “remaining calls” to the user
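The first item on that list is the classic lost update: two requests that both read the counter before either writes it back will record only one of the two calls. A minimal Python sketch of the interleaving (a plain variable standing in for the shared counter; no database involved):

```python
# Two "concurrent" requests naively read-modify-write a shared counter.
used = 5

read_a = used          # request A reads the current count
read_b = used          # request B reads before A has written back

used = read_a + 1      # A writes 6
used = read_b + 1      # B also writes 6: A's increment is lost

print(used)            # 6, but two calls happened; the count should be 7
```

Every approach below is, one way or another, an attempt to make that read-modify-write atomic.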
Trying to Build Usage Limits
Before arriving at a robust quota system, most developers follow the same arc: start with Postgres, migrate to Redis to reduce latency, hit reliability problems, and eventually look for a hardened usage-metering service. Here’s a real example of how that journey typically unfolds.
Attempt 1: “Let’s Just Use Postgres”
The first instinct is simple: store counters in Postgres.
UPDATE quotas SET used = used + 1 WHERE resource_id = $1 RETURNING used;
This works, until traffic increases. Under load, Postgres becomes a global write hotspot. Each increment requires a row lock, and very quickly:
- Checkpoint stalls cause long-tail latency spikes (200ms → 2s).
- Row-level locking forces requests into a queue.
- Retries double-load the database, making the problem worse.
- API consumers experience inconsistent “remaining quota” results.
Developers eventually realize they’ve built a rate limiter on top of a transactional database. That’s not what a relational system is optimized for. Quota enforcement and usage metering need high-throughput, contention-free atomic operations. Postgres is the wrong tool.
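For completeness, the guarded, atomic variant of that statement (check and consume in one UPDATE) can be sketched in Python with SQLite; the `quotas` schema here is hypothetical, and a production deployment on Postgres would still hit the locking and contention problems described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotas (resource_id TEXT PRIMARY KEY, used INTEGER, quota_limit INTEGER)"
)
conn.execute("INSERT INTO quotas VALUES ('res_xxx', 0, 2)")

def consume(resource_id, amount=1):
    # Check and consume in a single atomic statement: the UPDATE only
    # applies when the quota has room, so no separate SELECT is needed.
    cur = conn.execute(
        "UPDATE quotas SET used = used + ? "
        "WHERE resource_id = ? AND used + ? <= quota_limit",
        (amount, resource_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1  # True if the call was allowed

print(consume("res_xxx"))  # True
print(consume("res_xxx"))  # True
print(consume("res_xxx"))  # False: limit of 2 reached
```

The single-statement guard is correct, but every increment still serializes on the same row, which is exactly the hotspot problem.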
Attempt 2: “Okay, Let’s Switch to Redis”
Redis seems perfect: fast, atomic increments, low latency. Most teams move to something like:
INCRBY usage:{resource_id} {amount}
EXPIRE usage:{resource_id} 86400

This fixes latency, but it introduces new problems that aren't obvious early on:
- Redis restarts wipe in-memory counters unless carefully persisted.
- AOF/RDB persistence can lag 1–5 minutes, meaning usage data disappears after crashes (bad for billing or strict enforcement).
- No built-in idempotency leads to duplicate charges on retries.
- No grouping by quota rule: every product feature becomes a separate key, and reset logic becomes brittle.
- Failovers aren't atomic: a primary crash mid-write can double or drop increments.
You end up designing distributed counters, reset semantics, idempotent consumption, enforcement logic, and crash recovery on your own, effectively rebuilding a rate-limiting and usage-governance service from scratch.
Teams often discover this only when a Redis crash causes customers to lose minutes of usage data or exceed limits without detection.
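Concretely, "building it yourself" means hand-rolling at least idempotent consumption on top of the counter. A minimal in-memory Python sketch of the bookkeeping involved (no persistence and no failover, which is exactly the problem):

```python
counters = {}        # resource_id -> units consumed this window
seen_requests = {}   # request_id -> cached result, so retries don't double-charge

def consume(resource_id, amount, request_id, limit=100):
    # Replayed request: return the original outcome without charging again.
    if request_id in seen_requests:
        return seen_requests[request_id]

    used = counters.get(resource_id, 0)
    allowed = used + amount <= limit
    if allowed:
        counters[resource_id] = used + amount

    result = {"allowed": allowed, "remaining": limit - counters.get(resource_id, 0)}
    seen_requests[request_id] = result
    return result

print(consume("res_xxx", 1, "req-1"))  # charged once
print(consume("res_xxx", 1, "req-1"))  # retry: same result, no extra charge
```

Even this toy version raises the hard questions: where do `counters` and `seen_requests` live after a crash, and how long must request IDs be retained?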
Attempt 3: “We Need a Real Quota System”
After Postgres contention and Redis fragility, the next realization is: quota enforcement and usage metering need to be durable, idempotent, crash-safe, and built for high concurrency.
That’s where a dedicated service like the Thrysha API comes in. It provides:
- atomic multi-step evaluation (check + consume)
- idempotent retries with request IDs
- fixed-window and lifetime quota semantics
- strong per-resource isolation
- optimized Redis-backed counters with safe fallback guarantees
- consistent “allowed / remaining / limit” responses
- no operational burden or scaling work
Instead of stitching together Redis scripts, Postgres migrations, cron resets, and crash recovery logic, you get a single purpose-built API for usage limits, API governance, rate caps, and usage-based billing primitives.
Step 1: Create a Resource for AI Calls
A resource is anything you want to meter or cap. Here, our resource is the “AI inference request.”
POST /v1/resources
{
"name": "ai-daily-requests",
"description": "Tracks how many AI calls each user makes per day."
}

The response gives us a resource_id, which we’ll use for all quota activity.
Step 2: Attach a Daily Quota Rule
We want a per-user limit that resets every 24 hours. To do that, we create a fixed-window quota rule:
POST /v1/quota-rules
{
"resource_id": "res_xxx",
"quota_policy": "limited",
"quota_limit": 100,
"reset_strategy": { "unit": "day", "interval": 1 },
"enforcement_mode": "enforced"
}

With this rule, each user gets 100 AI calls per 24-hour window. Predictable. Simple. Nothing to maintain.
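To build intuition for what a fixed window with { "unit": "day", "interval": 1 } means, every consumption is attributed to the window containing its timestamp. A sketch of that window arithmetic in Python (my own illustration, not Thrysha internals; it assumes windows reset at UTC midnight):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def window_start(ts: datetime, interval_days: int = 1) -> datetime:
    # Floor the timestamp to UTC midnight, then anchor multi-day
    # intervals to the Unix epoch so window boundaries are stable.
    day = ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    offset = (day - EPOCH).days % interval_days
    return day - timedelta(days=offset)

morning = datetime(2024, 5, 1, 0, 5, tzinfo=timezone.utc)
evening = datetime(2024, 5, 1, 23, 55, tzinfo=timezone.utc)
next_day = datetime(2024, 5, 2, 0, 1, tzinfo=timezone.utc)

print(window_start(morning) == window_start(evening))   # True: same daily window
print(window_start(evening) == window_start(next_day))  # False: quota has reset
```

Note the boundary behavior: a call at 23:55 and one at 00:01 count against different windows, even though they are six minutes apart.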
Step 3: Check and Consume
Client apps can check remaining quota before acting:
POST /v1/quota/check
{
"resource_id": "res_xxx",
"amount": 0
}

When ready, consume:
POST /v1/quota/consume
{
"resource_id": "res_xxx",
"amount": 1,
"request_id": "unique-idempotency-key"
}

The API replies with whether the request is allowed and the remaining quota. Use stable request IDs to avoid double-charging.
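One reliable way to get stable request IDs is to derive them from the logical action itself, rather than generating a fresh UUID on every retry. A client-side sketch in Python (the payload fields match the consume request above; the key scheme and helper name are my own, not part of the Thrysha API):

```python
import hashlib

def consume_payload(resource_id: str, user_id: str, action_seq: int, amount: int = 1) -> dict:
    # Derive the idempotency key from the logical action, so a retry of
    # the same action (same user, same sequence number) reuses the key
    # and the API charges the quota at most once.
    raw = f"{resource_id}:{user_id}:{action_seq}"
    request_id = hashlib.sha256(raw.encode()).hexdigest()
    return {"resource_id": resource_id, "amount": amount, "request_id": request_id}

first = consume_payload("res_xxx", "user_42", action_seq=7)
retry = consume_payload("res_xxx", "user_42", action_seq=7)
print(first["request_id"] == retry["request_id"])  # True: safe to retry
```

A random UUID per attempt would defeat idempotency: the retry would look like a brand-new request and be charged again.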