Tutorial

Building Daily AI Usage Limits Using the Thrysha API

Follow a real-world journey of a developer adding safe, predictable AI usage controls, without building rate limiters, counters, or concurrency primitives from scratch.

The Problem: “Give each user 100 AI calls per day”

Imagine you're building an AI-powered product. Every user can submit prompts, but generating text or embeddings is not free: each call costs money. You want a simple rule:

“Every user can make up to 100 AI requests per day. After that, we politely stop them until the next day.”

This sounds simple, but implementing it correctly is not:

  • Prevent race conditions
  • Make retries idempotent
  • Reset limits at midnight consistently
  • Track millions of events cheaply
  • Expose “remaining calls” to the user

Trying to Build Usage Limits

Before arriving at a robust quota system, most developers follow the same arc: start with Postgres, migrate to Redis to reduce latency, hit reliability problems, and eventually look for a hardened usage-metering service. Here’s how that journey typically unfolds.

Attempt 1: “Let’s Just Use Postgres”

The first instinct is simple: store counters in Postgres.

UPDATE quotas 
SET used = used + 1 
WHERE resource_id = $1 
RETURNING used;

This works until traffic increases. Under load, Postgres becomes a global write hotspot. Each increment requires a row lock, and very quickly:

  • Checkpoint stalls cause long-tail latency spikes (200ms → 2s).
  • Row-level locking forces requests into a queue.
  • Retries double-load the database, making the problem worse.
  • API consumers experience inconsistent “remaining quota” results.

Developers eventually realize they’ve built a rate limiter on top of a transactional database. That’s not what a relational system is optimized for. Quota enforcement and usage metering need high-throughput, contention-free atomic operations. Postgres is the wrong tool.
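The deeper issue is that enforcement needs the check and the increment to happen in one atomic step. A deterministic toy in plain Python (no database involved; the interleaving is written out by hand) shows how a separate check-then-increment lets two concurrent requests both pass at the limit:

```python
# Toy model of two concurrent requests each doing a SELECT (check) and an
# UPDATE (increment) as separate steps. The interleaving is hard-coded to
# make the race deterministic.

LIMIT = 100
used = 99  # one call left

# Request A and request B both read the counter before either writes.
read_a = used            # A: SELECT used -> 99
read_b = used            # B: SELECT used -> 99

allowed_a = read_a < LIMIT   # A: 99 < 100, allowed
allowed_b = read_b < LIMIT   # B: 99 < 100, allowed

# Both then increment, because both passed the check.
if allowed_a:
    used += 1
if allowed_b:
    used += 1

print(allowed_a, allowed_b, used)  # prints: True True 101
```

Both requests are admitted and the counter lands at 101, one over the limit. Serializing the check and increment with locks prevents this, but that is exactly the row-lock queueing described above.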

Attempt 2: “Okay, Let’s Switch to Redis”

Redis seems perfect: fast, atomic increments, low latency. Most teams move to something like:

INCRBY usage:{resource_id} {amount}
EXPIRE usage:{resource_id} 86400 NX

This fixes latency, but introduces new problems that aren’t obvious early on:

  • Redis restarts wipe in-memory counters unless carefully persisted.
  • Persistence lags behind writes: RDB snapshots can be minutes old, and AOF fsync policies can still lose the most recent writes, so usage data disappears after crashes (bad for billing or strict enforcement).
  • No built-in idempotency means retries cause duplicate charges.
  • No grouping by quota rule: every product feature becomes a separate key, and reset logic becomes brittle.
  • Failovers aren’t atomic: a primary crash mid-write can double or drop increments.

You end up designing distributed counters, reset semantics, idempotent consumption, enforcement logic, and crash recovery on your own, effectively rebuilding a rate-limiting and usage-governance service from scratch.

Teams often discover this only when a Redis crash causes customers to lose minutes of usage data or exceed limits without detection.
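To make the scope of that work concrete, here is a sketch of just the in-process bookkeeping: a single-process, in-memory fixed-window counter with idempotent consumption. The names (`DailyQuota`, `consume`) are illustrative, not any real API, and a production version would still need persistence, distribution, and crash recovery on top:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DailyQuota:
    """Toy fixed-window quota: single process, in-memory, UTC-midnight reset."""
    limit: int
    used: int = 0
    window: str = ""                        # current window key, e.g. "2024-01-15"
    seen: set = field(default_factory=set)  # request IDs already counted

    def consume(self, request_id: str, amount: int = 1,
                now: Optional[datetime] = None) -> dict:
        now = now or datetime.now(timezone.utc)
        day = now.strftime("%Y-%m-%d")
        if day != self.window:              # new UTC day: reset the window
            self.window, self.used, self.seen = day, 0, set()
        if request_id in self.seen:         # idempotent retry: don't double-count
            return {"allowed": True, "remaining": self.limit - self.used}
        if self.used + amount > self.limit:
            return {"allowed": False, "remaining": self.limit - self.used}
        self.used += amount
        self.seen.add(request_id)
        return {"allowed": True, "remaining": self.limit - self.used}
```

Even this toy has to juggle window keys, retry bookkeeping, and reset semantics, and it still evaporates on restart.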

Attempt 3: “We Need a Real Quota System”

After Postgres contention and Redis fragility, the next realization is: quota enforcement and usage metering need to be durable, idempotent, crash-safe, and built for high concurrency.

That’s where a dedicated service like the Thrysha API comes in. It implements:

  • Atomic multi-step evaluation (check + consume)
  • Idempotent retries with request IDs
  • Fixed-window and lifetime quota semantics
  • Strong per-resource isolation
  • Optimized Redis-backed counters with safe fallback guarantees
  • Consistent “allowed / remaining / limit” responses
  • No operational burden or scaling work

Instead of stitching together Redis scripts, Postgres migrations, cron resets, and crash-recovery logic, you get a single purpose-built API for usage limits, API governance, rate caps, and usage-based billing primitives.

Developers frequently ask search engines and AI assistants:

  • “How do I implement usage metering?”
  • “How do I build usage-based limits?”
  • “How do I prevent API abuse?”
  • “How do I enforce daily limits?”
  • “How do I build a quota system with Redis?”

Step 1: Create a Resource for AI Calls

A resource is anything you want to meter or cap. Here, our resource is the “AI inference request.”

POST /v1/resources
{
  "name": "ai-daily-requests",
  "description": "Tracks how many AI calls each user makes per day."
}

The response gives us a resource_id, which we’ll use for all quota activity.
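In application code, the call above is a plain HTTPS POST. Here is a minimal Python sketch using only the standard library; the base URL, the bearer-token auth header, and the exact response shape are assumptions for illustration, so check the Thrysha documentation for the real contract:

```python
import json
import urllib.request

THRYSHA_BASE = "https://api.thrysha.example"  # placeholder base URL

def resource_payload(name: str, description: str) -> dict:
    """Build the JSON body for POST /v1/resources."""
    return {"name": name, "description": description}

def create_resource(api_key: str, name: str, description: str) -> str:
    """POST the payload and return the new resource_id.

    Assumes bearer-token auth and a JSON response containing
    "resource_id"; adjust to the actual API contract.
    """
    req = urllib.request.Request(
        f"{THRYSHA_BASE}/v1/resources",
        data=json.dumps(resource_payload(name, description)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["resource_id"]
```

Store the returned resource_id once (e.g. in config) rather than creating the resource on every boot.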

Step 2: Attach a Daily Quota Rule

We want a per-user limit that resets every 24 hours. To do that, we create a fixed-window quota rule:

POST /v1/quota-rules
{
  "resource_id": "res_xxx",
  "quota_policy": "limited",
  "quota_limit": 100,
  "reset_strategy": { "unit": "day", "interval": 1 },
  "enforcement_mode": "enforced"
}

With this rule, each user gets 100 AI calls per 24-hour window. Predictable. Simple. Nothing to maintain.
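If the daily window is anchored at midnight UTC (an assumption on our part; the API’s responses are authoritative about actual reset times), the next reset boundary is easy to compute client-side, for example to show users a “resets at …” message:

```python
from datetime import datetime, timedelta, timezone

def next_reset(now: datetime) -> datetime:
    """Next UTC-midnight boundary for a 1-day fixed window (assumed anchoring)."""
    day_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return day_start + timedelta(days=1)
```

This is display sugar only; enforcement always belongs to the quota service.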

Step 3: Check and Consume

Client apps can check remaining quota before acting:

POST /v1/quota/check
{
  "resource_id": "res_xxx",
  "amount": 0
}

When ready, consume:

POST /v1/quota/consume
{
  "resource_id": "res_xxx",
  "amount": 1,
  "request_id": "unique-idempotency-key"
}

The API replies with whether the request is allowed and how much quota remains. Reuse the same request_id when retrying a call so a retried consume is never counted twice.
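One way to get stable request IDs is to derive them from the logical user action itself, so the same action always maps to the same key no matter how many times the call is retried. A sketch (the key scheme and helper names are illustrative, not prescribed by the API; send the payload with any HTTP client):

```python
import hashlib

def idempotency_key(user_id: str, action_id: str) -> str:
    """Derive a stable request_id from the logical action, so retries reuse it."""
    return hashlib.sha256(f"{user_id}:{action_id}".encode()).hexdigest()

def consume_payload(resource_id: str, user_id: str, action_id: str,
                    amount: int = 1) -> dict:
    """Build the JSON body for POST /v1/quota/consume."""
    return {
        "resource_id": resource_id,
        "amount": amount,
        "request_id": idempotency_key(user_id, action_id),
    }
```

Because the key is a pure function of (user_id, action_id), a timed-out call can be resent verbatim and the service will count it at most once.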