As AI-powered features move from cool experiment to core production feature, "how much is this costing me?" becomes a daily question. Most providers offer their own dashboards, but jumping between three or four different portals just to check usage is a headache.
In my recent projects, I’ve moved away from external trackers and leaned back into the Rails Way by implementing a polymorphic Inference model.
Why Polymorphic?
The AI landscape is fragmented. Today you’re using OpenAI for text; tomorrow you’re using Gemini for multimodal tasks or a self-hosted Llama instance. By making the Inference model polymorphic, you can attach an “inference” to any record in your system—a User, a Conversation, or even a specific BackgroundJob.
The Setup
I treat an Inference as a ledger entry. Whenever a service is called, I wrap the response and write it to the database:
# The core model
class Inference < ApplicationRecord
  belongs_to :inferable, polymorphic: true

  # Columns track provider ("openai"), model ("gpt-4o"),
  # input_tokens, output_tokens, cost, and latency_ms
end
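To make the ledger write concrete, here is a minimal sketch of such a wrapper. `ChatService`, the `client` interface, and the `ledger` callable are hypothetical stand-ins, not from an actual codebase; in a real Rails app the ledger would be something like `conversation.inferences.create!` and the client your provider's SDK.

```ruby
# Hypothetical wrapper — ChatService, client, and ledger are
# illustrative names. The point is the shape: time the call,
# then persist one ledger row per inference.
class ChatService
  def initialize(client:, ledger:)
    @client = client # responds to #chat and reports token usage
    @ledger = ledger # persists one Inference row per call
  end

  def complete(prompt)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    response = @client.chat(prompt)
    latency_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round

    @ledger.call(
      provider: "openai",
      model: "gpt-4o",
      input_tokens: response[:input_tokens],
      output_tokens: response[:output_tokens],
      latency_ms: latency_ms
    )
    response
  end
end
```

Because the service returns the response untouched, callers don't have to know the ledger exists.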
By keeping this data in-house rather than siloed in a third-party tool, we get all the standard Rails benefits and ActiveRecord awesomeness: scopes, validations, and plain SQL when we need it.
Native Charting
Using a gem like groupdate, I can spin up a production admin dashboard in minutes to see daily token burns.
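With groupdate the daily burn is essentially a one-liner — something like `Inference.group_by_day(:created_at).sum(:output_tokens)`, assuming the column names above. To show what that chart is built from, here is the same aggregation in plain Ruby over in-memory rows (the sample data is illustrative):

```ruby
require "date"

# Illustrative rows; in the app these would be Inference records.
rows = [
  { created_at: Date.new(2024, 6, 1), output_tokens: 1200 },
  { created_at: Date.new(2024, 6, 1), output_tokens: 800 },
  { created_at: Date.new(2024, 6, 2), output_tokens: 3000 }
]

# Group by calendar day and sum tokens — the in-memory
# equivalent of group_by_day(:created_at).sum(:output_tokens)
daily_burn = rows.group_by { |r| r[:created_at] }
                 .transform_values { |rs| rs.sum { |r| r[:output_tokens] } }
```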
Business Context: I can easily run queries like User.find(1).inferences.sum(:cost) to see which specific customers are my “power users” (and perhaps my most expensive ones).
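Ranking those power users is one GROUP BY away — in ActiveRecord something like `Inference.where(inferable_type: "User").group(:inferable_id).sum(:cost)` (column names assumed). A plain-Ruby sketch of the ranking, with illustrative data:

```ruby
# Illustrative per-user cost rows; a "cost" column is assumed.
rows = [
  { user_id: 1, cost: 0.42 },
  { user_id: 2, cost: 3.10 },
  { user_id: 1, cost: 1.58 },
  { user_id: 3, cost: 0.05 }
]

# Total spend per user, most expensive first.
spend = rows.group_by { |r| r[:user_id] }
            .transform_values { |rs| rs.sum { |r| r[:cost] } }
            .sort_by { |_id, cost| -cost }
```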
Performance Monitoring: By tracking latency in the model, I can trigger alerts if a specific provider’s response times start to spike.
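A minimal sketch of such a check, assuming a `latency_ms` column; the threshold, window, and function name are illustrative. In a Rails app the rows would come from something like `Inference.where("created_at > ?", 1.hour.ago)`:

```ruby
# Hypothetical spike check: flag providers whose recent average
# latency exceeds a fixed threshold.
THRESHOLD_MS = 2_000

def slow_providers(rows, threshold_ms: THRESHOLD_MS)
  rows.group_by { |r| r[:provider] }
      .filter_map do |provider, rs|
        avg = rs.sum { |r| r[:latency_ms] }.fdiv(rs.size)
        provider if avg > threshold_ms
      end
end
```

A real alert would compare against each provider's own baseline rather than one global threshold, but the query shape is the same.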
Verdict
If you are building an AI-heavy Rails app, don’t overcomplicate your telemetry. Use the tools you already have. A simple polymorphic table gives you the visibility you need without the overhead of another SaaS subscription.