Most AI integrations stop at chat. You wire up an LLM, stream some tokens to a text box, and call it a day. But what if you want agents that actually do things — send emails, query databases, book meetings, delegate work to other agents?
I built a project I call Jarvis to find out: a platform where specialized AI agents use real tools to perform real tasks, all built on Ruby on Rails.
User Experience
I ask Jarvis to do multi-step tasks, and it just does them. I can type “Hey Jarvis… think up 20 domain names that start with the letter X, check their availability, then email me the available ones.” Jarvis breaks the request down into atomic units, and its orchestrator agent plans and executes the whole process end-to-end, calling the right tools, skills, and sub-agents along the way. The next thing I know, I have a list of available domain names in my email inbox. Here’s how I built it.
The Stack
Jarvis runs on Rails 8 with RubyLLM handling all LLM interactions. RubyLLM gives us three ActiveRecord mixins that make persistence nearly free:
- acts_as_chat — a conversation with model, system prompt, and tool configuration
- acts_as_message — individual messages within a chat
- acts_as_tool_call — records of every tool invocation the LLM makes
Real-time streaming comes from Hotwire (Turbo Streams over Solid Cable), and background jobs run on Solid Queue. No Redis, no Node, no Webpack. Just Rails.
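Wiring those mixins up takes three small models. Here’s a sketch of the standard RubyLLM Rails setup (Jarvis’s actual models may carry extra columns, but the core is this):

class Chat < ApplicationRecord
  acts_as_chat # persists model, system prompt, and message history
end

class Message < ApplicationRecord
  acts_as_message # belongs_to :chat; stores role, content, tool calls
end

class ToolCall < ApplicationRecord
  acts_as_tool_call # belongs_to :message; records name and arguments
end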
Agents — Specialized Workers
Every agent in Jarvis subclasses BaseAgent and implements two methods: #system_prompt (what the agent knows and how it behaves) and #tools (what it can do).
class BaseAgent
  attr_reader :chat

  def initialize(chat)
    @chat = chat
  end

  def run(user_message)
    if chat
      # Persistent mode: run inside an existing Chat record, so the
      # history, messages, and tool calls are stored automatically.
      setup_chat
      chat.ask(user_message).content
    else
      # Ephemeral mode: a throwaway RubyLLM chat with no persistence,
      # used for delegated sub-agent runs.
      llm = RubyLLM.chat(model: DEFAULT_MODEL)
      llm.with_instructions(full_system_prompt)
      llm.with_tools(*tools_for_chat)
      llm.ask(user_message).content
    end
  end

  # Subclasses implement these:
  def system_prompt = raise NotImplementedError
  def tools = raise NotImplementedError
end
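Calling an agent is then a one-liner: pass a persisted chat for the main conversation, or nil for a fire-and-forget run (illustrative):

# With a persisted Chat record: history and tool calls are stored
OrchestratorAgent.new(chat).run("Find 20 domain names starting with X")

# Without one: an ephemeral run, as used for delegated subtasks
ResearchAgent.new(nil).run("Summarize recent Rails release notes")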
The key design decision: Each conversation begins by talking to The Orchestrator, who then delegates tasks to specialized sub-agents. The Email Agent can send emails. The Research Agent can search the web and query databases. The Scheduling Agent can check calendars and book meetings. No agent gets every tool — this constrains behavior and makes task delegation much more predictable.
Here’s sample code from The Orchestrator, the entry point for all user requests:
class OrchestratorAgent < BaseAgent
  def system_prompt
    <<~PROMPT
      You are the orchestrator agent. You analyze user requests and either
      handle them directly or delegate to specialist agents.

      Available specialists:
      - ResearchAgent: web search, PDF analysis, database queries
      - DataAnalysisAgent: PDF analysis and database queries
      - SchedulingAgent: calendar management, meeting negotiation, booking
    PROMPT
  end

  def skills
    [InterviewSkill, SentimentAnalysisSkill]
  end

  def tools
    [DelegateToAgentTool, EmailTool, SmsTool, SendTelegramMessageTool,
     DomainCheckerTool, WeatherForecastTool, AskUserTool,
     InterviewUserTool, CreateEmailMonitorTool]
  end
end
Tools — What Agents Can Actually Do
Tools are where agents meet the real world. Each tool subclasses RubyLLM::Tool, declares its parameters, and implements #execute. The convention: return data hashes on success, { error: "..." } for recoverable failures. Never raise for expected errors — let the agent handle it gracefully.
class EmailTool < RubyLLM::Tool
  description "Sends an email via the application mailer."

  param :to, desc: "Recipient email address"
  param :subject, desc: "Email subject line"
  param :body, desc: "Plain text email body"

  def execute(to:, subject:, body:)
    unless to.match?(/\A[^@\s]+@[^@\s]+\z/)
      return { error: "Invalid email address: #{to}" }
    end

    AgentMailer.agent_email(to: to, subject: subject, body: body).deliver_later
    { sent: true, to: to, subject: subject }
  rescue => e
    { error: "Failed to send: #{e.message}" }
  end
end
That’s the entire email tool. param declarations tell the LLM what arguments to provide, and RubyLLM handles the function-calling protocol automatically. The LLM sees the tool description, decides when to use it, and passes structured arguments. Your #execute method just does the work.
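For the curious, the schema RubyLLM derives from those declarations and sends to the model looks roughly like this (illustrative; the exact tool name and wire format depend on the provider):

# Approximate function-calling schema generated from EmailTool
# (illustrative; field names vary slightly by provider).
{
  name: "email_tool",
  description: "Sends an email via the application mailer.",
  parameters: {
    type: "object",
    properties: {
      to: { type: "string", description: "Recipient email address" },
      subject: { type: "string", description: "Email subject line" },
      body: { type: "string", description: "Plain text email body" }
    },
    required: ["to", "subject", "body"]
  }
}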
So far Jarvis has 19 tools spanning email, SMS, Telegram, web search, PDF reading, database queries (read-only against a few of my side-project Postgres databases), Google Calendar availability (Jarvis can read my Google Calendar schedule but not yet write to it), domain name availability lookups, weather, and more.
The Orchestrator — Delegation as a First-Class Primitive
IMO the most interesting tool is DelegateToAgentTool. When the orchestrator encounters a complex request — say, “research our competitor’s pricing and email me a summary” — it doesn’t try to do everything itself. It delegates.
class DelegateToAgentTool < RubyLLM::Tool
  description "Delegates a subtask to another agent."

  param :agent_type, desc: "Agent class name (e.g. ResearchAgent)"
  param :message, desc: "The task or question to send to the sub-agent"

  ALLOWED_AGENTS = %w[ResearchAgent DataAnalysisAgent SchedulingAgent].freeze

  def execute(agent_type:, message:)
    unless ALLOWED_AGENTS.include?(agent_type)
      return { error: "Unknown agent: #{agent_type}" }
    end

    task = create_task(agent_type, message)
    agent = agent_type.constantize.new(nil) # nil chat = ephemeral run
    result = agent.run(message)
    complete_task(task)
    { response: result }
  rescue => e
    fail_task(task, e.message) if task
    { error: "Agent failed: #{e.message}" }
  end
end
Sub-agents run ephemerally — they don’t persist conversation history. They spin up, do their job, return a result, and disappear. But every delegation is tracked as a Task record, giving us full visibility into what happened.
This enables multi-step workflows. The orchestrator might delegate research to ResearchAgent, get the findings back, delegate scheduling to SchedulingAgent based on those findings, then send an email with the results via the CommunicationAgent sub-agent, which is allowed to use EmailTool and SmsTool.
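The create_task, complete_task, and fail_task helpers in DelegateToAgentTool are thin wrappers around that Task record. A sketch, with some column names assumed:

# Sketch of the task-tracking helpers (private methods on the tool).
# status/started_at/completed_at match the Task model shown later;
# description and error_message are assumed column names.
def create_task(agent_type, message)
  Task.create!(agent_type: agent_type, description: message,
               status: "running", started_at: Time.current)
end

def complete_task(task)
  task.update!(status: "completed", completed_at: Time.current)
end

def fail_task(task, error_message)
  task.update!(status: "failed", error_message: error_message,
               completed_at: Time.current)
end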
Skills — Pluggable Behavior
Sometimes you want to modify how an agent behaves without creating a new agent subclass. You also want these abilities to be modular so different agents can use them. That’s what skills are for. In Jarvis, a skill is a prompt mixin — extra instructions appended to the agent’s system prompt.
The orchestrator uses InterviewSkill (which teaches it to ask clarifying questions before acting) and SentimentAnalysisSkill (for routing inbound emails by tone). Skills compose — you can stack multiple skills on any agent without changing its core logic.
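Concretely, the full_system_prompt method that BaseAgent passed to RubyLLM earlier can just concatenate the agent’s prompt with each skill’s instructions. A minimal sketch, assuming each skill exposes a class-level prompt:

# A minimal sketch of skill composition. The `prompt` class method
# and its wording are assumptions, not Jarvis's exact implementation.
class InterviewSkill
  def self.prompt
    <<~PROMPT
      Before acting on a multi-step request, ask the user clarifying
      questions until the goal is unambiguous.
    PROMPT
  end
end

# In BaseAgent: stack every skill's instructions onto the system prompt.
def full_system_prompt
  [system_prompt, *skills.map(&:prompt)].join("\n\n")
end

def skills = [] # agents override this to opt in to skills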
Real-Time UX with Hotwire
When a user sends a message, the controller enqueues an AgentRunJob and returns immediately. The job runs the agent in the background, streaming LLM tokens to the browser via Turbo Streams:
class AgentRunJob < ApplicationJob
  def perform(chat_id)
    chat = Chat.find(chat_id)
    task = chat.tasks.create!(status: "queued", agent_type: chat.agent_type, ...)
    task.update!(status: "running", started_at: Time.current)

    # assistant_msg is the in-progress assistant Message record;
    # each streamed chunk is appended to it in the browser.
    chat.agent.continue do |chunk|
      if chunk.content.present? && assistant_msg
        assistant_msg.broadcast_append_chunk(chunk.content)
      end
    end

    task.update!(status: "completed", completed_at: Time.current)
  end
end
The user sees the response building character by character. Task status updates broadcast in real-time too — you can watch a task go from queued to running to completed as the agent works. No polling, no WebSocket boilerplate. Just Turbo.
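The broadcast_append_chunk call in the job is a small helper on the Message model. A sketch, assuming each message renders into a DOM node with a predictable id:

class Message < ApplicationRecord
  acts_as_message

  # Push a streamed token chunk into this message's content element
  # via Turbo Streams (the stream and target names are assumptions).
  def broadcast_append_chunk(chunk)
    broadcast_append_to "chat_#{chat_id}",
                        target: "message_#{id}_content",
                        html: chunk
  end
end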
Verifying Agent Behavior — Tool Visibility
Here’s something I think matters a lot: you should be able to see exactly which tools an agent used during a run. Not just what it said — what it did.
Every Task record reports exactly which tools were invoked during its execution window:
class Task < ApplicationRecord
  def tool_names
    return [] unless started_at

    ToolCall.joins(:message)
            .where(messages: { chat_id: chat_id })
            .where(messages: { created_at: started_at..(completed_at || Time.current) })
            .distinct
            .pluck(:name)
  end
end
This queries the ToolCall records (persisted automatically by RubyLLM’s acts_as_tool_call) that fall within the task’s time window. The task page renders these as badges:
<% if (tools = task.tool_names).any? %>
  <div class="flex flex-wrap gap-1 mt-1">
    <p class="text-xs">Tools used:</p>
    <% tools.each do |name| %>
      <span class="text-xs px-1.5 py-0.5 border">
        <%= name.camelize %>
      </span>
    <% end %>
  </div>
<% end %>
When you ask “research and email me available domain names” you can see on the task show page that the agent actually used DomainCheckerTool and EmailTool — it didn’t just make something up.
This is trust-but-verify for AI agents. The tool badges are a quick visual check: did the agent do what it claimed to do? If an agent says “I searched for that and found nothing” but WebSearchTool doesn’t appear in the tool list, something’s wrong. It’s a simple pattern, but tool-use visibility drastically increases how much a human can trust an agent’s output.
Rails Fits Better Than You’d Think
There is a pretty natural mapping between Rails conventions and agent orchestration:
- Models are agents and tasks
- Jobs handle the async AI-agent orchestration loop
- Tools are plain Ruby classes with a standard interface
- Turbo Streams give you real-time updates without a separate WebSocket layer
- ActiveRecord persistence comes free via RubyLLM’s mixins
You don’t need a separate orchestration framework, a vector database, or a complex pipeline system. Rails already has background jobs, WebSockets, database persistence, and a clean MVC structure. Convention over configuration turns out to apply to AI agents too.
The entire system is about 19 tools, 4 agents, a handful of skills, and one background job class that ties it all together. It’s not a specialized proprietary agentic framework; it’s just a normal Rails app that happens to orchestrate AI agents, and that’s kind of remarkable.
Jarvis was built with RubyLLM — a Ruby-first interface to modern LLMs with built-in tool calling, streaming, and Rails integration.