How to Use Ruby on Rails with Generative AI
People are starting to figure out that modern Rails, specifically Hotwire and Turbo Streams, is an excellent fit for building web and mobile apps that integrate with AI models. Thanks to the Hotwire tools, integrating generative AI into your Ruby on Rails application has never been easier or more powerful. With tools like Ollama, Rails’ built-in Hotwire features, and some clever orchestration, you can bring real-time AI-generated insights directly into your app’s user interface.
How is this possible? Turbo Streams. They enable your app to update portions of the page live as data becomes available, making them a perfect match for streaming large language model (LLM) responses, which are generated token by token rather than all at once.
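Here is a minimal sketch of the view side of this pattern (all names, such as `@insight_id`, are illustrative, not from any particular codebase). The `turbo_stream_from` helper from the turbo-rails gem subscribes the page to a named stream, and an empty element serves as the target that streamed content will fill:

```erb
<%# app/views/insights/show.html.erb -- illustrative names throughout %>

<%# Subscribes this page to a named stream; any server-side broadcast
    to the same stream name updates this page live %>
<%= turbo_stream_from "insight_#{@insight_id}" %>

<%# Target element: streamed LLM tokens are appended here as they arrive %>
<div id="insight_response_<%= @insight_id %>"></div>
```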
Why Use Turbo Streams for LLM Responses?
LLMs like LLaMA, GPT, and others generate responses incrementally, one token at a time. Instead of waiting for the entire message to be complete before showing it to users, Turbo Streams let your Rails app display the response as it’s generated, keeping your UI fast, engaging, and responsive.
This stream-first approach enhances user experience dramatically—users see the AI working in real time, similar to a chatbot interface. And unlike traditional AJAX polling, Turbo Streams handle this with less complexity and greater efficiency, thanks to Rails’ native support for WebSockets via Action Cable.
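On the server side, pushing an update to every subscribed page is a single call. A sketch, assuming the stream name and target id from the view above (`Turbo::StreamsChannel` is the broadcasting API that turbo-rails provides):

```ruby
# Can be called from a job, a model callback, or a service object; turbo-rails
# delivers it over Action Cable to every page subscribed to this stream.
Turbo::StreamsChannel.broadcast_append_to(
  "insight_#{insight_id}",                  # stream name used by turbo_stream_from
  target: "insight_response_#{insight_id}", # DOM id of the element to append into
  html: token                               # the freshly generated chunk of text
)
```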
A Seamless Flow for Generative AI in Rails
In this setup, your Rails app provides a way for users to request insights (for example, from product data). When the user submits a request, the controller enqueues a background job that sends the conversation history to the AI model.
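A minimal controller sketch for this step, with all names (`InsightsController`, `GenerateInsightJob`, the `question` param) invented for illustration. The action only enqueues the job and returns immediately, so the slow LLM call never blocks the request:

```ruby
# app/controllers/insights_controller.rb
class InsightsController < ApplicationController
  def create
    @insight_id = SecureRandom.uuid

    # Conversation history in the shape Ollama's /api/chat endpoint expects;
    # string keys keep the hash safe to pass through Active Job serialization.
    messages = [
      { "role" => "system", "content" => "You are a concise product-data assistant." },
      { "role" => "user",   "content" => params.require(:question) }
    ]

    GenerateInsightJob.perform_later(@insight_id, messages)
    render :show # the show view subscribes to the stream, as sketched earlier
  end
end
```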
As the AI begins responding, the backend streams the text back chunk by chunk via Turbo Streams, updating the page in real time—no refreshes, no spinning wheels.
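Here is a hedged sketch of such a job, assuming a local Ollama server on its default port (11434), an already-pulled model, and the illustrative names from the previous sketches. Ollama’s streaming /api/chat endpoint returns newline-delimited JSON, so the job buffers partial lines, parses each complete one, and broadcasts the token it contains:

```ruby
# app/jobs/generate_insight_job.rb
require "net/http"
require "json"

class GenerateInsightJob < ApplicationJob
  queue_as :default

  OLLAMA_URI = URI("http://localhost:11434/api/chat") # assumed local Ollama

  def perform(insight_id, messages)
    buffer = +"" # holds partial lines between network chunks

    Net::HTTP.start(OLLAMA_URI.host, OLLAMA_URI.port) do |http|
      request = Net::HTTP::Post.new(OLLAMA_URI, "Content-Type" => "application/json")
      # "llama3" is a placeholder; use any model available in your Ollama install
      request.body = { model: "llama3", messages: messages, stream: true }.to_json

      http.request(request) do |response|
        response.read_body do |chunk|
          buffer << chunk
          # Ollama streams newline-delimited JSON; only parse complete lines.
          while (line = buffer.slice!(/\A.*\n/))
            data  = JSON.parse(line)
            token = data.dig("message", "content").to_s
            next if token.empty?

            # Escape the raw token, then append it to the target element on
            # every page subscribed to this insight's stream.
            Turbo::StreamsChannel.broadcast_append_to(
              "insight_#{insight_id}",
              target: "insight_response_#{insight_id}",
              html: ERB::Util.html_escape(token)
            )
          end
        end
      end
    end
  end
end
```

In production you would likely also persist the accumulated text and add error handling and retries, but the core streaming loop stays this small.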
Benefits of This Approach
- Real-Time UX: Users don’t wait for the full response to display—they see it build word by word.
- Minimal JavaScript: With Turbo Streams, most of the client-side complexity is handled natively by Rails.
- Scalable: The heavy lifting happens in background jobs, so your main app remains responsive.
- Flexible Output: Responses can be styled or formatted using markdown or other frontend libraries for rich presentation (see the sketch after this list).
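As a sketch of that last point, once the stream finishes, the accumulated text can be re-rendered as rich HTML. This example assumes the redcarpet gem and a hypothetical `full_response` attribute holding the completed answer:

```ruby
require "redcarpet"

# Render the completed response as HTML (filter_html strips raw tags the
# model may have emitted), then swap it in over the token-by-token version.
renderer = Redcarpet::Markdown.new(Redcarpet::Render::HTML.new(filter_html: true))

Turbo::StreamsChannel.broadcast_update_to(
  "insight_#{insight.id}",
  target: "insight_response_#{insight.id}",
  html: renderer.render(insight.full_response) # hypothetical attribute
)
```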
Use Cases That Shine
- Data-driven dashboards that ask LLMs for optimization tips based on live metrics
- Customer support tools that provide instant suggestions based on inquiry context
- E-commerce platforms offering automated product recommendations or inventory advice
Bringing It Together
With a structured system prompt focused on actionability and a robust stream-processing job to handle the output, your Rails app can become an intelligent assistant that actually helps users make better decisions, right when they need it.
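What might such a system prompt look like? A brief, purely illustrative sketch:

```ruby
# Wording is illustrative; tune it to your domain and data.
SYSTEM_PROMPT = <<~PROMPT
  You are an assistant embedded in a Rails analytics dashboard.
  Base every answer on the product data provided in the conversation.
  Respond with short, concrete, actionable recommendations,
  formatted as markdown bullet points.
PROMPT
```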
Pairing Turbo Streams with LLMs turns Rails from a traditional web app into a smart, real-time interface powered by generative AI. And best of all, it’s built using tools Rails developers already know and love.