How to block AI Bots from a Ruby on Rails Application
As AI models become increasingly sophisticated, many website owners, myself included, are concerned about their awesome blog posts being scraped and used to train AI systems without permission. If you're running a Ruby on Rails application and want to keep AI bots from crawling your site, you need to put some bot-blocking measures in place. The most straightforward method is the familiar robots.txt file: it tells web crawlers which parts of your site they may or may not access, and most reputable AI companies respect its directives.
Ruby on Rails has generated a public/robots.txt by default since at least Rails 3, so you should already have a robots.txt file in your public directory, and it is served automatically at yoursite.com/robots.txt. Update it with the content below to block the major AI crawlers:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: *
Disallow:
This configuration blocks the following crawlers:
- GPTBot: OpenAI's web crawler, used to collect training data for GPT models
- ClaudeBot: Anthropic's crawler for Claude
- CCBot: Common Crawl's bot, whose datasets are often used for AI training
- Google-Extended: Google's robots.txt token for opting out of AI training (separate from regular search indexing)
- Bytespider: ByteDance's crawler (ByteDance is TikTok's parent company)
- PerplexityBot: Perplexity AI's search crawler
Important note: the final block in my version of the robots.txt file, User-agent: * followed by an empty Disallow:, is critical. It allows all other bots to crawl your site normally, ensuring your SEO isn't affected.
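If you'd rather manage these rules in Ruby than in a static file, here is a minimal sketch of serving robots.txt from a controller. The route, the RobotsController name, and the AI_BOTS constant are illustrative choices of mine, not something Rails provides out of the box:

# config/routes.rb
# Route /robots.txt through the router instead of the static file.
# (Remove public/robots.txt so the static file middleware doesn't shadow this route.)
Rails.application.routes.draw do
  get "/robots.txt", to: "robots#show"
end

# app/controllers/robots_controller.rb
# Renders the same directives as the static file, which makes it easy to
# tweak the bot list or switch blocking off per environment.
class RobotsController < ApplicationController
  AI_BOTS = %w[GPTBot ClaudeBot CCBot Google-Extended Bytespider PerplexityBot].freeze

  def show
    rules = AI_BOTS.map { |bot| "User-agent: #{bot}\nDisallow: /\n" }
    # Allow everything else so regular search engine crawling is unaffected.
    rules << "User-agent: *\nDisallow:\n"
    render plain: rules.join("\n")
  end
end

Whether you keep the static file or generate it in code, the directives the crawler sees are the same; the controller approach just gives you a single place to update the list.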
This robots.txt file technique works because most AI companies have committed to respecting these directives. While robots.txt isn’t legally binding, having it in place strengthens your position if you need to pursue legal action against unauthorized scraping.
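Not every crawler is reputable, though, so you may also want a server-side backstop for bots that ignore robots.txt. Below is a minimal sketch using a before_action in ApplicationController; the BLOCKED_AI_AGENTS pattern and the 403 response are my own choices, and you should verify the exact user-agent strings against each vendor's documentation (Google-Extended, for instance, is only a robots.txt token and never appears as a user agent, so it isn't listed here):

# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # Substrings matched against the User-Agent request header.
  BLOCKED_AI_AGENTS = /GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot/i

  before_action :block_ai_bots

  private

  def block_ai_bots
    return unless request.user_agent.to_s.match?(BLOCKED_AI_AGENTS)

    # Return 403 Forbidden with an empty body to matching crawlers.
    head :forbidden
  end
end

Keep in mind that user-agent strings are trivially spoofed, so treat this as a polite second line of defense rather than a guarantee.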