Free · No Sign-Up · AI Bot Blocking

Free Robots.txt Generator

Build a clean, SEO-friendly robots.txt file in seconds. Block AI bots like GPTBot, ClaudeBot, and CCBot. Add a sitemap reference. Free, no sign-up.

What is robots.txt and why does it matter?

Robots.txt is a small plain-text file placed at the root of your domain (always at /robots.txt) that tells search engine crawlers and other bots which parts of your site they should and should not access. It uses a simple, decades-old protocol called the Robots Exclusion Standard, and it is the first file most well-behaved crawlers fetch when they visit your site.

Unlike noindex meta tags (which control whether a page appears in search results) or canonical tags (which manage duplicate content), robots.txt controls crawling behavior at the URL level. Disallowing a path prevents crawlers from fetching it at all. This is useful for protecting your crawl budget on large sites, blocking aggressive scrapers, and keeping admin or staging URLs out of indexing pipelines.

Anatomy of a robots.txt file

A robots.txt file consists of one or more rule blocks. Each block targets specific crawlers and lists allow/disallow rules:

User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Key directives:

  • User-agent — specifies which crawler the rules apply to. * means all crawlers; Googlebot targets only Google's main bot.
  • Disallow — paths that should not be crawled. /admin/ blocks the admin folder; / blocks everything; empty value disallows nothing.
  • Allow — explicit permission, used to override broader Disallow rules.
  • Sitemap — full URL to your XML sitemap. You can list multiple Sitemap entries.
  • Crawl-delay — how many seconds to wait between requests. Most modern bots ignore this.
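
These semantics can be checked locally with Python's standard-library urllib.robotparser; here it evaluates the example file above (yoursite.com is a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above, parsed line by line.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# /admin/ and /cart/ are off-limits for every crawler.
admin_ok = parser.can_fetch("SomeBot", "https://yoursite.com/admin/settings")
# Everything else falls through to Allow: /.
blog_ok = parser.can_fetch("SomeBot", "https://yoursite.com/blog/post")
print(admin_ok, blog_ok)  # → False True
```

Note that Python's parser uses first-match rule ordering, while Google applies the most specific (longest) matching rule; for simple files like this one the results agree.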

Should you block AI bots like GPTBot and ClaudeBot?

Since 2023, a wave of AI-focused crawlers has appeared. The major ones include:

  • GPTBot (OpenAI / ChatGPT)
  • ClaudeBot (Anthropic / Claude)
  • CCBot (Common Crawl, used to train many models)
  • Google-Extended (controls whether Google trains Gemini and other AI products on your content)
  • anthropic-ai (Anthropic's older crawler token)
  • PerplexityBot (Perplexity)
  • Bytespider (ByteDance / TikTok)
  • FacebookBot and Meta-ExternalAgent (Meta AI training)

Whether to block these is a strategic question, not a technical one. Reasons to block:

  1. You believe AI training without consent is unfair use of your content
  2. You publish original journalism or research and want to control how it is used
  3. You sell content (courses, books, premium articles) and AI summarization undermines your business

Reasons to allow:

  1. You want your content to appear in AI-powered search results (ChatGPT search, Perplexity, etc.)
  2. You believe the future of search is AI-mediated and visibility there matters
  3. Your content is informational and you want maximum reach

There is no right answer — it is a publisher choice. Our generator gives you a one-click toggle to block all major AI bots if you choose to.
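
The toggle itself is simple to reproduce: one User-agent/Disallow pair per bot, followed by an open rule for everyone else. A minimal sketch (the bot list mirrors the one above; the sitemap URL is a placeholder):

```python
AI_BOTS = [
    "GPTBot", "ClaudeBot", "CCBot", "Google-Extended",
    "anthropic-ai", "PerplexityBot", "Bytespider",
    "FacebookBot", "Meta-ExternalAgent",
]

def block_ai_bots(bots, sitemap=None):
    """Build robots.txt text that blocks each listed bot entirely
    while leaving the site open to all other crawlers."""
    blocks = [f"User-agent: {bot}\nDisallow: /" for bot in bots]
    blocks.append("User-agent: *\nAllow: /")
    if sitemap:
        blocks.append(f"Sitemap: {sitemap}")
    return "\n\n".join(blocks) + "\n"

print(block_ai_bots(AI_BOTS, "https://yoursite.com/sitemap.xml"))
```

Each bot gets its own block because the Robots Exclusion Protocol applies the rules of the matching User-agent group, not the wildcard group, when a specific match exists.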

Common robots.txt patterns

Here are robots.txt patterns we use on different types of sites:

Standard blog or content site

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /search/

Sitemap: https://yoursite.com/sitemap.xml

E-commerce store

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /*?orderby=
Disallow: /*?filter_
Disallow: /*?add-to-cart=

Sitemap: https://store.com/sitemap.xml

Maximum lockdown (block all AI crawlers)

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Where to put robots.txt

The file must be placed at the absolute root of your domain. For example:

  • https://yoursite.com/robots.txt — correct
  • https://yoursite.com/seo/robots.txt — wrong, ignored by crawlers
  • https://www.yoursite.com/robots.txt — correct (different host — needs its own file)

If you have https://yoursite.com and https://www.yoursite.com serving different content (or even just redirecting to each other), each one needs its own robots.txt accessible at its root.
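
Because the file's scope is the scheme and host it is served from, the robots.txt governing any page URL can be derived mechanically. A quick illustration with Python's standard library (the URLs are placeholders):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that governs the given page:
    same scheme and host, path fixed to /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# www and non-www are different hosts, so they resolve to different files.
print(robots_url("https://www.yoursite.com/blog/post?page=2"))
# → https://www.yoursite.com/robots.txt
print(robots_url("https://yoursite.com/about"))
# → https://yoursite.com/robots.txt
```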

Testing your robots.txt

After uploading, verify with these tools:

  1. Browser test — Visit https://yourdomain.com/robots.txt. The file should load as plain text.
  2. Google Search Console — Use the robots.txt report (under Settings) to verify Googlebot can fetch your file and see how it was parsed.
  3. Bing Webmaster Tools — Bing has its own robots.txt tester under SEO Reports.
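
A rough local lint pass can also catch typos before any crawler sees them. A sketch that flags unrecognized directives and an accidental site-wide block (the directive set is the one described earlier; this is illustrative, not a full validator):

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots(text):
    """Return a list of warnings for a robots.txt body."""
    warnings = []
    for n, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            warnings.append(f"line {n}: expected 'Directive: value'")
            continue
        directive, value = (p.strip() for p in line.split(":", 1))
        if directive.lower() not in KNOWN_DIRECTIVES:
            warnings.append(f"line {n}: unknown directive {directive!r}")
        elif directive.lower() == "disallow" and value == "/":
            warnings.append(f"line {n}: 'Disallow: /' blocks the whole site")
    return warnings

print(lint_robots("User-agent: *\nDisalow: /admin/\nDisallow: /\n"))
```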

Common robots.txt mistakes

  • Blocking CSS or JS — Modern Google crawlers render pages like browsers. Blocking /css/ or /js/ can break rendering and hurt rankings. Allow these.
  • Using Disallow to remove already-indexed pages — Disallow only prevents future crawling. To actually remove indexed pages, use noindex meta tags or the GSC removal tool.
  • Trailing slash mismatches — Disallow: /admin is a prefix match: it blocks /admin, /admin/anything, and even /admin-panel. Disallow: /admin/ blocks only paths under the /admin/ directory. Be precise.
  • Accidentally blocking everything — Disallow: / on a production site is catastrophic. Always double-check before uploading.
  • Forgetting the Sitemap line — adding Sitemap: https://yoursite.com/sitemap.xml lets crawlers discover your sitemap without a manual submission.
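
The trailing-slash behavior in particular is easy to verify with urllib.robotparser before deploying (example.com is a placeholder host):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(rules_text, path):
    """Check one path against a robots.txt body for a generic bot."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch("TestBot", "https://example.com" + path)

no_slash = "User-agent: *\nDisallow: /admin"
with_slash = "User-agent: *\nDisallow: /admin/"

# /admin is a prefix match: it also catches /admin-panel.
print(is_allowed(no_slash, "/admin-panel"))    # False
# /admin/ matches only paths under that directory.
print(is_allowed(with_slash, "/admin-panel"))  # True
print(is_allowed(with_slash, "/admin/users"))  # False
```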

Generate yours above

Use the form above to build a robots.txt file with your specific rules. The generator includes one-click options for blocking AI bots, common admin paths, search/cart URL patterns, and the sitemap reference. Copy the generated text into a file named exactly robots.txt and upload it to your site root.

Frequently asked questions

Where do I put my robots.txt file?

At the root of your domain — https://yoursite.com/robots.txt. It must be at exactly that location and named robots.txt (lowercase) for crawlers to find it.

Will Disallow remove pages that are already in Google's index?

No. Disallow prevents future crawling but does not remove already-indexed pages. To remove pages, use noindex meta tags or the Search Console URL removal tool.

Should I block AI bots like GPTBot and ClaudeBot?

Your choice. Many publishers block them to prevent unauthorized training-data scraping. Our generator includes a one-click option to block GPTBot, ClaudeBot, CCBot, and Google-Extended.

Can I list more than one sitemap?

Yes. List one Sitemap: line per sitemap or sitemap index URL. Crawlers will read all of them.

Do I need a robots.txt file at all?

Yes — even an empty one, or one that says "User-agent: *" followed by "Allow: /", is better than nothing. It signals you have considered crawl behavior.

What does Crawl-delay do?

Crawl-delay tells crawlers to wait N seconds between requests. Most modern bots (including Googlebot) ignore it. Only set it if your server is being overwhelmed.

Should crawlers be allowed to access CSS and JavaScript?

Yes. Modern Googlebot renders pages like a browser. Blocking /css/ or /js/ can break rendering and tank rankings. Always allow these.

What is the difference between robots.txt and a sitemap?

Robots.txt is a negative list: it tells crawlers what not to access. A sitemap is a positive list of what they should crawl. They work together: robots.txt blocks the junk, the sitemap promotes what matters.