Free SEO Tool

Robots.txt
Generator

Build a valid robots.txt file with precise crawl control. Define user-agent rule groups, set Allow and Disallow paths, block AI training bots, add a sitemap reference — then download and deploy in seconds.

Live Preview · User-Agent Rules · Allow & Disallow · AI Bot Blocking · CMS Presets · Crawl-Delay
Quick Presets

Apply a ready-made rule set for your platform. Clicking a preset adds a new rule group with recommended Disallow paths.

Quick Tips
PLACEMENT → Must live at /robots.txt root
→ One file per domain / subdomain
→ Case-sensitive on Linux servers

KEY RULES → * wildcard matches all bots
→ Disallow: / blocks everything
→ Empty Disallow means allow all
→ More specific rule wins ties

REMEMBER → robots.txt ≠ noindex
→ Use noindex tag to deindex
→ Always reference your sitemap
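The tips above, combined into one minimal file (example.com and the paths are placeholders):

```txt
# Baseline rules for all bots
User-agent: *
Disallow: /admin/        # block a directory
Allow: /admin/login.css  # longer (more specific) rule wins the tie

Sitemap: https://example.com/sitemap.xml
```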

Step-by-Step Guide

How to Use the
Robots.txt Generator

01
Enter Your Domain

Type your root domain into the Domain field. This auto-generates the Sitemap URL reference and validates that all paths belong to the correct origin.

02
Apply a Preset

Choose a CMS preset — WordPress, Shopify, Next.js, or others — to instantly add a recommended set of Disallow rules tailored for that platform.
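For illustration, a widely used WordPress rule set looks like the following; the tool's actual preset may differ:

```txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

The Allow line is important: many WordPress plugins make front-end requests through admin-ajax.php, so blocking it outright can break rendering for crawlers.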

03
Add Rule Groups

Click Add Rule Group to create a user-agent block. Use * for all bots, or enter a specific crawler like Googlebot, Bingbot, or GPTBot.
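Each rule group becomes one User-agent block in the generated file, for example:

```txt
User-agent: *
Disallow: /tmp/

User-agent: GPTBot
Disallow: /
```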

04
Set Allow & Disallow

Add Disallow paths to block crawler access and Allow paths to carve out exceptions within blocked directories. All paths must start with /.
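A broad Disallow with a carved-out Allow exception looks like this (paths are placeholders):

```txt
User-agent: *
Disallow: /private/
Allow: /private/status.html
```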

05
Toggle Options

Enable AI bot blocking, bad bot blocking, timestamps, and comments in Global Settings. The live preview and raw text update instantly with every change.

06
Download & Deploy

Download your robots.txt file and upload it to the root of your domain. Verify it's accessible at yoursite.com/robots.txt before submitting your sitemap.
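Before (or after) uploading, you can sanity-check the file's logic with Python's standard-library parser. This is a sketch: the file content and URLs below are placeholders for your own.

```python
from urllib import robotparser

# Hypothetical contents of the downloaded robots.txt; substitute your own.
content = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(content.splitlines())

# Pages you expect to be crawlable should come back True.
print(rp.can_fetch("Googlebot", "https://example.com/pricing"))      # True
print(rp.can_fetch("Googlebot", "https://example.com/admin/users"))  # False
```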

Common Questions

Frequently Asked
Questions

What is a robots.txt file and do I need one?
A robots.txt file is a plain text file placed at the root of your domain that tells web crawlers which parts of your site to access or avoid. It follows the Robots Exclusion Protocol and is the first file most bots request when visiting a site. While not strictly required, it is strongly recommended for any live website — it protects admin areas, conserves crawl budget, and signals where your sitemap lives.
Does Disallow in robots.txt stop a page from appearing in search results?
Not reliably. Disallowing a URL prevents crawlers from visiting it, but Google may still index and show the page in search results if it discovers the URL through external links or other signals. To reliably prevent a page from appearing in search results, add a noindex meta tag or an X-Robots-Tag HTTP header directly to the page — not robots.txt.
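For reference, the two reliable deindexing mechanisms mentioned above look like this:

```txt
<!-- In the page's <head> -->
<meta name="robots" content="noindex">

# Or as an HTTP response header
X-Robots-Tag: noindex
```

Note that a crawler must be able to fetch the page to see either signal, so do not Disallow a URL you are trying to deindex.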
What is the difference between Allow and Disallow?
Disallow tells a crawler it cannot access a given path. Allow explicitly grants access to a path that would otherwise be blocked by a broader Disallow rule. For example, Disallow: /private/ blocks the whole directory, while Allow: /private/status.html permits that specific file. When both rules apply to the same path, the more specific (longer) rule wins. If two rules have equal length, Allow takes precedence for Google.
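The precedence behavior can be checked with Python's standard-library parser. One caveat: `urllib.robotparser` returns the first matching rule in file order rather than the longest match Google uses, so the narrower Allow is listed before the broader Disallow here; with that ordering both interpretations agree. The domain is a placeholder.

```python
from urllib import robotparser

rules = """\
User-agent: *
Allow: /private/status.html
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/private/status.html"))  # True
print(rp.can_fetch("*", "https://example.com/private/secret.html"))  # False
print(rp.can_fetch("*", "https://example.com/about"))                # True
```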
Should I block AI training crawlers in robots.txt?
That depends on your preferences and content strategy. Bots like GPTBot (OpenAI), ClaudeBot (Anthropic), CCBot (Common Crawl), and Google-Extended crawl web content to build AI training datasets. You can block any or all of them by adding Disallow: / under their specific user-agent name. Blocking AI crawlers does not affect your search engine rankings but prevents your content from being used to train AI models.
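Blocking all four crawlers named above looks like this in the generated file:

```txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```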
What is crawl-delay and should I use it?
Crawl-delay instructs a bot to wait a set number of seconds between successive requests to your server, which can reduce strain from aggressive crawlers. Note that Googlebot ignores Crawl-delay entirely; Google manages its own crawl rate automatically. Bing and some other crawlers do respect the directive. Only set a crawl delay if you have a specific server-load reason to do so.
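A ten-second delay for Bingbot, for example:

```txt
User-agent: Bingbot
Crawl-delay: 10
```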
Where must the robots.txt file be placed?
The robots.txt file must be accessible at the exact path /robots.txt at the root of your domain — for example https://yoursite.com/robots.txt. It cannot be placed in a subdirectory. Each subdomain requires its own robots.txt file at its own root. After uploading, always verify it loads correctly in a browser and returns an HTTP 200 status code before relying on it.

About This Tool

What is a
Robots.txt Generator?

The Tool

The SEO HQ Robots.txt Generator lets you build a fully valid robots.txt file without writing a line of code. Create multiple user-agent rule groups, add Allow and Disallow paths with real-time validation, apply CMS-specific presets, toggle AI bot blocking, and reference your sitemap — all with a live syntax-highlighted preview that updates instantly.

Why It Matters

A correctly configured robots.txt is your first line of crawl control. It protects sensitive paths from being crawled, prevents duplicate content from consuming crawl budget, and signals to all bots where your sitemap lives. A poorly written file can inadvertently block critical pages or leave admin sections accessible to every crawler on the internet.

Key Features

  • Multiple user-agent rule groups
  • Allow and Disallow path directives per group
  • CMS presets: WordPress, Shopify, Next.js, and more
  • AI crawler and bad bot blocking toggles
  • Live syntax-highlighted preview with raw text view
  • One-click download as robots.txt or copy to clipboard

Best Practices

  • Always include a Sitemap: reference line
  • Use * to set baseline rules for all bots
  • Never block CSS, JS, or images from Googlebot
  • Test with the robots.txt report in Google Search Console
  • Review and update after every major site restructure
  • Use noindex — not robots.txt — to prevent indexation
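Putting these practices together, a minimal well-formed file might look like the following; the domain and paths are placeholders:

```txt
User-agent: *
Disallow: /admin/
Disallow: /cgi-bin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```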