The team behind OnlineTools4Free — building free, private browser tools.
Published Feb 4, 2026 · 8 min read · Reviewed by OnlineTools4Free
Robots.txt Complete Guide: Control How Search Engines Crawl Your Site
What is Robots.txt?
The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which pages they are allowed to access and which they should skip. It lives at https://yourdomain.com/robots.txt and is one of the first files crawlers check when they visit your site.
The file follows the Robots Exclusion Protocol, a convention dating to 1994 that was formalized as RFC 9309 in 2022. Google, Bing, Yandex, and other major search engines read and honor robots.txt directives.
Important: robots.txt is a suggestion, not a security measure. Well-behaved crawlers (search engines) follow the rules. Malicious bots ignore them. Never use robots.txt to hide sensitive content — use authentication instead.
Basic Syntax
A robots.txt file consists of one or more rule blocks. Each block specifies a user-agent (the crawler) and a set of Allow or Disallow directives:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/
Sitemap: https://example.com/sitemap.xml
Key Directives
- User-agent: Specifies which crawler the rules apply to. * means all crawlers; Googlebot targets Google specifically.
- Disallow: Blocks the crawler from accessing the specified path. Disallow: / blocks everything; Disallow: (empty) allows everything.
- Allow: Overrides a Disallow for a more specific path. Useful for allowing a subfolder within a blocked directory.
- Sitemap: Tells crawlers where to find your XML sitemap. This is optional but recommended.
- Crawl-delay: Asks the crawler to wait a specified number of seconds between requests. Respected by Bing and Yandex but ignored by Google.
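Python's standard library ships a parser for exactly this format, which makes it easy to sanity-check rules before deploying them. A minimal sketch (the bot name and URLs are illustrative, not from the article):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; in practice, load your real robots.txt.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Paths under a disallowed prefix are blocked; everything else is allowed.
print(rp.can_fetch("MyBot", "https://example.com/private/notes.html"))  # False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))           # True
```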
Common Robots.txt Patterns
Block All Crawlers
User-agent: *
Disallow: /
Use this for staging sites and development environments. You never want a staging site appearing in search results.
Allow Everything
User-agent: *
Disallow:
Or simply an empty file. Most sites should allow full crawling — you want search engines to index your content.
Block Specific Directories
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /tmp/
Disallow: /internal/
Block administrative areas, API endpoints, temporary files, and internal tools. These pages add no value in search results and waste your crawl budget.
Block Query Parameters
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Prevents crawlers from fetching paginated, sorted, or filtered versions of the same content. These near-duplicate URLs waste crawl budget and can dilute your SEO.
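Note that * wildcards (and the $ end-of-URL anchor) are an extension supported by Google and Bing, not part of the original 1994 protocol, so simpler parsers may ignore them. A hypothetical helper sketching how a Google-style pattern maps onto a regular expression:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any character sequence; a trailing '$' anchors the URL end.
    All other characters match literally from the start of the path.
    """
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*", ".*")      # restore wildcard semantics
    if escaped.endswith(r"\$"):                 # restore end-of-URL anchor
        escaped = escaped[:-2] + "$"
    return re.compile("^" + escaped)

print(bool(robots_pattern_to_regex("/*?sort=").match("/products?sort=price")))  # True
print(bool(robots_pattern_to_regex("/*?sort=").match("/products")))             # False
```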
Different Rules for Different Crawlers
User-agent: Googlebot
Disallow: /private/
User-agent: Bingbot
Disallow: /private/
Crawl-delay: 5
User-agent: *
Disallow: /
This allows Google and Bing to crawl most of your site while blocking all other crawlers entirely. The most specific matching user-agent block applies.
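You can verify per-crawler behavior with urllib.robotparser as well. One caveat: Python's parser applies rules in file order rather than Google's longest-match precedence, which only matters when Allow and Disallow rules overlap, as they do not here. A sketch with illustrative URLs:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Named crawlers use their own block; everyone else falls back to '*'.
print(rp.can_fetch("Googlebot", "https://example.com/pricing"))     # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/pricing"))  # False
print(rp.crawl_delay("Bingbot"))                                    # 5
```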
Generate these configurations automatically with our Robots.txt Generator.
Sitemap References
Adding a Sitemap directive to your robots.txt is one of the easiest SEO wins:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
You can list multiple sitemaps. The Sitemap directive can appear anywhere in the file — it is not tied to any User-agent block.
Submitting your sitemap through Google Search Console and Bing Webmaster Tools is also recommended, but the robots.txt reference ensures that any well-behaved crawler can discover your sitemap automatically.
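Since Python 3.8, urllib.robotparser also exposes any Sitemap lines it finds via site_maps(), which is handy for confirming the directive parses as expected:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Returns the sitemap URLs in file order (or None if there are none).
print(rp.site_maps())
```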
Testing Your Robots.txt
A syntax error in robots.txt can block your entire site from search results. Always test before deploying:
- Google Search Console: The robots.txt report (under Settings) shows whether Google fetched your file successfully and flags syntax errors; it replaced the older standalone Robots.txt Tester. Use the URL Inspection tool to check whether a specific URL is blocked.
- Manual check: Navigate to https://yourdomain.com/robots.txt in your browser and verify the content is correct. Ensure the file is served as plain text (Content-Type: text/plain).
- Staging test: Deploy robots.txt changes to a staging environment first and test with Google's URL Inspection tool before pushing to production.
Common testing scenarios to verify:
- Is your homepage crawlable?
- Are your product/content pages crawlable?
- Are admin pages blocked?
- Are duplicate content paths (sort, filter, pagination) blocked?
- Is your sitemap discoverable?
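The checklist above can be automated. A minimal sketch (the domain, paths, and expectations are placeholders for your own) that verifies each expectation against the file before deploy:

```python
from urllib.robotparser import RobotFileParser

def check_robots(robots_txt, expectations):
    """Return human-readable failures; an empty list means all checks passed.

    expectations is a list of (path, should_be_allowed) pairs.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    failures = []
    for path, should_allow in expectations:
        allowed = rp.can_fetch("Googlebot", "https://example.com" + path)
        if allowed != should_allow:
            failures.append(f"{path}: expected {'allowed' if should_allow else 'blocked'}")
    return failures

robots = """\
User-agent: *
Disallow: /admin/
"""

print(check_robots(robots, [("/", True), ("/products/", True), ("/admin/", False)]))
# [] -> all checks passed
```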
Common Robots.txt Mistakes
- Blocking CSS and JavaScript: Google needs to render your page to index it properly. Blocking CSS and JS files causes Google to see a broken page. Never put Disallow: /css/ or Disallow: /js/ in your robots.txt.
- Leaving staging rules in production: The most catastrophic mistake. A Disallow: / left in your production robots.txt removes your entire site from search results. Always review robots.txt as part of your deployment checklist.
- Wrong file location: Robots.txt must be at the domain root: /robots.txt. Files at /subfolder/robots.txt or /Robots.txt are not recognized.
- Using robots.txt for noindex: Blocking a page via robots.txt prevents crawling but does not remove it from search results if other sites link to it. For true deindexing, use a noindex meta tag or HTTP header instead.
- Overly broad rules: Disallow: /p blocks every URL starting with /p, including /products, /pricing, and /press. Be precise with your paths.
- Forgetting trailing slashes: Disallow: /admin blocks /admin, /admin/, and /administrator. Disallow: /admin/ only blocks paths under /admin/. The trailing slash matters.
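A hypothetical deploy-time guard against the staging-rules mistake: scan the file in CI and fail the build if the wildcard group blocks the whole site.

```python
def blocks_everything(robots_txt):
    """True if a 'User-agent: *' group contains 'Disallow: /' (site-wide block)."""
    current_agents = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:          # a new group starts after rules were seen
                current_agents = []
                seen_rule = False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value == "/" and "*" in current_agents:
                return True
    return False

print(blocks_everything("User-agent: *\nDisallow: /"))        # True
print(blocks_everything("User-agent: *\nDisallow: /admin/"))  # False
```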
Create your robots.txt with our Robots.txt Generator. For more SEO guidance, see our articles on SEO meta tags, schema markup, and Open Graph images.
The OnlineTools4Free Team
We are a small team of developers and designers building free, privacy-first browser tools. Every tool on this platform runs entirely in your browser — your files never leave your device.
