The team behind OnlineTools4Free — building free, private browser tools.
Published Feb 4, 2026 · 8 min read · Reviewed by OnlineTools4Free
Robots.txt Complete Guide: Control How Search Engines Crawl Your Site
What is Robots.txt?
The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which pages they are allowed to access and which they should skip. It lives at https://yourdomain.com/robots.txt and is one of the first files crawlers check when they visit your site.
The file follows the Robots Exclusion Protocol, a convention dating to 1994 that was formalized as RFC 9309 in 2022. Google, Bing, Yandex, and other major search engines read and honor robots.txt directives.
Important: robots.txt is a suggestion, not a security measure. Well-behaved crawlers (search engines) follow the rules. Malicious bots ignore them. Never use robots.txt to hide sensitive content — use authentication instead.
Basic Syntax
A robots.txt file consists of one or more rule blocks. Each block specifies a user-agent (the crawler) and a set of Allow or Disallow directives:
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public/
Sitemap: https://example.com/sitemap.xml
Key Directives
- User-agent: Specifies which crawler the rules apply to. * means all crawlers; Googlebot targets Google specifically.
- Disallow: Blocks the crawler from accessing the specified path. Disallow: / blocks everything; Disallow: (empty) allows everything.
- Allow: Overrides a Disallow for a more specific path. Useful for allowing a subfolder within a blocked directory.
- Sitemap: Tells crawlers where to find your XML sitemap. This is optional but recommended.
- Crawl-delay: Asks the crawler to wait a specified number of seconds between requests. Respected by Bing and Yandex but ignored by Google.
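Python's standard library ships a parser for exactly this format, which makes it easy to sanity-check rules before deploying them. A minimal sketch (the bot name and URLs are illustrative, not from the article):

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; in practice, load your real robots.txt.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Paths under a disallowed prefix are blocked; everything else is allowed.
print(rp.can_fetch("MyBot", "https://example.com/private/notes.html"))  # False
print(rp.can_fetch("MyBot", "https://example.com/blog/post"))           # True
```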
Common Robots.txt Patterns
Block All Crawlers
User-agent: *
Disallow: /
Use this for staging sites and development environments. You never want a staging site appearing in search results.
Allow Everything
User-agent: *
Disallow:
Or simply an empty file. Most sites should allow full crawling — you want search engines to index your content.
Block Specific Directories
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /tmp/
Disallow: /internal/
Block administrative areas, API endpoints, temporary files, and internal tools. These pages add no value in search results and waste your crawl budget.
Block Query Parameters
User-agent: *
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?page=
Prevents crawlers from fetching paginated, sorted, or filtered versions of the same content. These near-duplicate URLs waste crawl budget and can dilute your SEO.
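Note that * wildcards (and the $ end-of-URL anchor) are an extension supported by Google and Bing, not part of the original 1994 protocol, so simpler parsers may ignore them. A hypothetical helper sketching how a Google-style pattern maps onto a regular expression:

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any character sequence; a trailing '$' anchors the URL end.
    All other characters match literally from the start of the path.
    """
    escaped = re.escape(pattern)
    escaped = escaped.replace(r"\*", ".*")      # restore wildcard semantics
    if escaped.endswith(r"\$"):                 # restore end-of-URL anchor
        escaped = escaped[:-2] + "$"
    return re.compile("^" + escaped)

print(bool(robots_pattern_to_regex("/*?sort=").match("/products?sort=price")))  # True
print(bool(robots_pattern_to_regex("/*?sort=").match("/products")))             # False
```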
Different Rules for Different Crawlers
User-agent: Googlebot
Disallow: /private/
User-agent: Bingbot
Disallow: /private/
Crawl-delay: 5
User-agent: *
Disallow: /
This allows Google and Bing to crawl most of your site while blocking all other crawlers entirely. The most specific matching user-agent block applies.
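You can verify per-crawler behavior with urllib.robotparser as well. One caveat: Python's parser applies rules in file order rather than Google's longest-match precedence, which only matters when Allow and Disallow rules overlap, as they do not here. A sketch with illustrative URLs:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: Bingbot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Named crawlers use their own block; everyone else falls back to '*'.
print(rp.can_fetch("Googlebot", "https://example.com/pricing"))     # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/pricing"))  # False
print(rp.crawl_delay("Bingbot"))                                    # 5
```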
Generate these configurations automatically with our Robots.txt Generator.
Sitemap References
Adding a Sitemap directive to your robots.txt is one of the easiest SEO wins:
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
You can list multiple sitemaps. The Sitemap directive can appear anywhere in the file — it is not tied to any User-agent block.
Submitting your sitemap through Google Search Console and Bing Webmaster Tools is also recommended, but the robots.txt reference ensures that any well-behaved crawler can discover your sitemap automatically.
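Since Python 3.8, urllib.robotparser also exposes any Sitemap lines it finds via site_maps(), which is handy for confirming the directive parses as expected:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Returns the sitemap URLs in file order (or None if there are none).
print(rp.site_maps())
```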
Testing Your Robots.txt
A syntax error in robots.txt can block your entire site from search results. Always test before deploying:
- Google Search Console: The robots.txt report (under Settings) shows whether Google fetched your file successfully and flags syntax errors; it replaced the older standalone Robots.txt Tester. Use the URL Inspection tool to check whether a specific URL is blocked.
- Manual check: Navigate to https://yourdomain.com/robots.txt in your browser and verify the content is correct. Ensure the file is served as plain text (Content-Type: text/plain).
- Staging test: Deploy robots.txt changes to a staging environment first and test with Google's URL Inspection tool before pushing to production.
Common testing scenarios to verify:
- Is your homepage crawlable?
- Are your product/content pages crawlable?
- Are admin pages blocked?
- Are duplicate content paths (sort, filter, pagination) blocked?
- Is your sitemap discoverable?
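The checklist above can be automated. A minimal sketch (the domain, paths, and expectations are placeholders for your own) that verifies each expectation against the file before deploy:

```python
from urllib.robotparser import RobotFileParser

def check_robots(robots_txt, expectations):
    """Return human-readable failures; an empty list means all checks passed.

    expectations is a list of (path, should_be_allowed) pairs.
    """
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    failures = []
    for path, should_allow in expectations:
        allowed = rp.can_fetch("Googlebot", "https://example.com" + path)
        if allowed != should_allow:
            failures.append(f"{path}: expected {'allowed' if should_allow else 'blocked'}")
    return failures

robots = """\
User-agent: *
Disallow: /admin/
"""

print(check_robots(robots, [("/", True), ("/products/", True), ("/admin/", False)]))
# [] -> all checks passed
```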
Common Robots.txt Mistakes
- Blocking CSS and JavaScript: Google needs to render your page to index it properly. Blocking CSS and JS files causes Google to see a broken page. Never put Disallow: /css/ or Disallow: /js/ in your robots.txt.
- Leaving staging rules in production: The most catastrophic mistake. A Disallow: / left in your production robots.txt removes your entire site from search results. Always review robots.txt as part of your deployment checklist.
- Wrong file location: Robots.txt must be at the domain root: /robots.txt. Files at /subfolder/robots.txt or /Robots.txt are not recognized.
- Using robots.txt for noindex: Blocking a page via robots.txt prevents crawling but does not remove it from search results if other sites link to it. For true deindexing, use a noindex meta tag or HTTP header instead.
- Overly broad rules: Disallow: /p blocks every URL starting with /p, including /products, /pricing, and /press. Be precise with your paths.
- Forgetting trailing slashes: Disallow: /admin blocks /admin, /admin/, and /administrator. Disallow: /admin/ only blocks paths under /admin/. The trailing slash matters.
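A hypothetical deploy-time guard against the staging-rules mistake: scan the file in CI and fail the build if the wildcard group blocks the whole site.

```python
def blocks_everything(robots_txt):
    """True if a 'User-agent: *' group contains 'Disallow: /' (site-wide block)."""
    current_agents = []
    seen_rule = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:          # a new group starts after rules were seen
                current_agents = []
                seen_rule = False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            seen_rule = True
            if field == "disallow" and value == "/" and "*" in current_agents:
                return True
    return False

print(blocks_everything("User-agent: *\nDisallow: /"))        # True
print(blocks_everything("User-agent: *\nDisallow: /admin/"))  # False
```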
Create your robots.txt with our Robots.txt Generator. For more SEO guidance, see our articles on SEO meta tags, schema markup, and Open Graph images.
The OnlineTools4Free Team
We are a small team of developers and designers building free, privacy-first browser tools. Every tool on this platform runs entirely in your browser — your files never leave your device.
