


Definition
Robots.txt is a plain-text file at the root of a website that tells search engine crawlers which pages or sections they may or may not crawl. It follows the Robots Exclusion Protocol (standardized as RFC 9309), and compliant crawlers fetch and check it before crawling pages on the site.
The robots.txt file sits at the root of a domain (e.g., example.com/robots.txt) and contains directives for web crawlers. The basic syntax pairs User-agent (which crawler the rules apply to) with Disallow (which paths to block). For example, User-agent: * followed by Disallow: /admin/ blocks all crawlers from the /admin/ directory.
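Written out as a file, the example above looks like this (the paths are illustrative):

```text
# https://example.com/robots.txt
User-agent: *        # applies to all crawlers
Disallow: /admin/    # block everything under /admin/
```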
Important caveats: robots.txt is advisory, not an enforcement mechanism. Well-behaved crawlers (Googlebot, Bingbot) respect it, but malicious bots may ignore it. Robots.txt also does not prevent pages from being indexed — if other sites link to a disallowed page, Google may still index the URL (without its content). For true de-indexing, use the noindex meta tag or the X-Robots-Tag HTTP header, and leave the page crawlable so the crawler can actually see the directive.
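As a quick sketch of those two de-indexing options, the meta-tag form goes in the page markup:

```html
<!-- In the page's <head>: keep the URL crawlable, but ask engines not to index it -->
<meta name="robots" content="noindex">
```

The header form is equivalent and works for non-HTML resources such as PDFs: the server sends `X-Robots-Tag: noindex` in the HTTP response.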
Common robots.txt uses include: blocking admin panels, API endpoints, internal search result pages (to avoid duplicate content), staging environments, and private sections. The file should also reference the sitemap location, e.g., Sitemap: https://example.com/sitemap.xml. Overly restrictive rules are a common SEO mistake — blocking CSS and JavaScript files can prevent Google from rendering pages properly.
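To see how a compliant crawler interprets such rules, Python's standard-library urllib.robotparser can evaluate them. The rules and URLs below are illustrative, and a real crawler would fetch the live file with set_url() and read() rather than parse an inline string:

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration only.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A polite crawler asks before fetching each URL.
print(rp.can_fetch("*", "https://example.com/about"))   # not disallowed
print(rp.can_fetch("*", "https://example.com/admin/"))  # disallowed
```

This mirrors what well-behaved bots do: check the path against the matching User-agent group and skip any URL a Disallow rule covers.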