Robots.txt: What It Actually Does (And Common Mistakes to Avoid)
What Robots.txt Does
The robots.txt file tells search engine crawlers which parts of your site they are allowed to access. It must live at the root of your domain (yoursite.com/robots.txt) and is one of the first things crawlers check before exploring your site.
Critical misunderstanding: robots.txt controls crawling, not indexing. Blocking a URL in robots.txt prevents Google from crawling it, but if other pages link to that URL, Google may still index the URL based on external information. To prevent indexing, you need a noindex tag instead.
Basic Robots.txt Syntax
The file uses simple directives:
- User-agent: Specifies which crawler the rules apply to. Use * for all crawlers.
- Disallow: Blocks a specific path from being crawled.
- Allow: Permits crawling of a specific path (overrides a broader Disallow).
- Sitemap: Tells crawlers where to find your XML sitemap.
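A minimal robots.txt combining these directives might look like the following (the paths and sitemap URL are placeholders; substitute your own):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/

Sitemap: https://yoursite.com/sitemap.xml
```

Here the Allow rule carves one subdirectory back out of the broader /admin/ block.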
Common Mistakes That Kill SEO
Blocking Your Entire Site
A Disallow directive with an empty value allows everything; Disallow: / blocks everything. The difference is one character, and getting it wrong makes your entire site invisible to Google.
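Shown side by side as two separate files, the single-character difference looks like this:

```
# File 1: allows crawling of everything
User-agent: *
Disallow:
```

```
# File 2: blocks crawling of everything
User-agent: *
Disallow: /
```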
After every robots.txt change, test it with a validator such as the robots.txt report in Google Search Console (the successor to the older standalone robots.txt Tester).
Blocking CSS and JavaScript Files
If Google cannot access your CSS and JS files, it cannot render your pages properly. This hurts rankings because Google evaluates the rendered page, not just the raw HTML.
Check that your robots.txt does not block /wp-includes/, /assets/, or wherever your CSS and JS files live.
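If a needed asset sits inside a blocked directory, a more specific Allow rule can carve it out rather than unblocking the whole directory. For example, WordPress sites commonly block /wp-admin/ but re-allow admin-ajax.php, which front-end scripts depend on:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```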
Blocking Important Directories
Sites sometimes block directories like /blog/, /products/, or /category/ without realizing the SEO impact. Audit your Disallow rules and verify that every blocked path is intentionally blocked.
Leftover Development Restrictions
During development, sites often include a blanket Disallow: / to prevent indexing. If this is not removed before launch, the production site remains invisible to search engines. This happens more often than anyone admits.
Different Robots.txt for Staging and Production
Ensure your staging site has a restrictive robots.txt (blocking everything) and your production site has the correct, permissive version. Mixing the two up is a common deployment mistake.
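One safeguard is a deploy-time check that fails the build if the file destined for production still contains a blanket Disallow. This is a sketch, not a complete CI step; the default file path is an assumption about your project layout:

```python
from pathlib import Path


def assert_not_blocking_everything(robots_path: str = "public/robots.txt") -> None:
    """Fail loudly if the robots.txt file contains a blanket 'Disallow: /' rule."""
    text = Path(robots_path).read_text(encoding="utf-8")
    for line in text.splitlines():
        # Strip inline comments and surrounding whitespace before comparing.
        rule = line.split("#", 1)[0].strip()
        if rule.lower() == "disallow: /":
            raise SystemExit("Refusing to deploy: robots.txt blocks the entire site")
```

Run it as the last step before your deploy command so a leftover staging file stops the release instead of silently shipping.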
What to Block
Legitimate uses of Disallow:
- Admin pages: /admin/, /wp-admin/
- User-specific pages: /account/, /cart/, /checkout/
- Internal search results: /search/
- Filtered/faceted URLs that create infinite crawl paths
- Thank-you or confirmation pages
- API endpoints that should not be crawled
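Put together, a production robots.txt covering these cases might look like the following. The directory names are illustrative, and the ?filter= pattern assumes your faceted URLs use that query parameter; adjust both to your site's actual structure:

```
User-agent: *
Disallow: /admin/
Disallow: /account/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search/
Disallow: /*?filter=
Disallow: /thank-you/
Disallow: /api/

Sitemap: https://yoursite.com/sitemap.xml
```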
What Not to Block
Never block:
- CSS and JavaScript files
- Image directories
- Any page you want indexed
- Your sitemap files
Robots.txt vs Noindex
Use robots.txt to manage crawl budget — preventing Google from wasting time on low-value URLs. Use noindex to prevent specific pages from appearing in search results.
You should not combine both on the same URL. If you block a page in robots.txt, Google cannot see the noindex tag because it never crawls the page.
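For the noindex approach, the page must stay crawlable so Google can see the directive, delivered either as a meta tag in the page's HTML or as an HTTP response header:

```html
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">
```

The header equivalent is `X-Robots-Tag: noindex`, which is useful for non-HTML resources like PDFs.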
Testing and Monitoring
- Test changes using Search Console's robots.txt report (the successor to the robots.txt Tester) before deploying
- Monitor crawl stats in Search Console after any changes
- Set up alerts for unauthorized robots.txt modifications
- Review your robots.txt quarterly as your site structure evolves
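The alerting idea above can be sketched as a small script that fetches the live file and compares it against a known-good copy. The URL and expected content below are placeholders, and the alerting hook itself is left to you (email, Slack, PagerDuty):

```python
import hashlib
import urllib.request

# Known-good copy of your production robots.txt (placeholder content).
EXPECTED = """User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""


def fingerprint(text: str) -> str:
    """Return a stable hash of robots.txt content for comparison."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def robots_changed(live_text: str, expected_text: str = EXPECTED) -> bool:
    """True if the live robots.txt no longer matches the known-good copy."""
    return fingerprint(live_text) != fingerprint(expected_text)


def fetch_robots(url: str = "https://yoursite.com/robots.txt") -> str:
    """Download the live robots.txt (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Scheduled via cron or a CI job, `robots_changed(fetch_robots())` turning True is your cue to fire an alert before rankings are affected.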
A well-configured robots.txt is invisible when it works. A misconfigured one can quietly destroy your organic traffic.