Robots.txt Generator Documentation

Overview: Generate custom robots.txt files to control search engine crawling behavior. Includes templates for WordPress, e-commerce, and various website types.

Features

  • Pre-built templates for common platforms
  • Custom rule creation
  • Syntax validation and error checking
  • Sitemap URL integration
  • Bot-specific directives
  • Comments and documentation

Robots.txt Directives

Directive     Description                                       Example
User-agent    Specifies which bot the rules apply to            User-agent: *
Disallow      Prevents crawling of the specified paths          Disallow: /admin/
Allow         Explicitly allows crawling (overrides Disallow)   Allow: /public/
Sitemap       Points to the XML sitemap location                Sitemap: https://example.com/sitemap.xml
Crawl-delay   Sets the delay between requests, in seconds       Crawl-delay: 10
              (not supported by Googlebot; honored by some
              other crawlers such as Bingbot)
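These directives can be exercised programmatically. As a minimal sketch, Python's standard urllib.robotparser module parses the same rules (the example.com domain and the sample paths are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly instead of fetching it over HTTP.
rules = """\
User-agent: *
Disallow: /admin/
Allow: /public/
Crawl-delay: 10
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))     # True
print(rp.crawl_delay("*"))                                      # 10
```

Note that urllib.robotparser applies rules in file order rather than Google's longest-match rule, so results can differ from Googlebot's on overlapping Allow/Disallow patterns.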

Common Templates

WordPress
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap.xml

E-commerce

User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Sitemap: https://yourstore.com/sitemap.xml
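Templates like these can also be assembled programmatically. A minimal generator sketch, assuming a simple (user_agent, rules) input shape (the build_robots_txt helper is illustrative, not part of the tool's API):

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from (user_agent, rules) groups.

    Each rule is a (directive, value) pair, e.g. ("Disallow", "/cart/").
    """
    lines = []
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        lines.extend(f"{directive}: {value}" for directive, value in rules)
        lines.append("")  # blank line separates groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).rstrip("\n") + "\n"

ecommerce = build_robots_txt(
    groups=[("*", [("Disallow", "/cart/"),
                   ("Disallow", "/checkout/"),
                   ("Disallow", "/account/")])],
    sitemaps=["https://yourstore.com/sitemap.xml"],
)
print(ecommerce)
```

Keeping rules as structured data like this makes it easy to validate directives before rendering and to swap templates per platform.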

Best Practices

Do:
  • Block admin and private areas
  • Include sitemap URL
  • Use specific bot directives when needed
  • Test your robots.txt file
  • Keep it simple and clear

Don't:
  • Block important SEO pages
  • Rely on robots.txt for security (disallowed paths remain publicly accessible)
  • Block CSS/JS files
  • Forget to update after site changes
  • Use wildcard patterns incorrectly
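On the last point: major crawlers such as Googlebot and Bingbot support * (match any character sequence) and $ (match end of URL) in paths, but these are extensions beyond the original standard and other bots may treat them literally. An illustrative fragment:

```
User-agent: *
Disallow: /*?sessionid=    # any URL containing this query parameter
Disallow: /*.pdf$          # any URL ending in .pdf
```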

Testing Your Robots.txt

Important Testing Steps
  1. Use Google Search Console's robots.txt report (the successor to its robots.txt Tester)
  2. Test specific URLs and user agents
  3. Check for syntax errors
  4. Verify sitemap accessibility
  5. Monitor for crawl errors after deployment
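Step 3 can be partly automated before deployment. A minimal syntax-check sketch (the find_syntax_errors helper is illustrative, not a built-in validator of this tool):

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def find_syntax_errors(text):
    """Return (line_number, line) pairs for lines that are not blank,
    not comments, and not a known `Directive: value` pair."""
    errors = []
    for lineno, line in enumerate(text.splitlines(), 1):
        # Naive comment stripping: would also cut '#' fragments inside URLs.
        stripped = line.split("#", 1)[0].strip()
        if not stripped:
            continue
        directive, sep, _value = stripped.partition(":")
        if not sep or directive.strip().lower() not in KNOWN_DIRECTIVES:
            errors.append((lineno, line))
    return errors

good = "User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
bad = "User-agent: *\nDisalow /admin/\n"  # typo and missing colon

print(find_syntax_errors(good))  # []
print(find_syntax_errors(bad))   # [(2, 'Disalow /admin/')]
```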

Common Bots
  • Googlebot: Google's web crawler
  • Bingbot: Bing's search crawler
  • Slurp: Yahoo's web crawler
  • DuckDuckBot: DuckDuckGo crawler
  • FacebookExternalHit: Facebook's link preview crawler

Common Mistakes
  • Blocking all crawlers
  • Incorrect file placement (the file must live at the site root, e.g. /robots.txt)
  • Syntax errors
  • Blocking important pages