Robots.txt Generator Documentation
Overview: Generate custom robots.txt files to control search engine crawling behavior. Includes templates for WordPress, e-commerce, and various website types.
Features
- Pre-built templates for common platforms
- Custom rule creation
- Syntax validation and error checking
- Sitemap URL integration
- Bot-specific directives
- Comments and documentation
Robots.txt Directives
| Directive | Description | Example |
|---|---|---|
| User-agent | Specifies which bot the following rules apply to | User-agent: * |
| Disallow | Prevents crawling of the specified path prefix | Disallow: /admin/ |
| Allow | Explicitly permits crawling (overrides a broader Disallow) | Allow: /public/ |
| Sitemap | Points to the XML sitemap location | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Sets a delay between requests in seconds (ignored by Googlebot) | Crawl-delay: 10 |
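The interaction between Allow and Disallow can be checked locally with Python's standard-library robots.txt parser. One caveat: `urllib.robotparser` applies rules in the order they are listed, so the Allow line comes first in this sketch; major crawlers such as Googlebot instead use longest-match precedence.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The Allow rule carves an exception out of the broader Disallow.
print(rp.can_fetch("*", "/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "/wp-admin/options.php"))     # False
print(rp.can_fetch("*", "/blog/post"))                # True (no rule matches)
```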
Common Templates
WordPress
```
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://yoursite.com/sitemap.xml
```
E-commerce
```
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Sitemap: https://yourstore.com/sitemap.xml
```
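Templates like the ones above can also be produced programmatically. The sketch below is illustrative only; `build_robots_txt` and its parameters are hypothetical names, not part of any library:

```python
def build_robots_txt(user_agent="*", disallow=(), allow=(), sitemap=None):
    """Assemble a robots.txt body from simple rule lists."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallow]
    lines += [f"Allow: {path}" for path in allow]
    if sitemap:
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines) + "\n"

print(build_robots_txt(
    disallow=["/cart/", "/checkout/", "/account/"],
    sitemap="https://yourstore.com/sitemap.xml",
))
```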
Best Practices
Do:
- Block admin and private areas
- Include sitemap URL
- Use specific bot directives when needed
- Test your robots.txt file
- Keep it simple and clear
Don't:
- Block important SEO pages
- Rely on robots.txt for security: the file is publicly readable and compliance is voluntary
- Block CSS/JS files (crawlers need them to render pages correctly)
- Forget to update the file after site changes
- Use wildcard patterns incorrectly
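On wildcard patterns: most major crawlers treat `*` as any run of characters and a trailing `$` as an end-of-URL anchor. A rough way to reason about what a pattern will match is to translate it into a regular expression; `robots_pattern` below is a hypothetical helper for that purpose, not an official parser:

```python
import re

def robots_pattern(pattern):
    """Compile a robots.txt path pattern: '*' matches any run of
    characters, a trailing '$' anchors the match to the end of the URL."""
    if pattern.endswith("$"):
        body, tail = pattern[:-1], "$"
    else:
        body, tail = pattern, ""
    parts = [re.escape(piece) for piece in body.split("*")]
    return re.compile("^" + ".*".join(parts) + tail)

pdf_rule = robots_pattern("/*.pdf$")               # e.g. Disallow: /*.pdf$
print(bool(pdf_rule.match("/files/report.pdf")))   # True
print(bool(pdf_rule.match("/files/report.pdfx")))  # False: '$' anchors the end
```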
Testing Your Robots.txt
Important Testing Steps
- Check your file in Google Search Console's robots.txt report
- Test specific URLs and user agents
- Check for syntax errors
- Verify sitemap accessibility
- Monitor for crawl errors after deployment
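A basic syntax check can also be automated before deployment. The linter below is a minimal sketch (the `lint_robots_txt` name and its directive whitelist are this example's own); it flags lines that are neither comments nor recognized directives:

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}

def lint_robots_txt(text):
    """Return (line_number, line) pairs that are not valid directives."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only lines are fine
        name, colon, _value = line.partition(":")
        if not colon or name.strip().lower() not in KNOWN_DIRECTIVES:
            problems.append((number, raw))
    return problems

print(lint_robots_txt("User-agent: *\nDisalow: /admin/\n"))
# → [(2, 'Disalow: /admin/')]: the misspelled directive is flagged
```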
Common Bots
- Googlebot: Google's web crawler
- Bingbot: Bing's search crawler
- Slurp: Yahoo's web crawler
- DuckDuckBot: DuckDuckGo crawler
- FacebookExternalHit: Facebook's link-preview fetcher
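Bot-specific groups can be exercised locally with `urllib.robotparser`: the parser picks the first `User-agent` group whose token appears in the crawler's name and falls back to the `*` group otherwise. A sketch:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/drafts/post"))  # False: Googlebot group applies
print(rp.can_fetch("Bingbot", "/drafts/post"))    # True: falls back to the * group
print(rp.crawl_delay("Bingbot"))                  # 10
```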
Common Mistakes
- Blocking all crawlers with a blanket Disallow: /
- Placing the file anywhere other than the site root (it must be at https://example.com/robots.txt)
- Syntax errors such as misspelled directives or missing colons
- Blocking pages you want indexed