Best robots.txt for AI Search 2026

Last updated: April 2026 | Based on testing across 1,400+ UK business websites

Your robots.txt file is the gatekeeper of your website. It tells crawlers what they can and cannot read. Most businesses have never checked theirs. Many are accidentally blocking AI crawlers without knowing it.

This is the single fastest fix for AI visibility. If GPTBot, PerplexityBot or ClaudeBot are blocked, those platforms cannot see your website at all. No indexing, no citations, no visibility. A two-minute edit to one text file can change that.
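
What does an accidental block look like? Often just two lines. A stanza like the following, a common leftover from blanket bot-blocking rules, hides your entire site from ChatGPT:

User-agent: GPTBot
Disallow: /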

AI Crawlers You Need to Allow

| Crawler | Platform | What it does | Priority | Default status |
| --- | --- | --- | --- | --- |
| GPTBot | ChatGPT (OpenAI) | Reads your site for ChatGPT answers | Critical | Blocked by many WordPress security plugins |
| ChatGPT-User | ChatGPT (browsing mode) | Reads pages when users ask ChatGPT to browse | High | Usually allowed |
| Google-Extended | Gemini (Google) | Controls use of your site for Gemini training | High | Usually allowed |
| GoogleOther | Google AI Overviews | General Google AI crawling | High | Usually allowed |
| PerplexityBot | Perplexity | Reads your site for real-time answers and citations | Critical | Sometimes blocked |
| ClaudeBot | Claude (Anthropic) | Reads your site for Claude answers | High | Usually allowed |
| Claude-Web | Claude (web browsing) | Reads pages when Claude browses | Medium | Usually allowed |
| Bingbot | Copilot (Microsoft) | Core Bing crawler; feeds Copilot | Critical | Usually allowed |
| Applebot-Extended | Apple Intelligence / Siri | Controls use of Applebot data for Apple AI features | Medium | Usually allowed |
| CCBot | Common Crawl | Open dataset used by many AI models | Medium | Sometimes blocked |

Key insight: 34% of the UK business websites we audited block at least one major AI crawler. The most commonly blocked is GPTBot, typically by WordPress security plugins such as Wordfence or Sucuri that add blanket bot-blocking rules. Check yours now: append /robots.txt to your domain and read what comes back.
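
If you prefer to script the check, here is a minimal sketch using Python's standard-library robots.txt parser. The domain is a placeholder; the agent list comes from the table above.

import urllib.robotparser

# The AI crawlers from the table above
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "GoogleOther",
    "PerplexityBot", "ClaudeBot", "Claude-Web", "Bingbot",
    "Applebot-Extended", "CCBot",
]

# Point at your own domain
rp = urllib.robotparser.RobotFileParser("https://www.yourdomain.com/robots.txt")
rp.read()  # fetch and parse the live file

for agent in AI_AGENTS:
    print(agent, "allowed" if rp.can_fetch(agent, "/") else "BLOCKED")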

Recommended robots.txt Configuration

Copy the configuration below into your robots.txt file. Replace yourdomain.com with your actual domain. This explicitly allows every major AI crawler to read your entire site, plus two smaller ones not covered in the table above: cohere-ai (Cohere) and Bytespider (ByteDance).

Recommended robots.txt for AI visibility

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: Bytespider
Allow: /

User-agent: *
Allow: /

Sitemap: https://www.yourdomain.com/sitemap-index.xml
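
Before uploading, you can sanity-check the edited file with the same standard-library parser. A minimal sketch, assuming the file is saved as robots.txt in the current directory:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())  # parse the local copy, no network needed

# Spot-check the critical crawlers before deploying
for agent in ["GPTBot", "PerplexityBot", "Bingbot", "ClaudeBot"]:
    assert rp.can_fetch(agent, "/"), f"{agent} would be blocked"
print("All critical AI crawlers allowed")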

Common Mistakes

| Mistake | Impact | How to fix |
| --- | --- | --- |
| WordPress security plugin blocking all bots | Blocks all AI platforms | Check Wordfence, Sucuri, iThemes settings; whitelist AI crawlers |
| Blanket "Disallow: /" for unknown bots | Blocks new AI crawlers | Use "Allow: /" as the default and block only specific bad bots (see the example after this table) |
| No robots.txt file at all | Crawlers assume allow, but no sitemap reference | Create one with a sitemap link |
| Blocking /api/ or /admin/ paths too broadly | May block legitimate content paths | Be specific with blocked paths |
| Using Cloudflare bot protection aggressively | Can block AI crawlers via challenge pages | Whitelist known AI bot IPs |
| Never checking after initial setup | Settings change with plugin updates | Check quarterly |
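
The safer pattern referenced in the table: allow by default, keep path blocks narrow, and block only bots you have specifically decided to exclude. BadBot below is a placeholder for whichever bot you actually want to ban.

# Default: allow everything, keep any path blocks specific
User-agent: *
Allow: /
Disallow: /wp-admin/

# Block only named bad bots (BadBot is a placeholder)
User-agent: BadBot
Disallow: /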

AI Crawler Reference Cards

GPTBot

GPTBot is OpenAI's primary crawler. It reads your website content so that ChatGPT can reference and cite your pages in its answers. When a user asks ChatGPT a question related to your business, GPTBot's index determines whether your site appears in the response.

If blocked: ChatGPT cannot read your website. Your business will not appear in any ChatGPT answer, even if your content is the best match. Of all the crawlers listed here, blocking GPTBot costs the most, because ChatGPT has the largest user base of any AI search platform.

How to allow:

User-agent: GPTBot
Allow: /

ChatGPT-User

ChatGPT-User is the browsing-mode agent. When a user asks ChatGPT to fetch a page, or ChatGPT decides it needs live web data, this agent retrieves the page in real time. It is different from GPTBot, which crawls in advance.

If blocked: ChatGPT cannot browse your website live. Even if GPTBot has previously indexed your site, real-time browsing requests will fail. Users who ask ChatGPT to check your site directly will get an error.

How to allow:

User-agent: ChatGPT-User
Allow: /

Google-Extended

Google-Extended is the robots.txt token Google uses to control whether your content feeds Gemini. It is not a separate crawler: Googlebot (which handles search indexing) does the fetching, but it honours the Google-Extended token when deciding whether pages can be used for Gemini training and grounding. Allowing Google-Extended means your content can be used to improve Gemini's understanding of your industry and business.

If blocked: Your content will not be included in Gemini's training data. This reduces the chance that Gemini references your business, products, or expertise when answering related queries.

How to allow:

User-agent: Google-Extended
Allow: /

GoogleOther

GoogleOther is Google's general-purpose crawler for products outside core Search, including the research and development crawls that feed Google's AI systems. AI Overviews (the AI-generated summaries at the top of Google search results) are built on the regular search index, so Googlebot matters most there, but allowing GoogleOther keeps you visible to the broader AI crawling Google runs around those features.

If blocked: You restrict Google's AI-related crawling of your site, which can reduce your presence in AI-driven features. Given that AI Overviews now appear on a significant percentage of UK search queries, anything that limits what Google's AI systems can see of your content directly impacts your organic visibility.

How to allow:

User-agent: GoogleOther
Allow: /

PerplexityBot

PerplexityBot powers Perplexity, the AI search engine that provides real-time answers with inline citations. Perplexity is growing rapidly among professionals and researchers. It links directly to source pages, making it one of the best AI platforms for driving referral traffic.

If blocked: Perplexity cannot cite your website. You lose one of the few AI platforms that actively sends traffic back to your site via clickable source links. This is a significant missed opportunity for lead generation.

How to allow:

User-agent: PerplexityBot
Allow: /

ClaudeBot

ClaudeBot is Anthropic's crawler for Claude. It reads your website so that Claude can reference your content when answering user queries. Claude is used extensively in professional and enterprise settings, making it a valuable channel for B2B businesses.

If blocked: Claude cannot access your website content. Your business will not be referenced in Claude's answers. For B2B companies, this matters because Claude's user base skews toward decision-makers and technical professionals.

How to allow:

User-agent: ClaudeBot
Allow: /

Claude-Web

Claude-Web is the browsing-mode crawler for Claude. Similar to ChatGPT-User, it fetches pages in real time when a user asks Claude to look at a specific URL or when Claude needs fresh data to answer a question.

If blocked: Claude cannot browse your site live. Even if ClaudeBot has previously indexed your content, real-time fetch requests will be denied. Users who paste your URL into Claude will get nothing back.

How to allow:

User-agent: Claude-Web
Allow: /

Bingbot

Bingbot is Microsoft's core search crawler. It feeds both Bing search results and Microsoft Copilot. Since Copilot is integrated into Windows, Edge, and Microsoft 365, blocking Bingbot has a cascading effect across the entire Microsoft ecosystem.

If blocked: Your site disappears from Bing search and Microsoft Copilot simultaneously. This is a double loss. Most businesses already allow Bingbot, but it is worth verifying.

How to allow:

User-agent: Bingbot
Allow: /

Applebot-Extended

Applebot-Extended is the robots.txt token that controls whether content fetched by Applebot can be used for Apple's AI features, including Apple Intelligence and Siri. Applebot itself does the crawling; the Extended token governs AI use. As Apple rolls out more AI features across iPhone, iPad, and Mac, this control becomes increasingly important. Apple's installed base in the UK is substantial, particularly among higher-income demographics.

If blocked: Your content will not appear in Apple Intelligence summaries or Siri AI answers. As Apple expands its AI features throughout 2026, the cost of blocking this crawler will grow.

How to allow:

User-agent: Applebot-Extended
Allow: /

CCBot

CCBot is the crawler for Common Crawl, a nonprofit that maintains the largest open web dataset. Many AI models, including smaller and open-source ones, train on Common Crawl data. Allowing CCBot ensures your content reaches a broad range of AI systems beyond the major platforms. For reference, you can see how we handle this in our own robots.txt at rank4ai.co.uk/robots.txt.

If blocked: Your content is excluded from the Common Crawl dataset. This means dozens of AI models that rely on Common Crawl will never see your site. The impact is diffuse but cumulative over time.

How to allow:

User-agent: CCBot
Allow: /

Frequently Asked Questions

Will allowing AI crawlers slow my site down?

Rarely. The major AI crawlers generally respect crawl-rate conventions and visit far less often than search engine crawlers, so most sites see no measurable performance difference.
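
If a particular crawler ever does hit your server too hard, throttling is gentler than blocking. One option is the non-standard Crawl-delay directive, though support varies: Bingbot honours it, Googlebot ignores it, and support among AI crawlers is inconsistent.

User-agent: Bingbot
Crawl-delay: 5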

Can I allow some AI crawlers but block others?

Yes. robots.txt lets you set rules per user-agent. You can allow GPTBot but block CCBot if you prefer. Each crawler follows the most specific stanza matching its own user-agent and falls back to the wildcard (*) rules otherwise.
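
For example, the following allows GPTBot, blocks CCBot, and leaves every other crawler on the default rule:

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /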

How do I check my current robots.txt?

Visit yourdomain.com/robots.txt in your browser. It is a plain text file. If the request returns a 404, you do not have one.

Do I need to restart my server after changing robots.txt?

No. Crawlers fetch robots.txt themselves on their next visit; nothing needs restarting on your end. Changes are not always instant, though: most crawlers cache the file (Google, for example, generally refreshes it about once a day), so allow up to 24 hours for an edit to take full effect.

Does robots.txt affect Google ranking?

Not directly. But blocking Googlebot affects your Google ranking, and blocking Google-Extended affects whether your content can be used by Gemini. The two controls are separate but related, so check each one individually.