Best robots.txt for AI Search 2026

Last updated: April 2026 | Based on testing across 1,400+ UK business websites

Your robots.txt file is the gatekeeper of your website. It tells crawlers what they can and cannot read. Most businesses have never checked theirs. Many are accidentally blocking AI crawlers without knowing it.

This is the single fastest fix for AI visibility. If GPTBot, PerplexityBot or ClaudeBot are blocked, those platforms cannot see your website at all. No indexing, no citations, no visibility. A two-minute edit to one text file can change that.
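
What does an accidental block look like? Often just two lines. A stanza like the following, a common leftover from blanket bot-blocking rules, hides your entire site from ChatGPT:

User-agent: GPTBot
Disallow: /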

AI Crawlers You Need to Allow

| Crawler | Platform | What it does | Priority | Default status |
| --- | --- | --- | --- | --- |
| GPTBot | ChatGPT (OpenAI) | Reads your site for ChatGPT answers | Critical | Blocked by many WordPress security plugins |
| ChatGPT-User | ChatGPT (browsing mode) | Reads pages when users ask ChatGPT to browse | High | Usually allowed |
| Google-Extended | Gemini (Google) | Controls use of your site for Gemini training | High | Usually allowed |
| GoogleOther | Google AI Overviews | General Google AI crawling | High | Usually allowed |
| PerplexityBot | Perplexity | Reads your site for real-time answers and citations | Critical | Sometimes blocked |
| ClaudeBot | Claude (Anthropic) | Reads your site for Claude answers | High | Usually allowed |
| Claude-Web | Claude (web browsing) | Reads pages when Claude browses | Medium | Usually allowed |
| Bingbot | Copilot (Microsoft) | Core Bing crawler; feeds Copilot | Critical | Usually allowed |
| Applebot-Extended | Apple Intelligence / Siri | Controls use of Applebot data for Apple AI features | Medium | Usually allowed |
| CCBot | Common Crawl | Open dataset used by many AI models | Medium | Sometimes blocked |

Key insight: 34% of the UK business websites we audited block at least one major AI crawler. The most commonly blocked is GPTBot, typically by WordPress security plugins such as Wordfence or Sucuri that add blanket bot-blocking rules. Check yours now: append /robots.txt to your domain and read what comes back.
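
If you prefer to script the check, here is a minimal sketch using Python's standard-library robots.txt parser. The domain is a placeholder; the agent list comes from the table above.

import urllib.robotparser

# The AI crawlers from the table above
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "GoogleOther",
    "PerplexityBot", "ClaudeBot", "Claude-Web", "Bingbot",
    "Applebot-Extended", "CCBot",
]

# Point at your own domain
rp = urllib.robotparser.RobotFileParser("https://www.yourdomain.com/robots.txt")
rp.read()  # fetch and parse the live file

for agent in AI_AGENTS:
    print(agent, "allowed" if rp.can_fetch(agent, "/") else "BLOCKED")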

Recommended robots.txt Configuration

Copy the configuration below into your robots.txt file. Replace yourdomain.com with your actual domain. This explicitly allows every major AI crawler to read your entire site, plus two smaller ones not covered in the table above: cohere-ai (Cohere) and Bytespider (ByteDance).

Recommended robots.txt for AI visibility

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: Bytespider
Allow: /

User-agent: *
Allow: /

Sitemap: https://www.yourdomain.com/sitemap-index.xml
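
Before uploading, you can sanity-check the edited file with the same standard-library parser. A minimal sketch, assuming the file is saved as robots.txt in the current directory:

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())  # parse the local copy, no network needed

# Spot-check the critical crawlers before deploying
for agent in ["GPTBot", "PerplexityBot", "Bingbot", "ClaudeBot"]:
    assert rp.can_fetch(agent, "/"), f"{agent} would be blocked"
print("All critical AI crawlers allowed")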

Common Mistakes

| Mistake | Impact | How to fix |
| --- | --- | --- |
| WordPress security plugin blocking all bots | Blocks all AI platforms | Check Wordfence, Sucuri, iThemes settings; whitelist AI crawlers |
| Blanket "Disallow: /" for unknown bots | Blocks new AI crawlers | Use "Allow: /" as the default and block only specific bad bots (see the example after this table) |
| No robots.txt file at all | Crawlers assume allow, but no sitemap reference | Create one with a sitemap link |
| Blocking /api/ or /admin/ paths too broadly | May block legitimate content paths | Be specific with blocked paths |
| Using Cloudflare bot protection aggressively | Can block AI crawlers via challenge pages | Whitelist known AI bot IPs |
| Never checking after initial setup | Settings change with plugin updates | Check quarterly |
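
The safer pattern referenced in the table: allow by default, keep path blocks narrow, and block only bots you have specifically decided to exclude. BadBot below is a placeholder for whichever bot you actually want to ban.

# Default: allow everything, keep any path blocks specific
User-agent: *
Allow: /
Disallow: /wp-admin/

# Block only named bad bots (BadBot is a placeholder)
User-agent: BadBot
Disallow: /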

AI Crawler Reference Cards

GPTBot

GPTBot is OpenAI's primary crawler. It reads your website content so that ChatGPT can reference and cite your pages in its answers. When a user asks ChatGPT a question related to your business, GPTBot's index determines whether your site appears in the response.

If blocked: ChatGPT cannot read your website. Your business will not appear in any ChatGPT answer, even if your content is the best match. Of all the crawlers listed here, blocking GPTBot costs the most, because ChatGPT has the largest user base of any AI search platform.

How to allow:

User-agent: GPTBot
Allow: /

ChatGPT-User

ChatGPT-User is the browsing-mode agent. When a user asks ChatGPT to fetch a page, or ChatGPT decides it needs live web data, this agent retrieves the page in real time. It is different from GPTBot, which crawls in advance.

If blocked: ChatGPT cannot browse your website live. Even if GPTBot has previously indexed your site, real-time browsing requests will fail. Users who ask ChatGPT to check your site directly will get an error.

How to allow:

User-agent: ChatGPT-User
Allow: /

Google-Extended

Google-Extended is the robots.txt token Google uses to control whether your content feeds Gemini. It is not a separate crawler: Googlebot (which handles search indexing) does the fetching, but it honours the Google-Extended token when deciding whether pages can be used for Gemini training and grounding. Allowing Google-Extended means your content can be used to improve Gemini's understanding of your industry and business.

If blocked: Your content will not be included in Gemini's training data. This reduces the chance that Gemini references your business, products, or expertise when answering related queries.

How to allow:

User-agent: Google-Extended
Allow: /

GoogleOther

GoogleOther is Google's general-purpose crawler for products outside core Search, including the research and development crawls that feed Google's AI systems. AI Overviews (the AI-generated summaries at the top of Google search results) are built on the regular search index, so Googlebot matters most there, but allowing GoogleOther keeps you visible to the broader AI crawling Google runs around those features.

If blocked: You restrict Google's AI-related crawling of your site, which can reduce your presence in AI-driven features. Given that AI Overviews now appear on a significant percentage of UK search queries, anything that limits what Google's AI systems can see of your content directly impacts your organic visibility.

How to allow:

User-agent: GoogleOther
Allow: /

PerplexityBot

PerplexityBot powers Perplexity, the AI search engine that provides real-time answers with inline citations. Perplexity is growing rapidly among professionals and researchers. It links directly to source pages, making it one of the best AI platforms for driving referral traffic.

If blocked: Perplexity cannot cite your website. You lose one of the few AI platforms that actively sends traffic back to your site via clickable source links. This is a significant missed opportunity for lead generation.

How to allow:

User-agent: PerplexityBot
Allow: /

ClaudeBot

ClaudeBot is Anthropic's crawler for Claude. It reads your website so that Claude can reference your content when answering user queries. Claude is used extensively in professional and enterprise settings, making it a valuable channel for B2B businesses.

If blocked: Claude cannot access your website content. Your business will not be referenced in Claude's answers. For B2B companies, this matters because Claude's user base skews toward decision-makers and technical professionals.

How to allow:

User-agent: ClaudeBot
Allow: /

Claude-Web

Claude-Web is the browsing-mode crawler for Claude. Similar to ChatGPT-User, it fetches pages in real time when a user asks Claude to look at a specific URL or when Claude needs fresh data to answer a question.

If blocked: Claude cannot browse your site live. Even if ClaudeBot has previously indexed your content, real-time fetch requests will be denied. Users who paste your URL into Claude will get nothing back.

How to allow:

User-agent: Claude-Web
Allow: /

Bingbot

Bingbot is Microsoft's core search crawler. It feeds both Bing search results and Microsoft Copilot. Since Copilot is integrated into Windows, Edge, and Microsoft 365, blocking Bingbot has a cascading effect across the entire Microsoft ecosystem.

If blocked: Your site disappears from Bing search and Microsoft Copilot simultaneously. This is a double loss. Most businesses already allow Bingbot, but it is worth verifying.

How to allow:

User-agent: Bingbot
Allow: /

Applebot-Extended

Applebot-Extended is the robots.txt token that controls whether content fetched by Applebot can be used for Apple's AI features, including Apple Intelligence and Siri. Applebot itself does the crawling; the Extended token governs AI use. As Apple rolls out more AI features across iPhone, iPad, and Mac, this control becomes increasingly important. Apple's installed base in the UK is substantial, particularly among higher-income demographics.

If blocked: Your content will not appear in Apple Intelligence summaries or Siri AI answers. As Apple expands its AI features throughout 2026, the cost of blocking this crawler will grow.

How to allow:

User-agent: Applebot-Extended
Allow: /

CCBot

CCBot is the crawler for Common Crawl, a nonprofit that maintains the largest open web dataset. Many AI models, including smaller and open-source ones, train on Common Crawl data. Allowing CCBot ensures your content reaches a broad range of AI systems beyond the major platforms. For reference, you can see how we handle this in our own robots.txt at rank4ai.co.uk/robots.txt.

If blocked: Your content is excluded from the Common Crawl dataset. This means dozens of AI models that rely on Common Crawl will never see your site. The impact is diffuse but cumulative over time.

How to allow:

User-agent: CCBot
Allow: /

Frequently Asked Questions

Will allowing AI crawlers slow my site down?

Rarely. The major AI crawlers generally respect crawl-rate conventions and visit far less often than search engine crawlers, so most sites see no measurable performance difference.
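
If a particular crawler ever does hit your server too hard, throttling is gentler than blocking. One option is the non-standard Crawl-delay directive, though support varies: Bingbot honours it, Googlebot ignores it, and support among AI crawlers is inconsistent.

User-agent: Bingbot
Crawl-delay: 5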

Can I allow some AI crawlers but block others?

Yes. robots.txt lets you set rules per user-agent. You can allow GPTBot but block CCBot if you prefer. Each crawler follows the most specific stanza matching its own user-agent and falls back to the wildcard (*) rules otherwise.
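
For example, the following allows GPTBot, blocks CCBot, and leaves every other crawler on the default rule:

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /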

How do I check my current robots.txt?

Visit yourdomain.com/robots.txt in your browser. It is a plain text file. If the request returns a 404, you do not have one.

Do I need to restart my server after changing robots.txt?

No. Crawlers fetch robots.txt themselves on their next visit; nothing needs restarting on your end. Changes are not always instant, though: most crawlers cache the file (Google, for example, generally refreshes it about once a day), so allow up to 24 hours for an edit to take full effect.

Does robots.txt affect Google ranking?

Not directly. But blocking Googlebot affects your Google ranking, and blocking Google-Extended affects whether your content can be used by Gemini. The two controls are separate but related, so check each one individually.