robots.txt AI Bot Checker

Paste your robots.txt below and this free tool instantly shows which AI crawlers you allow or block. Crucially, it separates training bots (GPTBot, ClaudeBot, CCBot) from the search and citation bots that get you seen in AI answers today (OAI-SearchBot, PerplexityBot, Bingbot), because blocking the wrong one quietly removes you from AI search.

The most common mistake we find in audits: a blanket block on "AI bots" that also catches OAI-SearchBot or Googlebot, which costs you ChatGPT search citations, AI Overviews, or both. The second most common: thinking a GPTBot block removes you from ChatGPT. It does not; it only opts you out of model training.

Paste your robots.txt

Find yours at https://yourdomain.com/robots.txt (open it in a browser, select all, copy, and paste it here). Browsers block this page from fetching other sites' files directly, so pasting is the reliable route. If that URL returns a 404 (not found) or an empty file, you have no robots.txt rules and every crawler is allowed by default.
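RFC 9309 (section 2.3.1) spells out how a crawler should react to the HTTP status of a robots.txt request. The sketch below maps a status code to the policy the RFC describes. The function name `robots_policy` and its return labels are hypothetical, chosen for illustration. Note the asymmetry: a 404 means everything may be crawled, but a 5xx server error means crawlers must assume everything is disallowed.

```python
def robots_policy(status: int) -> str:
    """Map the HTTP status of a GET /robots.txt to a crawl policy.

    A sketch of RFC 9309, section 2.3.1; the return labels are hypothetical.
    """
    if 200 <= status <= 299:
        return "parse"            # the file exists: obey its rules
    if 300 <= status <= 399:
        return "follow-redirect"  # RFC: SHOULD follow at least five consecutive redirects
    if 400 <= status <= 499:
        return "allow-all"        # "unavailable", e.g. 404: crawlers MAY access anything
    if 500 <= status <= 599:
        return "disallow-all"     # "unreachable": crawlers MUST assume complete disallow
    raise ValueError(f"unexpected HTTP status: {status}")


for code in (200, 301, 404, 503):
    print(code, robots_policy(code))
```

The RFC also lets a crawler treat a robots.txt that stays unreachable for a long period (its example is 30 days) as unavailable, so the 5xx branch is not meant to be permanent.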

The AI crawler list this tool checks against

Every user-agent below comes from the operator's own published documentation, linked in the table. List verified 11 June 2026. AI engines add and rename bots often, so treat any undated list elsewhere with suspicion.

The one-line version: training bots feed future models, so blocking them is a policy choice with no visibility cost today. Search and citation bots are how AI assistants find and cite you right now, so blocking them is an AI-visibility kill switch.

AI crawler user-agents and what blocking each one means (verified 11 June 2026)

| User-agent | Operator | Type | If you block it |
| --- | --- | --- | --- |
| GPTBot | OpenAI | Training | Your content is excluded from future OpenAI model training. Does NOT remove you from ChatGPT search results. |
| OAI-SearchBot | OpenAI | Search / citation | Your site stops appearing and being cited in ChatGPT search answers. |
| ChatGPT-User | OpenAI | User-triggered | ChatGPT cannot open your pages on a user’s behalf. |
| Googlebot | Google | Search / citation | You disappear from Google Search AND from AI Overviews. Almost never what you want. |
| Google-Extended | Google | Training | Opts your content out of Gemini training. You stay in Google Search and AI Overviews (those use Googlebot). |
| ClaudeBot | Anthropic | Training | Excluded from Claude model training. |
| Claude-SearchBot | Anthropic | Search / citation | Less likely to be surfaced and cited in Claude’s web search answers. |
| Claude-User | Anthropic | User-triggered | Claude cannot open your pages on a user’s behalf. |
| PerplexityBot | Perplexity | Search / citation | Your site stops being surfaced and cited in Perplexity answers. |
| Perplexity-User | Perplexity | User-triggered | Perplexity says this user-requested fetcher generally ignores robots.txt, so a block may have little effect. |
| Bingbot | Microsoft | Search / citation | You disappear from Bing and weaken your presence in Copilot. |
| CCBot | Common Crawl | Training | Excluded from future Common Crawl snapshots, which many model builders train on. |
| Applebot | Apple | Search / citation | Removed from Siri and Spotlight results. |
| Applebot-Extended | Apple | Training | Excluded from Apple Intelligence model training. You stay in Siri and Spotlight (those use Applebot). |
| Meta-ExternalAgent | Meta | Training | Excluded from Meta AI model training. |
| Amazonbot | Amazon | Search / citation | Weaker presence in Alexa answers. |
| DuckAssistBot | DuckDuckGo | Search / citation | Excluded from DuckAssist answers. |
| Bytespider | ByteDance | Training | Excluded from ByteDance training, where the bot respects robots.txt. |
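The table's central distinction can be captured as a simple lookup. A minimal sketch, assuming a hypothetical `visibility_costs` helper that takes the set of bots a robots.txt blocks and flags the ones whose block costs AI visibility today. It counts user-triggered fetchers alongside search and citation bots, which is our reading of the table rather than a rule the tool states.

```python
# Bot types from the table above: "training", "search" (search / citation)
# or "user" (user-triggered).
BOT_TYPES = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user",
    "Googlebot": "search",
    "Google-Extended": "training",
    "ClaudeBot": "training",
    "Claude-SearchBot": "search",
    "Claude-User": "user",
    "PerplexityBot": "search",
    "Perplexity-User": "user",
    "Bingbot": "search",
    "CCBot": "training",
    "Applebot": "search",
    "Applebot-Extended": "training",
    "Meta-ExternalAgent": "training",
    "Amazonbot": "search",
    "DuckAssistBot": "search",
    "Bytespider": "training",
}


def visibility_costs(blocked):
    """Return the blocked bots whose block costs AI visibility today, sorted by name.

    Training bots are left out (blocking them is a policy choice);
    tokens not in BOT_TYPES are ignored.
    """
    return sorted(bot for bot in blocked if BOT_TYPES.get(bot) in ("search", "user"))


print(visibility_costs({"GPTBot", "CCBot", "OAI-SearchBot", "Googlebot"}))
# ['Googlebot', 'OAI-SearchBot']
```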

How the checker decides allowed vs blocked

The rules are published so the verdicts are auditable, and they follow the Robots Exclusion Protocol (RFC 9309) the way the major crawlers document it:

Group matching: for each bot, the checker picks the User-agent group with the longest name that matches the bot's token (so a "GPTBot" group beats "*"). If no named group matches, the "*" group applies. If there is no "*" group either, the bot is allowed by default.
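The group-matching step can be sketched as follows. The names `parse_groups` and `select_group` are hypothetical, and "matches the bot's token" is read here as a case-insensitive prefix match, which is what makes "longest name" meaningful; the tool does not publish its exact comparison. Following RFC 9309, rules from several groups that name the same winning user-agent are combined.

```python
def parse_groups(text):
    """Split robots.txt into groups: (user-agent names, [(field, path), ...])."""
    groups = []
    current = None
    in_agent_lines = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_agent_lines:  # a user-agent line after rules starts a new group
                current = ([], [])
                groups.append(current)
            current[0].append(value)
            in_agent_lines = True
        elif field in ("allow", "disallow"):
            if current is not None:  # rules before any user-agent line are ignored
                current[1].append((field, value))
            in_agent_lines = False
        # Other records (Sitemap, Crawl-delay, ...) do not affect grouping here.
    return groups


def select_group(groups, bot):
    """Return (matched name, rules) for a bot, or (None, None) if no group applies."""
    token = bot.lower()
    best_name, best_rules = None, None
    star_rules = None
    for agents, rules in groups:
        for agent in agents:
            name = agent.lower()
            if name == "*":
                star_rules = (star_rules or []) + rules  # all "*" groups are combined
            elif name and token.startswith(name):  # assumption: prefix match
                if best_name is None or len(name) > len(best_name):
                    best_name, best_rules = agent, list(rules)
                elif len(name) == len(best_name):
                    best_rules += rules  # same winning name in another group: combine
    if best_name is not None:
        return best_name, best_rules
    if star_rules is not None:
        return "*", star_rules
    return None, None  # no named group and no "*" group: allowed by default


robots = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""
groups = parse_groups(robots)
print(select_group(groups, "GPTBot"))     # ('GPTBot', [('disallow', '/')])
print(select_group(groups, "ClaudeBot"))  # ('*', [('disallow', '/private/')])
```

One consequence of the prefix reading: a group for Applebot would also cover Applebot-Extended unless that token has its own group. An exact-match reading would not, so treat this as one defensible interpretation rather than the checker's code.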

Rule evaluation: within the matched group, the rule with the longest matching path wins, and Allow beats Disallow on a tie. Wildcards (*) and end anchors ($) are supported. An empty "Disallow:" means allow everything.
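Longest-match evaluation with wildcards and end anchors fits in a few lines. A minimal sketch, assuming the matched group's rules arrive as `("allow" | "disallow", pattern)` pairs; the names `rule_regex` and `is_allowed` are hypothetical.

```python
import re


def rule_regex(pattern):
    """Compile a robots.txt pattern: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; Allow beats Disallow on a tie; no match means allowed."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for field, pattern in rules:
        if not pattern:
            continue  # an empty rule matches nothing, so "Disallow:" allows everything
        if rule_regex(pattern).match(path):  # match() anchors at the start of the path
            candidate = (len(pattern), field == "allow")
            # Tuples compare length first, then True > False, so Allow wins a tie.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


rules = [("disallow", "/"), ("allow", "/blog/"), ("disallow", "/*.pdf$")]
print(is_allowed(rules, "/pricing"))         # False: only "/" matches
print(is_allowed(rules, "/blog/post"))       # True: "/blog/" (6 chars) beats "/" (1 char)
print(is_allowed(rules, "/blog/guide.pdf"))  # False: "/*.pdf$" (7 chars) beats "/blog/"
```

Pattern length here counts the literal pattern characters, `*` and `$` included, which is a simplification; a production parser would also normalise percent-encoding before comparing paths.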

Verdicts: Blocked = the site root (/) is disallowed for that bot. Partly restricted = the root is allowed but some paths are disallowed. Allowed = no disallow applies. Each verdict shows which group produced it so you can check the working.
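The three verdicts reduce to a check of the root path plus a scan for Disallow rules. A minimal sketch with hypothetical names; `None` stands for "no group matched". One simplification: "some paths are disallowed" is approximated as "the group has at least one non-empty Disallow rule", which can over-report when a longer Allow cancels it.

```python
import re


def matches(pattern, path):
    """True if a robots.txt pattern ('*' wildcard, trailing '$' anchor) matches the path."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in core.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None


def verdict(rules):
    """Classify a bot from the rules of its matched group."""
    if rules is None:
        return "Allowed"  # no named group and no "*" group: allowed by default
    winner = None  # (pattern length, is_allow) of the longest rule matching "/"
    for field, pattern in rules:
        if pattern and matches(pattern, "/"):
            candidate = (len(pattern), field == "allow")
            if winner is None or candidate > winner:
                winner = candidate
    if winner is not None and not winner[1]:
        return "Blocked"  # the site root is disallowed
    if any(field == "disallow" and pattern for field, pattern in rules):
        return "Partly restricted"  # root allowed, but at least one path is disallowed
    return "Allowed"


print(verdict([("disallow", "/")]))           # Blocked
print(verdict([("disallow", "/checkout/")]))  # Partly restricted
print(verdict([("disallow", "")]))            # Allowed
```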

Honest limits: robots.txt is a request, not a lock. Well-run crawlers respect it and bad actors ignore it; a bot is only as compliant as its operator chooses to be, which is why the Bytespider row is hedged. And this tool checks crawl permission only, not whether your content is worth citing.

Nothing leaves your browser: the file you paste is parsed locally and never sent to us.

A caveat on blocking AI bots
“Blocking training bots is a legitimate policy choice, but it is not a visibility strategy either way. What moves the needle is whether the search and citation bots can reach you and whether your pages give them something worth citing. We have audited sites that blocked every AI bot in a panic and wondered why ChatGPT stopped mentioning them.”

Adam Parker

Founder, Rank4AI

Reviewed 16 June 2026

Common questions

Does blocking GPTBot remove me from ChatGPT?

No. GPTBot only collects training data. ChatGPT search results and citations come via OAI-SearchBot, and live page visits come via ChatGPT-User. You can block training and keep citations; many publishers do exactly that.
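To sanity-check a file that blocks training but keeps citations, Python's standard-library `urllib.robotparser` is enough for simple cases. A minimal sketch, with example.com as a placeholder domain. Be aware that `urllib.robotparser` applies the first matching rule rather than the longest, matches user-agent names as substrings, and treats `*` inside paths literally, so on more complex files it can disagree with RFC 9309 and with this checker.

```python
from urllib.robotparser import RobotFileParser

# Opt out of OpenAI training while keeping ChatGPT search and user-triggered visits.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() works on lines; no network fetch needed

for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    allowed = parser.can_fetch(bot, "https://example.com/pricing")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```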

What is the difference between Googlebot and Google-Extended?

Googlebot crawls for Google Search, and that same crawl feeds AI Overviews and AI Mode, so blocking it removes you from all three. Google-Extended is not a crawler at all: it is a control token that opts your content out of Gemini model training and grounding. Blocking Google-Extended does not affect your Google Search or AI Overviews presence.

My robots.txt URL returns a 404. Is that bad?

No, it just means every crawler is allowed everywhere, which is the most common state and fine for most businesses. Add a robots.txt when you want to make deliberate choices, for example opting out of training while keeping search bots in.
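As an illustration of opting out of training while keeping search bots in, the sketch below generates such a file from the training rows of the table. The function name `training_opt_out` is hypothetical, the output is a starting point to review rather than a recommendation, and the token list will drift as operators add and rename bots.

```python
# Training tokens from the table above. Search, citation and user-triggered
# bots are deliberately absent, so they fall through to the "*" group.
TRAINING_TOKENS = (
    "GPTBot",
    "Google-Extended",
    "ClaudeBot",
    "CCBot",
    "Applebot-Extended",
    "Meta-ExternalAgent",
    "Bytespider",
)


def training_opt_out(tokens=TRAINING_TOKENS):
    """Build a robots.txt that blocks the given training tokens and allows every other bot."""
    lines = ["# Opt out of AI model training"]
    lines += [f"User-agent: {token}" for token in tokens]  # one group, several user-agent lines
    lines += [
        "Disallow: /",
        "",
        "# Everyone else, including search and citation bots",
        "User-agent: *",
        "Disallow:",  # empty Disallow = allow everything (explicit, though optional)
    ]
    return "\n".join(lines) + "\n"


print(training_opt_out())
```

If your site already has a robots.txt with other rules, merge these groups into it rather than replacing the file.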

Should I block AI training bots?

It depends on what you sell. If your content is the product (publishers, original research), opting out of training is reasonable. If you sell services and want AI assistants to know who you are, openness usually serves you better. There is no single right answer, which is why this tool reports facts rather than telling you what to do.

Adam Parker

Founder, Rank4AI

Adam is the founder of Rank4AI, specialising in AI search visibility. He helps businesses get found across ChatGPT, Gemini, Perplexity, and AI Overviews through technical optimisation and strategic content.

Last reviewed: 11 June 2026

Crawler access is one signal of five.

Our free audit tests crawler access, schema, entity clarity, llms.txt and trust pages across ChatGPT, Claude, Gemini, Perplexity, Copilot and Google AI.
