Free Tool

robots.txt AI Bot Checker

Paste your robots.txt below and this free tool instantly shows which AI crawlers you allow or block. Crucially, it separates training bots (GPTBot, ClaudeBot, CCBot) from the search and citation bots that get you seen in AI answers today (OAI-SearchBot, PerplexityBot, Bingbot), because blocking the wrong one quietly removes you from AI search.

The most common mistake we find in audits is a blanket block on "AI bots" that also blocks OAI-SearchBot or Googlebot, which means no ChatGPT search citations and no AI Overviews. The second most common is thinking that a GPTBot block removes you from ChatGPT. It does not; it only opts you out of model training.

Paste your robots.txt

Find yours at https://yourdomain.com/robots.txt (open it in a browser, select all, copy, paste here). Browsers block this page from fetching other sites' files directly, so paste is the reliable route. If that URL returns a 404 (not found), you have no robots.txt and every crawler is allowed by default.

The AI crawler list this tool checks against

Every user-agent below comes from the operator's own published documentation, linked in the table. List verified 11 June 2026. AI engines add and rename bots often, so treat any undated list elsewhere with suspicion.

The one-line version: training bots feed future models, so blocking them is a policy choice with no visibility cost today. Search and citation bots are how AI assistants find and cite you right now, so blocking them is an AI-visibility kill switch.

User-agent	Operator	Type	If you block it
GPTBot	OpenAI	Training	Your content is excluded from future OpenAI model training. Does NOT remove you from ChatGPT search results.
OAI-SearchBot	OpenAI	Search / citation	Your site stops appearing and being cited in ChatGPT search answers.
ChatGPT-User	OpenAI	User-triggered	ChatGPT cannot open your pages on a user’s behalf.
Googlebot	Google	Search / citation	You disappear from Google Search AND from AI Overviews. Almost never what you want.
Google-Extended	Google	Training	Opts your content out of Gemini training. You stay in Google Search and AI Overviews (those use Googlebot).
ClaudeBot	Anthropic	Training	Excluded from Claude model training.
Claude-SearchBot	Anthropic	Search / citation	Less likely to be surfaced and cited in Claude’s web search answers.
Claude-User	Anthropic	User-triggered	Claude cannot open your pages on a user’s behalf.
PerplexityBot	Perplexity	Search / citation	Your site stops being surfaced and cited in Perplexity answers.
Perplexity-User	Perplexity	User-triggered	Perplexity cannot open your pages on a user’s behalf.
Bingbot	Microsoft	Search / citation	You disappear from Bing and weaken your presence in Copilot.
CCBot	Common Crawl	Training	Excluded from future Common Crawl snapshots, which many model builders train on.
Applebot	Apple	Search / citation	Removed from Siri and Spotlight results.
Applebot-Extended	Apple	Training	Excluded from Apple Intelligence model training. You stay in Siri and Spotlight (those use Applebot).
Meta-ExternalAgent	Meta	Training	Excluded from Meta AI model training.
Amazonbot	Amazon	Search / citation	Weaker presence in Alexa answers.
DuckAssistBot	DuckDuckGo	Search / citation	Excluded from DuckAssist answers.
Bytespider	ByteDance	Training	Excluded from ByteDance training, where the bot respects robots.txt.

AI crawler user-agents and what blocking each one means (verified 11 June 2026)

How the checker decides allowed vs blocked

The rules are published so the verdicts are auditable, and they follow the Robots Exclusion Protocol (RFC 9309) the way the major crawlers document it:

Group matching: for each bot, the checker picks the User-agent group with the longest name that matches the bot's token (so a "GPTBot" group beats "*"). If no named group matches, the "*" group applies. If there is no "*" group either, the bot is allowed by default.

Rule evaluation: within the matched group, the rule with the longest matching path wins, and Allow beats Disallow on a tie. Wildcards (*) and end anchors ($) are supported. An empty "Disallow:" means allow everything.

Verdicts: Blocked = the site root (/) is disallowed for that bot. Partly restricted = the root is allowed but some paths are disallowed. Allowed = no disallow applies. Each verdict shows which group produced it so you can check the working.

Honest limits: robots.txt is a request, not a lock. Well-run crawlers respect it, and the operators in the table document that theirs do (with the caveat noted for Bytespider); bad actors ignore it. And this tool checks crawl permission only, not whether your content is worth citing.

Nothing leaves your browser: the file you paste is parsed locally and never sent to us.
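
The group-matching, rule-evaluation and verdict rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the checker's actual source: the function names (parse_groups, select_group, is_allowed, verdict) and the SAMPLE file are hypothetical, and it assumes that a group "matches" a bot when the group's name is a case-insensitive prefix of the bot's token, which the rules above do not spell out. "Partly restricted" is approximated as "the root is allowed and the group has at least one non-empty Disallow".

```python
import re

def parse_groups(robots_txt):
    """Return {lowercased user-agent name: [(directive, path), ...]}."""
    groups, agents, in_rules = {}, [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                          # a rule line closed the previous group
                agents, in_rules = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in agents:
                groups[agent].append((field, value))
    return groups

def select_group(groups, bot):
    """Longest matching named group, else "*", else None (allowed by default)."""
    token = bot.lower()
    # Assumption: "matches" means the group name is a prefix of the bot token.
    named = [n for n in groups if n and n != "*" and token.startswith(n)]
    if named:
        return max(named, key=len)
    return "*" if "*" in groups else None

def matches(pattern, path):
    """Match a robots.txt path pattern: * is a wildcard, a trailing $ anchors the end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(regex + (r"\Z" if anchored else ""), path) is not None

def is_allowed(rules, path):
    """Longest matching pattern wins; Allow beats Disallow on a tie."""
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if pattern and matches(pattern, path):    # an empty pattern matches nothing
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

def verdict(robots_txt, bot):
    """Return (verdict, name of the group that produced it)."""
    groups = parse_groups(robots_txt)
    name = select_group(groups, bot)
    rules = groups.get(name, [])
    if not is_allowed(rules, "/"):
        status = "Blocked"
    elif any(d == "disallow" and p for d, p in rules):
        status = "Partly restricted"
    else:
        status = "Allowed"
    return status, name

# Hypothetical sample file: block everything by default, carve out two bots.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Disallow: /private/

User-agent: OAI-SearchBot
Disallow:
"""

for bot in ("GPTBot", "OAI-SearchBot", "ClaudeBot"):
    print(bot, verdict(SAMPLE, bot))
```

Returning the group name alongside the verdict mirrors the "shows which group produced it" behaviour described above, so a surprising result can be traced back to a specific block in the file.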

A caveat on blocking AI bots

“Blocking training bots is a legitimate policy choice, but it is not a visibility strategy either way. What moves the needle is whether the search and citation bots can reach you and whether your pages give them something worth citing. We have audited sites that blocked every AI bot in a panic and wondered why ChatGPT stopped mentioning them.”

Adam Parker

Founder, Rank4AI

Reviewed 16 June 2026

Common questions

Does blocking GPTBot remove me from ChatGPT?

No. GPTBot only collects training data. ChatGPT search results and citations come via OAI-SearchBot, and live page visits come via ChatGPT-User. You can block training and keep citations, which is exactly what many publishers do.
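
As a rough illustration of "block training, keep citations", the sketch below checks a hypothetical policy with Python's standard-library urllib.robotparser. The POLICY text and the example.com URL are assumptions for demonstration, and which training bots you list is your own choice. Note that urllib.robotparser applies rules in file order (first match) rather than longest-path-wins, so it can disagree with this checker on files that mix Allow and Disallow within one group; for a simple root-level policy like this one the two agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: opt out of three training bots, leave everything else open.
POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(POLICY.splitlines())

BOTS = ("GPTBot", "ClaudeBot", "CCBot",
        "OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Bingbot")

# True means the bot may fetch the site root under this policy.
results = {bot: parser.can_fetch(bot, "https://example.com/") for bot in BOTS}

for bot, allowed in results.items():
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

If a search or citation bot shows up as blocked in a check like this, the policy has gone further than a training opt-out.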

What is the difference between Googlebot and Google-Extended?

Googlebot crawls for Google Search, and that same crawl feeds AI Overviews and AI Mode, so blocking it removes you from both. Google-Extended is not a crawler at all: it is a control token that opts your content out of Gemini model training and grounding. Blocking Google-Extended does not affect your Google Search or AI Overviews presence.

My robots.txt URL returns a 404. Is that bad?

No, it just means every crawler is allowed everywhere, which is the most common state and fine for most businesses. Add a robots.txt when you want to make deliberate choices, for example opting out of training while keeping search bots in.

Should I block AI training bots?

It depends on what you sell. If your content is the product (publishers, original research), opting out of training is reasonable. If you sell services and want AI assistants to know who you are, openness usually serves you better. There is no single right answer, which is why this tool reports facts rather than telling you what to do.

Adam Parker

Founder, Rank4AI

Adam is the founder of Rank4AI, specialising in AI search visibility. He helps businesses get found across ChatGPT, Gemini, Perplexity, and AI Overviews through technical optimisation and strategic content.

Last reviewed: 11 June 2026

