Free Tool
robots.txt AI Bot Checker
Paste your robots.txt below and this free tool instantly shows which AI crawlers you allow or block. Crucially, it separates training bots (GPTBot, ClaudeBot, CCBot) from the search and citation bots that get you seen in AI answers today (OAI-SearchBot, PerplexityBot, Bingbot), because blocking the wrong one quietly removes you from AI search.
The most common mistake we find in audits: a blanket block on "AI bots" that also blocks OAI-SearchBot or Googlebot, which means no ChatGPT search citations and no AI Overviews. The second most common: thinking a GPTBot block removes you from ChatGPT. It does not; it only opts you out of model training.
Paste your robots.txt
Find yours at https://yourdomain.com/robots.txt (open it in a browser, select all, copy, paste here). Browsers block this page from fetching other sites' files directly, so paste is the reliable route. If that URL returns a 404 (not found), you have no robots.txt and every crawler is allowed by default.
The AI crawler list this tool checks against
Every user-agent below comes from the operator's own published documentation, linked in the table. List verified 11 June 2026. AI engines add and rename bots often, so treat any undated list elsewhere with suspicion.
The one-line version: training bots feed future models, so blocking them is a policy choice with no visibility cost today. Search and citation bots are how AI assistants find and cite you right now, so blocking them is an AI-visibility kill switch.
| User-agent | Operator | Type | If you block it |
|---|---|---|---|
| GPTBot | OpenAI | Training | Your content is excluded from future OpenAI model training. Does NOT remove you from ChatGPT search results. |
| OAI-SearchBot | OpenAI | Search / citation | Your site stops appearing and being cited in ChatGPT search answers. |
| ChatGPT-User | OpenAI | User-triggered | ChatGPT cannot open your pages on a user’s behalf. |
| Googlebot | Google | Search / citation | You disappear from Google Search AND from AI Overviews. Almost never what you want. |
| Google-Extended | Google | Training | Opts your content out of Gemini training. You stay in Google Search and AI Overviews (those use Googlebot). |
| ClaudeBot | Anthropic | Training | Excluded from Claude model training. |
| Claude-SearchBot | Anthropic | Search / citation | Less likely to be surfaced and cited in Claude’s web search answers. |
| Claude-User | Anthropic | User-triggered | Claude cannot open your pages on a user’s behalf. |
| PerplexityBot | Perplexity | Search / citation | Your site stops being surfaced and cited in Perplexity answers. |
| Perplexity-User | Perplexity | User-triggered | Perplexity cannot open your pages on a user’s behalf. |
| Bingbot | Microsoft | Search / citation | You disappear from Bing and weaken your presence in Copilot. |
| CCBot | Common Crawl | Training | Excluded from future Common Crawl snapshots, which many model builders train on. |
| Applebot | Apple | Search / citation | Removed from Siri and Spotlight results. |
| Applebot-Extended | Apple | Training | Excluded from Apple Intelligence model training. You stay in Siri and Spotlight (those use Applebot). |
| Meta-ExternalAgent | Meta | Training | Excluded from Meta AI model training. |
| Amazonbot | Amazon | Search / citation | Weaker presence in Alexa answers. |
| DuckAssistBot | DuckDuckGo | Search / citation | Excluded from DuckAssist answers. |
| Bytespider | ByteDance | Training | Excluded from ByteDance training, where the bot respects robots.txt. |
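
The training versus search split in the table can be written as a simple lookup. The sketch below is illustrative only: the `BOT_TYPES` mapping is transcribed from the Type column above, and the helper name `visibility_risks` is hypothetical, not part of the tool.

```python
# Bot types transcribed from the table above:
# "training", "search" (search / citation) or "user" (user-triggered).
BOT_TYPES = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user",
    "Googlebot": "search",
    "Google-Extended": "training",
    "ClaudeBot": "training",
    "Claude-SearchBot": "search",
    "Claude-User": "user",
    "PerplexityBot": "search",
    "Perplexity-User": "user",
    "Bingbot": "search",
    "CCBot": "training",
    "Applebot": "search",
    "Applebot-Extended": "training",
    "Meta-ExternalAgent": "training",
    "Amazonbot": "search",
    "DuckAssistBot": "search",
    "Bytespider": "training",
}


def visibility_risks(blocked_bots):
    """Return the blocked bots that are search / citation bots, sorted by name."""
    return sorted(bot for bot in blocked_bots if BOT_TYPES.get(bot) == "search")


# A "blanket" block list that was meant to stop training only.
blanket_block = ["GPTBot", "CCBot", "OAI-SearchBot", "Googlebot"]
risky = visibility_risks(blanket_block)
```

Here `risky` contains the two entries that cost visibility today (Googlebot and OAI-SearchBot), while the GPTBot and CCBot blocks are policy choices only.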
How the checker decides allowed vs blocked
The rules are published so the verdicts are auditable, and they follow the Robots Exclusion Protocol (RFC 9309) the way the major crawlers document it:
Group matching: for each bot, the checker picks the User-agent group with the longest name that matches the bot's token (so a "GPTBot" group beats "*"). If no named group matches, the "*" group applies. If there is no "*" group either, the bot is allowed by default.
Rule evaluation: within the matched group, the rule with the longest matching path wins, and Allow beats Disallow on a tie. Wildcards (*) and end anchors ($) are supported. An empty "Disallow:" means allow everything.
Verdicts: Blocked = the site root (/) is disallowed for that bot. Partly restricted = the root is allowed but some paths are disallowed. Allowed = no disallow applies. Each verdict shows which group produced it so you can check the working.
Honest limits: robots.txt is a request, not a lock. Well-run crawlers (everything in the table) respect it; bad actors ignore it. And this tool checks crawl permission only, not whether your content is worth citing.
Nothing leaves your browser: the file you paste is parsed locally and never sent to us.
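
The group matching, rule evaluation and verdict rules above can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the checker's actual source: the function names (`parse_groups`, `select_rules`, `is_allowed`, `verdict`) are hypothetical, and it assumes a group name matches a bot when the bot's token starts with that name, case-insensitively.

```python
import re


def parse_groups(text):
    """Split robots.txt text into groups: a list of (agent_names, rules)."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule was already seen, so this line starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def select_rules(groups, bot):
    """Pick the longest matching group name; fall back to "*", else (None, [])."""
    bot = bot.lower()
    names = {name for agents, _ in groups for name in agents
             if name == "*" or (name and bot.startswith(name))}  # assumption: prefix match
    if not names:
        return None, []
    best = max(names, key=lambda n: -1 if n == "*" else len(n))
    return best, [rule for agents, rs in groups if best in agents for rule in rs]


def pattern_to_regex(path):
    """Translate a robots.txt path with * wildcards and an optional $ end anchor."""
    anchored = path.endswith("$")
    body = path[:-1] if anchored else path
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Longest matching rule wins; Allow beats Disallow on a tie."""
    best_len, allowed = -1, True
    for kind, path in rules:
        if not path:  # an empty "Disallow:" restricts nothing
            continue
        if pattern_to_regex(path).match(url_path):
            is_allow = kind == "allow"
            if len(path) > best_len or (len(path) == best_len and is_allow):
                best_len, allowed = len(path), is_allow
    return allowed


def verdict(text, bot):
    """Return (verdict, group_name) so the working can be checked."""
    name, rules = select_rules(parse_groups(text), bot)
    if not is_allowed(rules, "/"):
        return "Blocked", name
    # Simplification: any non-empty Disallow counts as a partial restriction.
    if any(kind == "disallow" and path for kind, path in rules):
        return "Partly restricted", name
    return "Allowed", name
```

Returning the winning group name alongside the verdict mirrors the point made above: each result shows which group produced it.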
“Blocking training bots is a legitimate policy choice, but it is not a visibility strategy either way. What moves the needle is whether the search and citation bots can reach you and whether your pages give them something worth citing. We have audited sites that blocked every AI bot in a panic and wondered why ChatGPT stopped mentioning them.”
Common questions
Does blocking GPTBot remove me from ChatGPT?
No. GPTBot only collects training data. ChatGPT search results and citations come via OAI-SearchBot, and live page visits come via ChatGPT-User. You can block training and keep citations; that is exactly what many publishers do.
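
As a rough illustration of "block training, keep citations", the snippet below embeds an example robots.txt and cross-checks it with `urllib.robotparser` from Python's standard library. The file contents, the `can_crawl` helper and the example.com URL are assumptions made for this sketch. Python's parser is also simpler than the longest-match logic this page describes, but the two agree on a file this simple.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of OpenAI training, stay in ChatGPT search.
ROBOTS_TXT = """\
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Keep search citations and user-triggered visits
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /
"""


def can_crawl(robots_txt: str, bot: str, url: str = "https://example.com/") -> bool:
    """Ask the standard-library parser whether `bot` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    print(bot, "allowed" if can_crawl(ROBOTS_TXT, bot) else "blocked")
```

With this file only GPTBot is reported as blocked, so training is opted out while search citations and user-triggered visits stay available.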
What is the difference between Googlebot and Google-Extended?
Googlebot crawls for Google Search, and that same crawl feeds AI Overviews and AI Mode, so blocking it removes you from both. Google-Extended is not a crawler at all: it is a control token that opts your content out of Gemini model training and grounding. Blocking Google-Extended does not affect your Google Search or AI Overviews presence.
My robots.txt URL returns a 404. Is that bad?
No, it just means every crawler is allowed everywhere, which is the most common state and fine for most businesses. Add a robots.txt when you want to make deliberate choices, for example opting out of training while keeping search bots in.
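
For context, RFC 9309 treats a missing file differently from a failing server. The sketch below maps the HTTP status of /robots.txt to the resulting crawl policy. The function name `robots_policy_for_status` and its return labels are hypothetical, and redirects are left out of scope.

```python
def robots_policy_for_status(status: int) -> str:
    """Map the HTTP status of /robots.txt to a crawl policy, following RFC 9309."""
    if 200 <= status < 300:
        return "parse"         # the file exists: apply its rules
    if 400 <= status < 500:
        return "allow-all"     # unavailable (for example 404): crawlers may access everything
    if 500 <= status < 600:
        return "disallow-all"  # server error: crawlers must assume a complete disallow
    return "other"             # redirects and anything else are out of scope here
```

So a 404 is harmless, while a robots.txt URL that returns a server error can block compliant crawlers until it recovers.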
Should I block AI training bots?
It depends on what you sell. If your content is the product (publishers, original research), opting out of training is reasonable. If you sell services and want AI assistants to know who you are, openness usually serves you better. There is no single right answer, which is why this tool reports facts rather than telling you what to do.
Adam Parker
Founder, Rank4AI
Adam is the founder of Rank4AI, specialising in AI search visibility. He helps businesses get found across ChatGPT, Gemini, Perplexity, and AI Overviews through technical optimisation and strategic content.
Last reviewed: 11 June 2026