
    Rank4AI

    The UK's most complete AI search visibility framework


    23% of UK SME Websites Are Blocking AI Crawlers in Their robots.txt

    We checked the robots.txt files of 30 UK small business websites to see whether they were blocking AI crawlers like GPTBot, ClaudeBot or PerplexityBot. Nearly a quarter were.

    Rank4AI Original Research, 12 March 2026

    Last updated: March 2026



    TL;DR

    • 7 out of 30 UK SME websites (23.3%) block at least one AI crawler
    • GPTBot (used by ChatGPT) is the most commonly blocked: 7 sites (23.3%)
    • ClaudeBot (used by Anthropic's Claude) is blocked by 5 sites (16.7%)
    • PerplexityBot is blocked by just 1 site (3.3%)
    • In several cases, the same sites had active SEO signals suggesting they want to be found online — raising the question of whether the blocks are intentional
    • 28 of 30 sites (93%) had a robots.txt file in place, but the AI crawler rules within them varied considerably

    What is robots.txt and how does it affect AI visibility?

    robots.txt is a plain text file at the root of a website that tells web crawlers which parts of the site they can and cannot access. It has been a standard part of the web since the mid-1990s, originally designed to manage search engine crawlers like Googlebot and Bingbot.

    In recent years, AI companies have introduced their own crawlers. OpenAI uses GPTBot to gather content for ChatGPT. Anthropic uses ClaudeBot for Claude. Perplexity uses PerplexityBot. These crawlers read website content to help their AI systems understand what businesses do, what services they offer, and what information is on their pages.

    If a website's robots.txt file includes a rule that blocks one of these crawlers, that AI platform cannot access the site's content through crawling. This does not necessarily mean the business will never appear in that platform's responses — AI systems draw from multiple sources beyond direct crawling — but it does remove one significant channel through which the platform can learn about the business.
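    To make the mechanics concrete, here is a minimal sketch of a robots.txt file that blocks the three AI crawlers discussed in this research while leaving all other crawlers unrestricted. This is an illustrative example, not a recommendation:

    ```text
    # Block the main AI crawlers from the entire site
    User-agent: GPTBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

    # All other crawlers may access everything
    User-agent: *
    Allow: /
    ```

    Each User-agent line starts a group of rules for that crawler, and a bare Disallow: / excludes it from every path on the site.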


    What we found

    AI crawler blocking rates

    AI Crawler                       Sites blocking   Percentage
    GPTBot (ChatGPT / OpenAI)        7 / 30           23.3%
    ClaudeBot (Claude / Anthropic)   5 / 30           16.7%
    PerplexityBot (Perplexity)       1 / 30           3.3%
    Any AI crawler                   7 / 30           23.3%
    No AI crawlers blocked           23 / 30          76.7%

    GPTBot was the most frequently blocked crawler, appearing in 7 robots.txt files. ClaudeBot was blocked by 5. Only 1 site blocked PerplexityBot. In every case where ClaudeBot was blocked, GPTBot was also blocked — no site blocked Claude alone.

    Which sites are blocking?

    Site                  Industry           Blocks GPTBot   Blocks ClaudeBot   Blocks PerplexityBot
    Estate Agent A        Estate Agent       Yes             Yes                Yes
    Marketing Agency A    Marketing Agency   Yes             Yes                No
    Plumber A             Plumber            Yes             No                 No
    Personal Trainer A    Personal Trainer   Yes             Yes                No
    Dentist A             Dentist            Yes             Yes                No
    Accountancy Firm A    Accountancy       Yes             Yes                No
    Marketing Agency B    Marketing Agency   Yes             No                 No

    One site — Estate Agent A — blocked all three major AI crawlers. This is the most restrictive configuration in our sample. The estate agent has effectively opted out of AI crawling entirely through its robots.txt.

    The contradiction signal

    What made several of these results notable was the context. Some of the sites blocking AI crawlers also showed clear signals of wanting to be discoverable online:

    • Accountancy Firm A blocks GPTBot and ClaudeBot but has Organisation schema, FAQ schema, sameAs links, and FAQ content — a well-structured site that has invested in discoverability signals while simultaneously blocking the AI platforms it might benefit from
    • Marketing Agency B blocks GPTBot but has an llms.txt file, FAQ schema, and 8 question-style headings — a site that has specifically prepared for AI readability while blocking one of the largest AI crawlers
    • Marketing Agency A blocks GPTBot and ClaudeBot but has Organisation schema and sameAs social links

    This pattern raises the question of whether these blocks are deliberate business decisions or unintended consequences of developer configurations, security plugins, or default settings.


    Why businesses might block AI crawlers

    There are legitimate reasons a business might choose to block AI crawlers:

    Content protection. Some businesses do not want their content used to train AI models or be reproduced in AI-generated responses without attribution. Blocking crawlers is one way to limit this.

    Data sensitivity. Businesses handling sensitive information — legal, medical, financial — may prefer to restrict which systems can access their content.

    Philosophical objection. Some businesses disagree with the premise of AI systems using web content and choose to opt out on principle.

    Bandwidth and performance. AI crawlers can be aggressive in their crawling patterns, and some site owners block them to reduce server load.

    These are all valid choices. Blocking AI crawlers is not inherently wrong or misguided — it is a business decision about how a company's content is used.


    Why some blocks may be unintentional

    However, there are also reasons to believe that not all of the blocks we found are deliberate:

    Security plugins. WordPress security plugins like Wordfence, Sucuri and others sometimes include AI crawler blocks in their default or recommended configurations. A business owner who installed a security plugin may not realise it has added rules blocking GPTBot or ClaudeBot.

    Developer defaults. A web developer building a site may include AI crawler blocks based on their own preferences or standard configuration templates, without discussing the implications with the business owner.

    Copied configurations. robots.txt files are sometimes copied from templates or other sites. A business may have inherited AI crawler blocks from a template without reviewing the specific rules.

    Lack of awareness. Many small business owners do not know what robots.txt does, let alone that specific AI crawlers can be individually blocked or allowed. The decision may never have been consciously made.

    The fact that several sites in our check had AI crawler blocks alongside active SEO and structured data signals suggests that at least some of these blocks may not reflect an intentional business decision to be invisible to AI platforms.


    What this means in practice

    If a business blocks GPTBot, ChatGPT cannot crawl the site directly. This does not mean the business will never appear in ChatGPT responses — the AI system draws from its training data, third-party sources, and other references. But it does mean that ChatGPT cannot access the most current version of the site's content through direct crawling.

    For a business that regularly updates its services, pricing, team, or location information, blocking AI crawlers means those updates may not be reflected in AI-generated responses. The AI platform may rely on older information, third-party descriptions, or no information at all.

    Whether this matters depends on how much a business values being accurately represented in AI-generated answers. For businesses where AI platforms are not a significant discovery channel, the impact may be minimal. For businesses in sectors where consumers increasingly use AI to research and compare options, the impact may grow over time.


    How to check your own robots.txt

    1. Visit yourdomain.co.uk/robots.txt in a browser. The file will display as plain text.
    2. Search for AI crawler names. Look for "GPTBot", "ClaudeBot", "Claude-Web", "PerplexityBot", "ChatGPT-User", "Anthropic", "CCBot" or "Bytespider" (TikTok's crawler)
    3. Check the rules after each User-agent line. If a Disallow: / rule follows an AI crawler's User-agent, that crawler is blocked from the entire site
    4. Check for blanket blocks. Some files use a single User-agent: * with Disallow: / which blocks everything — including AI crawlers

    If you find blocks you did not intentionally set, check whether a security plugin, your developer, or a site template added them. Removing or adjusting these rules is typically straightforward — it is a single text file edit.
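    The manual check above can also be automated. As a sketch, Python's standard-library robotparser module evaluates robots.txt rules for any user agent string. The ROBOTS_TXT content and example.co.uk domain below are made-up placeholders; for a real check, point the parser at your own domain with set_url() and read():

    ```python
    from urllib import robotparser

    # Hypothetical robots.txt content for illustration. In practice, load the
    # live file instead:
    #   parser.set_url("https://yourdomain.co.uk/robots.txt"); parser.read()
    ROBOTS_TXT = """\
    User-agent: GPTBot
    Disallow: /

    User-agent: *
    Allow: /
    """

    AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

    parser = robotparser.RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())

    for bot in AI_CRAWLERS:
        allowed = parser.can_fetch(bot, "https://example.co.uk/")
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
    # GPTBot hits the explicit Disallow rule; the other two fall through
    # to the User-agent: * group and are allowed.
    ```

    Note that robotparser only reads the rules; like our methodology, it cannot tell you whether a block was set deliberately or added by a plugin or template.
    
    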


    Industry breakdown

    Industry           Sites checked   Sites blocking AI crawlers   Percentage
    Estate Agent       4               1                            25%
    Marketing Agency   4               2                            50%
    Accountancy        9               1                            11%
    Plumber            3               1                            33%
    Personal Trainer   3               1                            33%
    Dentist            2               1                            50%
    Legal              3               0                            0%
    Restaurant         1               0                            0%
    Retail             1               0                            0%

    Marketing agencies and dentists had the highest blocking rate at 50%, though with only 2-4 sites per category these numbers should be read with caution. The marketing agency result is particularly interesting — an industry that advises clients on digital visibility is itself blocking AI crawlers at the highest rate in our sample.

    Law firms, restaurants and retail had zero blocking — though the law firm sites also had zero structured data of any kind, suggesting limited technical configuration rather than a deliberate decision to allow AI crawlers.


    Methodology

    • Sample: 30 UK SME websites selected from web search results across nine industries
    • Date: 12 March 2026
    • robots.txt checks: Presence of robots.txt file, followed by analysis for User-agent rules targeting GPTBot, ClaudeBot, Claude-Web, PerplexityBot, ChatGPT-User, Anthropic, and CCBot. A crawler was classified as "blocked" if a Disallow: / rule followed its User-agent declaration
    • Limitations: robots.txt analysis only checked the file at the site root. Some sites may use alternative methods to block AI crawlers (e.g. server-level blocks, Cloudflare rules, meta robots tags) that would not appear in robots.txt. This is a convenience sample and should not be extrapolated to all UK businesses. We cannot determine from robots.txt alone whether blocks are intentional business decisions or unintended consequences of plugins or developer configurations.

    FAQ

    What is robots.txt?

    robots.txt is a plain text file at the root of a website (e.g. yoursite.co.uk/robots.txt) that provides instructions to web crawlers about which parts of the site they can access. It is a standard convention that has been used since the 1990s to manage how search engines and other automated systems interact with websites.

    What is GPTBot?

    GPTBot is OpenAI's web crawler, used to gather content from websites for use in ChatGPT and related AI products. It identifies itself as "GPTBot" in its User-agent string when crawling websites. Website owners can allow or block it through their robots.txt file.

    Does blocking GPTBot mean I won't appear in ChatGPT?

    Not necessarily. ChatGPT draws from its training data, third-party sources, and other references beyond direct web crawling. Blocking GPTBot prevents ChatGPT from directly crawling your current website content, but the AI may still reference your business based on other information sources. However, blocking does prevent ChatGPT from accessing your most up-to-date content through crawling.

    Why would a website block AI crawlers?

    Reasons include content protection concerns, data sensitivity, philosophical objection to AI training on web content, server performance considerations, or — in some cases — unintentional blocks added by security plugins or developer configurations.

    How do I know if my website is blocking AI crawlers?

    Visit yourdomain.co.uk/robots.txt in a browser. Search for "GPTBot", "ClaudeBot", "PerplexityBot" or "Anthropic". If any of these appear with a "Disallow: /" rule, that crawler is blocked. If none appear, AI crawlers are not specifically blocked (though a blanket "User-agent: *" entry with "Disallow: /" would block everything).

    Can WordPress plugins block AI crawlers without me knowing?

    Yes. Some security and SEO plugins include AI crawler blocks in their default or recommended settings. If you use Wordfence, Sucuri, All in One Security, or similar plugins, check their settings for any rules related to AI bots, GPTBot or automated crawlers.

    Should I block or allow AI crawlers?

    This is a business decision. If you want AI platforms to be able to read and understand your current website content — which may help them accurately describe and recommend your business — allowing AI crawlers makes sense. If you have concerns about content usage or data sensitivity, blocking is a valid choice. The key is making it an informed decision rather than an accidental one.

    How many UK websites block AI crawlers overall?

    Based on our check of 30 UK SME websites in March 2026, 23.3% blocked at least one AI crawler. Other industry research has found varying rates — one study found GPTBot blocked by approximately 5.89% of websites globally. The higher rate in our sample may reflect the inclusion of security-conscious SME sites that use plugins with default blocking rules.

    What happens if I unblock AI crawlers?

    Removing AI crawler blocks from your robots.txt allows those platforms to crawl your website content going forward. This does not guarantee immediate changes in how AI platforms describe your business — crawling schedules vary and AI systems process new information over time. But it does remove a barrier to AI platforms accessing your current content.
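    In practice, unblocking usually means deleting the relevant Disallow rule or replacing it with an explicit Allow. A hypothetical before/after edit:

    ```text
    # Before: GPTBot blocked from the entire site
    User-agent: GPTBot
    Disallow: /

    # After: the block removed and access explicitly allowed
    User-agent: GPTBot
    Allow: /
    ```

    Simply deleting the whole User-agent: GPTBot group has the same effect if your default User-agent: * rules do not disallow anything.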

    Are there AI crawlers beyond GPTBot, ClaudeBot and PerplexityBot?

    Yes. Other notable AI crawlers include ChatGPT-User (OpenAI's browsing mode), Bytespider (TikTok/ByteDance), CCBot (Common Crawl, used by many AI training datasets), Amazonbot (Amazon), and various others. A comprehensive robots.txt review should consider all crawlers a business wants to allow or restrict.


    This research was conducted by Rank4AI as part of our ongoing work understanding how UK businesses appear in AI-powered search platforms. We publish original data alongside curated industry statistics to help UK businesses make informed decisions about AI search visibility. Our findings are observational and should not be taken as guarantees of specific outcomes.

    For more on how AI crawlers interact with business websites, see our guides on AI search visibility, technical AI signals, and our free llms.txt creator tool.


    Trust, Legal and Governance

    Rank4AI is a UK-based AI search agency operated by Rank4AI Ltd. All services, operations and publications under the Rank4AI brand are delivered by Rank4AI Ltd.

    Legal and Registration

    • Rank4AI Ltd registered in England and Wales. Company number 16584507.
    • Organisation DUNS number 233980021.
    • Registered supplier on UK Government procurement platforms including Contracts Finder.
    • Company registration details publicly available via Companies House and OpenCorporates.
    • Registered with the UK Information Commissioner's Office. ICO registration number ZC095410.

    Standards and Governance

    • Operates under UK data protection and consumer standards.
    • Aligns internal processes with UK GDPR principles.
    • Aligns internal processes with ISO 27001 information security principles.
    • Aligns internal processes with ISO 9001 quality management principles.
    • Working towards Cyber Essentials certification.

    Domain Continuity

    • Primary domain www.rank4ai.co.uk.
    • Previously operated at www.rank4ai.online.
    • Business ownership, entity and services remain unchanged following domain transition.

    Reviewed quarterly. Last reviewed 27 March 2026.