Crawler Index
A large sample of top crawlers that are blocked by websites.
Updated on 12-July-24
22.7%
of websites do not block any bots.
% of websites blocking * | % change week over week | Company | Purpose | User Agent |
---|---|---|---|---|
2.17% | | OpenAI | GPT | GPTBot |
1.48% | | Common Crawl | Training Data | CCBot |
0.90% | | Google | Bard/Gemini/PaLM/Bison | Google-Extended |
0.75% | | Turnitin | Plagiarism & AI detector | turnitinbot |
0.70% | | OpenAI | ChatGPT | chatgpt-user |
0.65% | | Amazon | Alexa | amazonbot |
0.42% | | Meta AI | LLaMA | FacebookBot |
0.38% | | Brandwatch | Magpie Crawler | magpie-crawler |
0.28% | | ByteDance | ByteDance LLM N/A | Bytespider |
0.16% | | Anthropic | Claude | Anthropic-AI |
0.13% | | Anthropic | Claude | claudebot |
0.10% | | Anthropic | Claude | claude-web |
0.08% | | Perplexity | Chatbot | perplexitybot |
0.06% | | Cohere | Cohere Command | Cohere-AI |
% of websites blocking * | % change week over week | Company | Purpose | User Agent |
---|---|---|---|---|
9.91% | | Nutch | | nutch |
5.20% | | Petal | | petalbot |
2.29% | | Majestic | | mj12bot |
1.45% | | Baidu | | baiduspider |
1.38% | | Yandex | | yandex |
0.50% | | Bing | | bingbot |
0.07% | | Google | | googlebot |
0.05% | | Ask.com | | Teoma |
0.04% | | You.com | | youbot |
0.01% | | Yahoo | | slurp |
0.01% | | DuckDuckGo | | duckduckbot |
0% | | Internet Archive | Wayback Machine | archive.org |
0% | | Nutch | | nutchorg |
Bright Data scrapes the world’s most sought-after public web data on billions of top websites. Through our compliance product, Bright Shield, we collect the allow and disallow directives for user agents from the robots.txt files of the websites we scrape. Our current sample contains 704,314 websites, and we have collected about 1,700 unique user agents.
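As a rough illustration of what reading these directives involves, the sketch below uses Python's standard `urllib.robotparser` to decide whether a robots.txt file blocks a given user agent. The helper name `is_fully_blocked`, the sample file, and the choice to treat "blocked" as "disallowed from the site root" are assumptions made for this example; they are not a description of how Bright Shield works.

```python
from urllib.robotparser import RobotFileParser


def is_fully_blocked(robots_txt: str, user_agent: str) -> bool:
    """Return True if robots_txt disallows user_agent from the site root.

    Assumption for this sketch: a site "blocks" a crawler when the crawler
    may not fetch "/". The index may use a different definition.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Hypothetical robots.txt: blocks GPTBot, allows everyone else.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(is_fully_blocked(sample, "GPTBot"))
print(is_fully_blocked(sample, "bingbot"))
```

With this sample file, the first call reports the crawler as blocked and the second does not, because `bingbot` falls through to the wildcard group.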
Our research team has calculated how often each user agent is blocked and allowed within our sample to determine the most blocked and most allowed user agents. These rankings determine which user agents appear in each chart. The numbers are tracked week over week, and the percentage change is recorded. We also track the overall percentage of websites that allow all crawlers. Each user agent is identified to the best of our ability by company, use, and a link that includes additional information such as how to block it.
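The arithmetic behind these figures can be sketched in a few lines. In the example below, each website is represented by the set of user agents it blocks. The function names, the toy data, and the reading of "change" as a difference in percentage points are all assumptions for illustration; the published methodology does not specify them.

```python
from collections import Counter


def block_share(blocked_per_site: list[set[str]]) -> dict[str, float]:
    """Percentage of sampled sites that block each user agent."""
    total = len(blocked_per_site)
    if total == 0:
        return {}
    counts = Counter(agent for site in blocked_per_site for agent in site)
    return {agent: 100.0 * n / total for agent, n in counts.items()}


def share_blocking_none(blocked_per_site: list[set[str]]) -> float:
    """Percentage of sampled sites that block no user agent at all."""
    total = len(blocked_per_site)
    if total == 0:
        return 0.0
    return 100.0 * sum(1 for site in blocked_per_site if not site) / total


def week_over_week(current: dict[str, float],
                   previous: dict[str, float]) -> dict[str, float]:
    """Change in percentage points (assumed); unseen agents count as 0."""
    return {agent: current[agent] - previous.get(agent, 0.0)
            for agent in current}


# Toy samples of four sites each (hypothetical data).
last_week = [{"gptbot"}, set(), set(), set()]
this_week = [{"gptbot"}, {"gptbot", "ccbot"}, set(), set()]

current = block_share(this_week)
change = week_over_week(current, block_share(last_week))
print(current, change, share_blocking_none(this_week))
```

In the toy data, two of four sites block `gptbot` this week against one of four last week, so its share rises from 25% to 50%, a change of 25 percentage points.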
Comments on user agents? Email comments to [email protected]