Shaare your links...
32922 links
Liens en vrac de sebsauvage Home Login RSS Feed ATOM Feed Tag cloud Picture wall Daily
Links per page: 20 50 100
page 1 / 1
  • Block AI Bots from Crawling Websites Using Robots.txt
    Une liste de User-Agent à bloquer (ou troller).

    EDIT: Du coup j'ai mis ça dans la config de mon Apache:
    RewriteCond "%{HTTP_USER_AGENT}" "(ChatGPT-User|Meta-ExternalFetcher|Amazonbot|Applebot|OAI-SearchBot|PerplexityBot|YouBot|Applebot-Extended|Bytespider|CCBot|ClaudeBot|Diffbot|FacebookBot|Google-Extended|GPTBot|Meta-ExternalAgent |omgili |AI Agent Anthropic-AI|AI Agent Claude-Web|Cohere-AI Agent|Ai2Bot|Ai2Bot-Dolma|GoogleOther|GoogleOther-Image|GoogleOther-Video|ImagesiftBot|PetalBot|Scrapy|Timpibot|VelenPublicWebCrawler|Webzio-Extended|facebookexternalhit)" [NC]
    RewriteRule .* - [R=429,L]

    (HTTP 429 c'est "Too many requests")

    Bien sûr je sais que ça ne suffit pas (beaucoup de bots mentent sur leur User-Agent), mais c'est déjà ça.

    EDIT: Voir aussi : https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.txt
    2025-01-06 10:11:52
    https://originality.ai/ai-bot-blocking
Links per page: 20 50 100
page 1 / 1
Shaarli 0.0.41 beta modifiée - 2022-08-11 - The personal, minimalist, super-fast, no-database delicious clone. By sebsauvage.net. Theme by idleman.fr. I'm on Mastodon.
shelter.moe