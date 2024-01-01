Shafaqna English- Enterprises are increasingly blocking AI web crawlers that scrape websites for data to train large language models, citing performance issues and security risks. Unlike traditional search engine crawlers such as GoogleBot, which adhered to ethical content scraping guidelines, new AI bots such as Bytespider, PerplexityBot, ClaudeBot, and GPTBot have been violating these norms, the Economic Times reported.

Their aggressive data collection has led to increased overhead costs for websites and content portals, prompting many to adopt anti-scraping technologies to restrict bot access. Cloudflare reports that nearly 40% of the top 10 internet domains accessed by AI bots are now moving to block these crawlers.

This surge in AI scraping has raised particular concerns for news publishers, whose authored content is often scraped without attribution. Industry body Nasscom has warned that the use of copyrighted content for AI training could lead to legal disputes, citing the ongoing case between ANI Media and OpenAI. As AI developers face mounting pressure to respect intellectual property laws, they are being urged to be cautious in collecting data for training purposes. With more companies taking steps to block AI crawlers, the industry is calling for clearer guidelines to balance innovation with respect for content ownership and copyright.

Source: Economic Times

