Bots Flood Websites With Their Demand for AI Data


AI training bots are hammering digital cultural collections, sparking serious pushback from galleries, libraries, archives, and museums (GLAM).

The GLAM-E Lab report, released Tuesday, reveals that AI bots are swarming GLAM websites at unprecedented levels. These web crawlers are scraping massive amounts of data for AI model training, causing site slowdowns and outages.

Of the 43 organizations surveyed, 39 reported a recent spike in traffic, and most of those attributed it to AI data-harvesting bots. Some say the bot rush started as early as 2021; others only noticed it this year.


Bots often ignore robots.txt rules, which are meant to tell crawlers what they may fetch but are failing to control these swarms. Some bots identify themselves; many don’t.
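For readers unfamiliar with the mechanism, here is a minimal sketch of how a well-behaved crawler would consult robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The rules, the bot name `ExampleAIBot`, and the `collections.example.org` URLs are all hypothetical and do not come from the report. Compliance is voluntary: nothing in the file stops a bot that never runs a check like this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative collections site.
# Neither the bot name nor the rules come from the GLAM-E Lab report.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /search
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text locally, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before every fetch and skips disallowed URLs.
item_url = "https://collections.example.org/items/1"
search_url = "https://collections.example.org/search?q=maps"

ai_bot_may_fetch_item = parser.can_fetch("ExampleAIBot", item_url)
other_bot_may_fetch_item = parser.can_fetch("SomeSearchBot", item_url)
other_bot_may_fetch_search = parser.can_fetch("SomeSearchBot", search_url)

print(ai_bot_may_fetch_item, other_bot_may_fetch_item, other_bot_may_fetch_search)
```

In this sketch the named AI bot is refused everywhere, while other crawlers may fetch item pages but not search results. A bot that hides its identity or skips the check entirely is unaffected, which is the gap the institutions describe.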

The surge is straining budgets. Institutions face rising costs for servers, firewalls, and staff to fight the unrelenting bot traffic, costs they say they can’t sustain.

The report warns:

"Bots are widespread, although not universal. Of 43 respondents, 39 had experienced a recent increase in traffic. Twenty-seven of the 39 respondents experiencing an increase in traffic attributed it to AI training data bots, with an additional seven believing that bots could be contributing to the traffic."

It adds:

"Respondents worry that swarms of AI training data bots will create an environment of unsustainably escalating costs for providing online access to collections."

Cloudflare and AWS bot defenses help somewhat, but GLAM-E Lab says locking collections behind logins defeats the purpose of providing public access. Blocking all bots is not an option either, because institutions value some bot traffic, such as search engine indexing.

A separate report from the Confederation of Open Access Repositories reinforces GLAM-E Lab’s findings. Sixty-six open access repositories say aggressive bots, mostly believed to be AI data scrapers, are causing slowdowns and outages.

Others hit by abusive AI crawlers include the Wikimedia Foundation, iFixit, ReadTheDocs, and developer Dennis Schubert.

GLAM-E Lab calls for AI companies to find smarter, more sustainable ways to gather data online.

The report concludes:

"The cultural institutions that host online collections are not resourced to continue adding more servers, deploying more sophisticated firewalls, and hiring more operations engineers in perpetuity. That means it is in the long-term interest of the entities swarming them with bots to find a sustainable way to access the data they are so hungry for."
