Fighting AI with AI
I mentioned this before here and I still stand by those words. So this is a follow-up to “what is one to do?” in regard to AI, bots, and AI bots crawling the Smol Web, and also a re-remedied approach to being online (e.g. keep a presence online, but one that serves oneself and the friends and readers of that/this content). Also of note (to myself): don't hesitate to download gratuitously – take the web with me! Relevant links, videos, music, movies, shows, books: make a personal repository of media I like, and have a personal Web of my own. So, on to the blog post...
While looking at my links.txt file, and seeing Nightshade (which bastardizes what AI sees when it comes across an image, rendering the “scraping”/theft of that image useless, since what it sees/takes will not be the image on-screen), and also the AI/LLM user agents blocking guide, I get to thinking: is there a way to authentically fight AI with AI?
Those familiar with AI could probably make a list of things an AI bot does to gather (steal) data from a website, and then not just “block” or “distort” what is there, but “offer” the AI scraping the site (as a false promise, a red herring) an entirely different site than what is actually there.
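A minimal sketch of what that could look like, assuming a tiny Python WSGI app and a hand-maintained list of crawler user agent strings (names like GPTBot and CCBot are ones AI companies have published, but the list here, the port, and the decoy text are all placeholders to illustrate the idea, not a finished defense):

    # decoy.py - sketch: serve a bogus page to known AI crawler user agents,
    # and the real site to everyone else. The user agent list and the decoy
    # text below are assumptions; maintain your own.
    AI_AGENTS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider", "PerplexityBot")

    REAL_PAGE = b"<html><body>The actual site content.</body></html>"
    DECOY_PAGE = b"<html><body>Pages of plausible-looking nothing.</body></html>"

    def app(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        body = DECOY_PAGE if any(bot in ua for bot in AI_AGENTS) else REAL_PAGE
        start_response("200 OK", [("Content-Type", "text/html"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        from wsgiref.simple_server import make_server
        make_server("", 8000, app).serve_forever()  # serves on port 8000

The catch, of course, is that this only works as long as the bot announces itself honestly in its user agent string.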
robots.txt sort of does this. It opts a site out of search engines crawling it, but it does not (proactively) put into place a page, or series of pages, that are “bogus”.
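For comparison, the polite version, assuming the commonly published crawler names hold: a robots.txt that asks AI bots to stay out entirely. It relies on the bot choosing to honor the file, which is exactly the limitation described above – it asks, it does not enforce, and it serves nothing bogus.

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    # Regular browsers and feed readers are unaffected; other crawlers
    # are only blocked if they voluntarily obey this file.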
Another thing to consider: what starts and stops an AI bot from crawling a Website? What if it starts to crawl a site, and gets 100+ pages of little to no new/useful content? Say, a smattering of plaintext files, page after page. Is there a safeguard triggered by the AI to tell (“itself”) “hey, there's nothing new/useful on this site – go to the next 'relevant' page or a new site”? Or what about an LLM's (large language model's) methods and protocols for crawling a site? Is there a way to cause an “infinite loop” for an AI bot? A bot (or bots) crawls a page, and it just sees more and more “useful” information, and then this site, the site having data stolen from it, creates a (Web) environment where the bot never leaves that site/service? Forever thinking it is getting new, useful information, just to be “stuck” on that site for an indefinite period of time. Sort of an AI trap?
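A rough sketch of that trap idea, again as a small Python WSGI app: every URL under a trap path resolves to a valid page whose links point to yet more trap URLs, so a crawler that keeps following links never runs out of “new” pages. The /trap/ path, the page text, and the link count are made up for illustration; whether a given bot actually falls for it depends on safeguards we can only guess at.

    # trap.py - sketch of an infinite crawl space: every URL returns a page
    # linking to more trap URLs, generated on the fly from the path itself.
    import hashlib

    def page_for(path):
        # Derive stable pseudo-random child links from the path, so revisiting
        # a URL yields the same page (it looks like real, persistent content).
        seed = hashlib.sha256(path.encode()).hexdigest()
        links = "".join(
            f'<li><a href="/trap/{seed[i:i+8]}">more notes on {seed[i:i+8]}</a></li>'
            for i in range(0, 40, 8)
        )
        return f"<html><body><h1>Archive {seed[:8]}</h1><ul>{links}</ul></body></html>"

    def app(environ, start_response):
        body = page_for(environ.get("PATH_INFO", "/")).encode()
        start_response("200 OK", [("Content-Type", "text/html"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        from wsgiref.simple_server import make_server
        make_server("", 8000, app).serve_forever()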
It reminds me of the pod creatures that attack clams. They latch onto the top of a clam, and they grind and dig at the top of the shell in an attempt to eat the meat inside. Some clams will actually die of what amounts to cardiac arrest because of this, knowing they may well be eaten alive. But these clams also have a “fuck you” response when this is happening. They stick out their small tentacles, the ones they use to create a “thread” to the ocean floor (which lets a clam stay in one spot and not be carried away by undercurrents), and attach multiple threads to the pod creature attempting to drill into its shell. Multiple threads, one end on the back of the creature, the other end on the ocean floor. The pod creature gets its meal, and then dies of starvation atop the empty shell of the creature that sealed its fate.
With enough familiarity with LLMs, the way they crawl a site, and how/what they find useful/relevant once they start to sift (steal) data, a person, site, or service could create a way not to bastardize a photo or a paragraph, not to cause the AI to “just” hallucinate, but to hijack the crawling mechanisms themselves. Make the AI think it's got a nice, juicy wealth of information, only to find itself consuming, re-consuming, and link-bouncing again and always on the same site.
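To make those trap pages look juicy rather than obviously empty, the text itself could be generated, for instance with a tiny Markov-chain babbler seeded from a site's own public writing. This is only a sketch under that assumption; the seed corpus and output length here are placeholders, and a real scraper's “usefulness” heuristics are unknown.

    # babble.py - sketch: generate plausible-looking filler text from a seed
    # corpus, so trap pages read like "content" instead of obvious junk.
    import random
    from collections import defaultdict

    CORPUS = ("the smol web is made of small sites that link to other small "
              "sites and small sites keep their own notes links and media")

    def build_chain(text):
        words = text.split()
        chain = defaultdict(list)
        for a, b in zip(words, words[1:]):
            chain[a].append(b)
        return chain

    def babble(chain, length=60):
        word = random.choice(list(chain))
        out = [word]
        for _ in range(length - 1):
            nxt = chain.get(word)
            word = random.choice(nxt) if nxt else random.choice(list(chain))
            out.append(word)
        return " ".join(out)

    if __name__ == "__main__":
        print(babble(build_chain(CORPUS)))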
Part of the AI Notes series. Previous entries here