Tuesday, August 12, 2025

Irony

Here's the Ars Technica article: AI industry horrified to face largest copyright class action ever certified.

Well, of course they're horrified. For all of AI's potential, there's no question that LLMs have been trained through a process of only thinly disguised plagiarism. Discovery would be so conclusive that they would undoubtedly lose.

What defendants, particularly large ones, usually do with a class action suit is send an utter avalanche of documents in response to requests for information. Hundreds of thousands of them. This effectively buries the plaintiffs in so many documents that they have almost no chance of finding the pertinent ones. Sure, there might be a smoking gun or two included, but if it's part of 150,000 documents, what are the chances it will be found? This strategy is in the barely legal category, but it's not strictly illegal, and any process of determining such illegality would take years.

Here's the irony: if the plaintiffs use a properly customized LLM to process those documents, their chances of finding the pertinent, damning information will be much, much higher. It's exactly the kind of task where AI should excel.

At some point, these companies will probably write gigantic checks in exchange for immunity from plagiarism charges covering past behavior, and pay for access to data going forward. Still, it won't be anything near what they've actually made. It never is, in this country.
