Tobold's Blog
Thursday, January 30, 2025
 
Supreme AIrony

In various lawsuits, OpenAI claimed that it would be impossible to create AI tools like ChatGPT without copyrighted material, and that such use falls under the "fair use" exception to copyright. Now OpenAI is threatening DeepSeek, because DeepSeek fully agreed with OpenAI and used OpenAI material to train their own AI chatbot, in a process called "distillation". Isn't it AIronic?

The amount of data on the internet is estimated to be in the zettabytes. It is easy to see why it cost OpenAI a lot of money to grab a good chunk of that, copyrighted or not, and turn it into a much smaller data volume in their AI model. And it is equally easy to see how DeepSeek would spend a lot less money on training their AI model if they used the already "distilled" OpenAI data instead of the raw data.

That leaves us with a bit of a moral dilemma: If it is legal to just grab the distilled data from somebody else, then soon everybody will just do that, and nobody will want to do the hard, expensive, and undervalued work of sifting through the raw data. So at some point all AI chatbots would be stuck with the knowledge of 2025, because nobody wanted to spend the money to gather newer knowledge. But if we say that a company like OpenAI can gather copyrighted data for free, then hold copyright on the distilled data and monetize it, we are basically giving them a license to steal stuff and resell it.

Comments:
It would make the content creators happy.

The AI companies are not following robots.txt and in some cases are scraping sites a million times a day.

It has reached the point that multiple sites are trying to poison their content when it is consumed by AI.
 