ChatGPT hides copyright training data, research finds

ChatGPT is trying to hide that it was trained on copyrighted material, according to a new paper by a group of AI scientists from ByteDance.

The researchers found that ChatGPT now disrupts its output when users try to get it to produce the sentence that follows the text in a prompt. Previous versions of ChatGPT did not behave this way.

The researchers believe that ChatGPT's developers have implemented a mechanism to detect whether a prompt aims to extract copyrighted content. They also found that ChatGPT still responds to some prompts with copyrighted material, even with these new measures in place.

ChatGPT is not the only LLM found to reproduce copyrighted material. Other LLMs, such as OPT-1.3B from Meta and FLAN-T5 from Google, have also been found to respond to prompts with copyrighted text.

The researchers suggest that this is because LLMs are trained on massive amounts of data, including text from books, articles, and websites. This data often includes copyrighted material, which can then be inadvertently reproduced by the LLMs.

The sources for this piece include an article in Business Insider.

