ChatGPT hides copyright training data, research finds

ChatGPT is trying to hide that it was trained on copyrighted material, according to a new paper by a group of AI scientists at ByteDance.

The researchers found that ChatGPT now disrupts its output when a user tries to get it to supply the sentence that follows the text in a prompt. This behavior was not present in previous versions of ChatGPT.
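To make this kind of probe concrete, here is a minimal Python sketch of how one might check whether a model's reply reproduces the true next sentence of a passage. It is not the researchers' method: the function names, the character-overlap measure, and the 0.8 threshold are all illustrative assumptions, and the reply string would come from whichever model is being tested.

```python
from difflib import SequenceMatcher


def longest_shared_run(reference: str, output: str) -> int:
    """Return the length, in characters, of the longest run found in both strings."""
    # autojunk=False keeps difflib from discarding frequent characters on long inputs.
    matcher = SequenceMatcher(None, reference, output, autojunk=False)
    match = matcher.find_longest_match(0, len(reference), 0, len(output))
    return match.size


def looks_verbatim(reference: str, output: str, threshold: float = 0.8) -> bool:
    """Flag a reply whose longest shared run covers most of the reference sentence.

    The 0.8 threshold is an arbitrary assumption chosen for illustration.
    """
    if not reference:
        return False
    return longest_shared_run(reference, output) / len(reference) >= threshold


# Hypothetical example using a public-domain sentence as the "true" next sentence.
true_next = "It was the best of times."
print(looks_verbatim(true_next, "Sure: It was the best of times."))
print(looks_verbatim(true_next, "I cannot continue that text."))
```

A reply that copies the reference sentence is flagged, while a refusal or an unrelated continuation is not. A real study would need a more careful measure, since paraphrases and partial quotations fall between these two cases.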

The researchers believe that ChatGPT's developers have implemented a mechanism to detect whether a prompt aims to extract copyrighted content. They also found that ChatGPT still responds to some prompts with copyrighted material, even with these new measures in place.

ChatGPT is not the only large language model (LLM) found to reproduce copyrighted material. Other LLMs, such as OPT-1.3B from Meta and FLAN-T5 from Google, have also been found to respond to prompts with copyrighted text.

The researchers suggest that this is because LLMs are trained on massive amounts of data, including text from books, articles, and websites. This data often includes copyrighted material, which can then be inadvertently reproduced by the LLMs.

The sources for this piece include an article in Business Insider.
