Researchers at the University of California, Berkeley published a paper titled “Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4” in which they examine how OpenAI’s ChatGPT and the GPT-4 large language model were trained on material from copyrighted books.
GPT-4 has memorized a wide range of copyrighted content, according to the research team of Kent Chang, Mackenzie Cramer, Sandeep Soni, and David Bamman, with the degree of memorization correlated with how frequently excerpts from those books appear on the web. The team has made their code and data available on GitHub, along with a list of the identified books, which includes titles such as Harry Potter, The Lord of the Rings, and The Hitchhiker’s Guide to the Galaxy.
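The paper measures memorization with a “name cloze” test: a character name is masked in a short passage from a book, and the model is asked to fill it in; recovering the name suggests the passage was seen during training. A minimal sketch of that idea (the helper names and toy examples here are illustrative, not taken from the study’s released code):

```python
# Sketch of a "name cloze" memorization probe in the spirit of the study:
# mask a character name in a passage and check whether the model recovers it.
# query_model is a stand-in for a real LLM call.

import re

def make_cloze(passage: str, name: str) -> str:
    """Replace the target name with [MASK] to build the cloze prompt."""
    return re.sub(re.escape(name), "[MASK]", passage)

def score_memorization(examples, query_model) -> float:
    """Fraction of masked names the model recovers exactly."""
    hits = 0
    for passage, name in examples:
        prompt = make_cloze(passage, name)
        guess = query_model(prompt)
        hits += int(guess.strip() == name)
    return hits / len(examples)

# Usage with a toy stand-in model that always answers "Harry":
examples = [
    ("Harry looked at the letter again.", "Harry"),
    ("Frodo held the ring tightly.", "Frodo"),
]
print(score_memorization(examples, lambda prompt: "Harry"))  # 0.5
```

Aggregating this hit rate per book is what lets the authors rank titles by how strongly the model appears to have memorized them.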
The researchers observe that science fiction and fantasy titles dominate the list, which they attribute to those genres’ popularity on the internet. They also note that memorizing certain titles has a knock-on effect: the models respond more accurately to instructions about books they have memorized, while ChatGPT shows a weaker grasp of works in other genres as a result of this familiarity with sci-fi and fantasy novels. The researchers advocate the use of publicly available training data to make the models’ behavior more transparent.
While the researchers focus less on the copyright implications of memorizing copyrighted texts, text-generating applications built on these models may produce passages that are substantially similar, or even identical, to the copyrighted texts they ingested.
Sources for this piece include an article in The Register.