The internet and the enormous amount of data it has generated have had a tremendous influence on the advancement of artificial intelligence (AI). According to a recent Washington Post investigation, the AI industry trained its neural networks using a publicly available dataset spanning 30 years of web publication.
The investigation found that everyday online contributions, such as blogs, web pages, and social media threads, helped AI chatbots learn without their authors' knowledge. In the process, people unintentionally built a vast archive of human expression, which allows AI models such as ChatGPT to perform impressive sentence-completion tasks.
The investigation lets users enter any internet domain name and see its contribution to a specific AI training dataset. The researchers examined a dataset containing more than 500,000 personal blogs, which accounted for 3.8 percent of its total “tokens,” the small chunks of text that AI models process. However, because some cultures, groups, and subjects may be overrepresented while others are neglected, AI training data may carry the biases, limitations, and toxic elements of internet culture.
The immense quantity of information, thoughts, and emotions that people have posted on the internet, which may be compared to digital stockpiles and landfills, is what drives the advances in AI technology we see today.
The sources for this piece include an article in Axios.