Synthetic data, the would-be saviour of large language models, can lead to “model collapse.” Could OpenAI go bankrupt? And OpenAI enters the search market, which leads to a dip in Google’s stock price.
All this and more on the “take the money and run” edition of Hashtag Trending. I’m your host, Jim Love. Let’s get into it.
Synthetic data, or data created by AI, has been touted as the solution to what, outside of energy consumption, may be the biggest problem facing companies developing large language models.
These models need ever-increasing amounts of data, which is putting them in conflict with publishers, authors, creators and other copyright owners.
AI models are increasingly relying on synthetic data generated by other AI models to fill knowledge gaps and overcome data restrictions.
Sam Altman made reference to this idea of synthetic data as being the solution to this key issue in a speech months ago.
And it has had some successes. In some circumstances, the idea of a model creating its own data has worked amazingly well. Google and OpenAI have both achieved breakthroughs in the mathematical and logical abilities of models using a technique where the model “plays against itself” and generates a huge number of potential scenarios. This is what Google used to train its AI to beat world champions at the strategy game Go.
But despite these successes, the idea of synthetic data has hit some speedbumps.
Researchers from the University of Oxford have found that training AI models exclusively on synthetic data can lead to “model collapse,” where the AI eventually produces incoherent or irrelevant responses. This is a particular concern for how well AI systems represent underrepresented groups or languages.
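For anyone following along in the show notes, here is a toy sketch of why a model fed only its own output can drift. It is not the Oxford researchers' method. It assumes the simplest possible "model," a bell curve fitted to ten numbers, and the function name and parameters are made up for illustration. Each generation is trained only on samples produced by the previous generation.

```python
import random
import statistics


def simulate_collapse(generations=100, sample_size=10, seed=0):
    """Toy illustration: repeatedly fit a Gaussian to its own samples.

    Returns the spread (standard deviation) of the data at each generation.
    This is a hypothetical stand-in for a real model, not an LLM.
    """
    rng = random.Random(seed)
    # Generation 0: "real" data drawn from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    spreads = [statistics.pstdev(data)]
    for _ in range(generations):
        # "Train" the model: estimate mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        # The next generation sees only synthetic samples from that model.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        spreads.append(statistics.pstdev(data))
    return spreads


if __name__ == "__main__":
    spreads = simulate_collapse()
    print(f"spread at generation 0:   {spreads[0]:.4f}")
    print(f"spread at generation 100: {spreads[-1]:.4g}")
```

With so few samples per generation, the estimated spread tends to shrink over time, so the later generations cover less and less of what the original data contained. That loss of variety is a rough, small-scale analogue of the collapse described above.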
Researchers are working actively to see if these challenges can be overcome, with some success. Cohere for AI reported a 40% reduction in toxic responses from a model using targeted sampling of AI-generated data. Other researchers have used curated synthetic data to improve fairness in AI models.
Sara Hooker from Cohere for AI suggests that using data from multiple specialized “teacher” models could help avoid collapse and potentially achieve better results with smaller models.
The key questions remain: Can synthetic data truly represent the breadth of human experience, and can it surpass the capabilities of the best current models?
Vyas Sekar, a professor at Carnegie Mellon University, summarizes the situation: “AI-generated data is an amazingly useful technology, but if you use it indiscriminately, it’s going to run into problems. If used well, it can lead to really good outcomes.”
However this plays out, it’s a key element in the development of even larger and more intelligent AI foundation models. Watch this space.
Sources include: Axios
Could OpenAI go bankrupt? The creator of ChatGPT is reportedly facing significant financial challenges despite its central role in the AI boom. According to a report by The Information, the company could be on the brink of bankruptcy within the next 12 months, with projections of a $5 billion loss in 2024.
The startup’s expenses are staggering, with $7 billion spent on training AI models and $1.5 billion on staffing. Daily operational costs for ChatGPT alone reach $700,000. These expenditures far outpace OpenAI’s estimated annual revenue of $3.5 to $4.5 billion, derived from ChatGPT and LLM access fees.
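To put those reported figures side by side, here is a back-of-the-envelope sketch. It uses only the numbers quoted above and assumes, purely for illustration, that training and staffing are the only expenses and that the $700,000 daily figure holds for a full 365-day year. The variable names are ours, and none of this is OpenAI's actual accounting.

```python
# All dollar figures in billions, as reported in the episode.
training_cost = 7.0
staffing_cost = 1.5
revenue_low, revenue_high = 3.5, 4.5

# Assumption: training and staffing are the only expenses counted here.
total_expenses = training_cost + staffing_cost

# Shortfall under the high and low revenue estimates.
loss_best_case = total_expenses - revenue_high
loss_worst_case = total_expenses - revenue_low

# Assumption: the reported daily ChatGPT cost applies every day of the year.
chatgpt_daily_cost_dollars = 700_000
chatgpt_annual_cost_dollars = chatgpt_daily_cost_dollars * 365

if __name__ == "__main__":
    print(f"Total expenses: ${total_expenses} billion")
    print(f"Projected loss: ${loss_best_case} to ${loss_worst_case} billion")
    print(f"ChatGPT running cost per year: ${chatgpt_annual_cost_dollars:,}")
```

Under those assumptions the gap works out to between $4 billion and $5 billion, which lines up with the projected $5 billion loss at the low end of the revenue estimate.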
Despite raising over $11 billion through seven funding rounds and being valued at $80 billion, OpenAI is struggling to cover its operational costs. The company is reportedly running near total capacity, with 290,000 of its 350,000 servers dedicated to ChatGPT.
CEO Sam Altman remains focused on achieving artificial general intelligence (AGI), but the company’s financial sustainability is in question. This situation highlights the enormous costs associated with developing cutting-edge AI technology and the challenges of turning these innovations into profitable ventures.
As the AI landscape continues to evolve rapidly, OpenAI’s financial struggles raise important questions about the sustainability of current AI development models and the need for new funding strategies in this high-stakes technological race.
Sources include: Windows Central
OpenAI is upping the ante in its challenge to Google’s search dominance by testing a new feature for ChatGPT. This tool incorporates real-time information into responses, allowing the AI to answer questions with up-to-date data and provide relevant links.
Currently available to a limited number of US users, the feature is expected to be integrated into the main ChatGPT bot in the future. OpenAI says this enhancement will make finding information “faster and easier” by allowing follow-up questions and reducing the need for multiple search attempts.
The move signals a potential shift in the search engine landscape, with AI chatbots increasingly seen as the future of online queries. This development has already impacted Alphabet’s stock, which dipped nearly 3% following the announcement.
OpenAI is addressing concerns from publishers about the impact on their traffic and revenue. The feature may also have an impact on sites that use content as a means of drawing in potential customers. If you can get a full summary via AI, why visit the site unless it’s to validate the response?
OpenAI’s response was predictable. They stated that “We are committed to a thriving ecosystem of publishers and creators,” and that they are working with media outlets like The Atlantic and News Corp on this new search feature.
The expansion of AI in search raises environmental concerns due to high energy consumption. It also faces legal challenges, with OpenAI previously sued for allegedly using copyrighted content to train its systems.
But there’s another warning light that may have been missed. OpenAI is not the first entrant into the AI search industry. Perplexity.ai has been around for months now (that’s decades in the AI world) and has built a large and loyal following for its AI search. But here’s the tough part. Perplexity.ai uses OpenAI and has built its solution based on that. Now OpenAI enters the field, and what does this mean for Perplexity?
This is an extreme example, but it’s also a warning for companies that build their offerings based on any of the major foundation models. What happens when OpenAI or Google or some other major player decides it wants to offer something similar to what you offer? Given the financial issues that OpenAI is reportedly experiencing, its need for revenue could push it to look everywhere for new income.
We had a similar experience in another life when we partnered with a large US tech publisher. They decided they didn’t need us anymore and we were too small to fight them. I won’t say that was the only thing that killed our business, but it certainly was a big blow.
Now, with very few companies dominating the AI foundation models, it will be interesting to see how well they play with the other kids, especially when times get tough.
Sources include: BBC, OpenAI and YouTube
And that’s our show for today. You can find show notes at our news site technewsday.com or .ca, take your pick.
Hashtag Trending is on summer hours, but we’ll probably stick with a four-show week, and our weekend edition will be released early on Friday.
You can still find us on YouTube and if you do, please subscribe or give us a like.
Thanks for listening. I’m your host, Jim Love. Have a Marvelous Monday.