Google has introduced DolphinGemma, an AI model developed in collaboration with Georgia Tech and the Wild Dolphin Project (WDP), aiming to decode dolphin vocalizations and facilitate two-way communication between humans and dolphins.
DolphinGemma is trained on decades of underwater audio and video data collected by WDP, focusing on Atlantic spotted dolphins. The model identifies patterns in dolphin vocal sequences and can generate realistic dolphin-like sounds. Operating as an audio-in, audio-out system, it predicts the likely next sounds in a sequence, much as a language model predicts the next word in human text.
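To make the audio-in, audio-out idea concrete, here is a minimal sketch of next-token prediction over discrete audio tokens, the same loop a text model runs over words. The codec, token ids, and scoring function below are hypothetical stand-ins, not DolphinGemma's actual components or API.

```python
# Sketch: treat audio as a sequence of discrete tokens (as a neural codec such
# as SoundStream would produce) and repeatedly predict the next token.
# All numbers and functions here are illustrative assumptions.

import numpy as np

VOCAB_SIZE = 1024          # assumed codec codebook size
CONTEXT = 8                # assumed context window for the toy scorer

rng = np.random.default_rng(0)

def encode_audio(waveform: np.ndarray) -> np.ndarray:
    """Stand-in for a neural audio codec: map audio frames to discrete token ids."""
    frames = waveform.reshape(-1, 160)                     # e.g. 10 ms frames at 16 kHz
    return (np.abs(frames).mean(axis=1) * VOCAB_SIZE).astype(int) % VOCAB_SIZE

def next_token_logits(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the autoregressive model: one score per vocabulary entry."""
    local = np.random.default_rng(int(tokens[-CONTEXT:].sum()))
    return local.normal(size=VOCAB_SIZE)

def generate(prompt_tokens: np.ndarray, steps: int) -> np.ndarray:
    """Greedy generation: keep appending the highest-scoring next audio token."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        tokens.append(int(np.argmax(next_token_logits(np.array(tokens)))))
    return np.array(tokens)

waveform = rng.normal(size=16000)          # one second of placeholder audio
prompt = encode_audio(waveform)            # "audio in" as token ids
continuation = generate(prompt, steps=20)  # predicted "next sounds" as token ids
print(continuation[-20:])
```

In a real system the predicted tokens would be decoded back to a waveform by the codec, which is what makes the pipeline audio-out as well as audio-in.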
The system builds on Google's lightweight Gemma models and uses the SoundStream tokenizer to represent dolphin sounds as discrete audio tokens. With approximately 400 million parameters, DolphinGemma is compact enough to run on the Pixel phones used in field research.
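A quick back-of-the-envelope calculation (an assumption-laden sketch, not a published spec) shows why roughly 400 million parameters is plausible on a phone: weight storage alone stays well under typical handset memory at common precisions.

```python
# Approximate weight-storage footprint of a ~400M-parameter model.
# Precisions and the parameter count are assumptions for illustration.

PARAMS = 400_000_000

for name, bytes_per_param in [("float32", 4), ("bfloat16", 2), ("int8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name:>8}: ~{gib:.2f} GiB of weights")

# float32: ~1.49 GiB, bfloat16: ~0.75 GiB, int8: ~0.37 GiB
```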
This advancement could revolutionize our understanding of dolphin communication, providing insights into their social behaviors and potentially enabling interactive exchanges between humans and dolphins. By identifying recurring sound patterns and sequences, researchers hope to uncover hidden structures and meanings within dolphin vocalizations.
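One simple way to illustrate what "identifying recurring sound patterns" can mean in practice is to count how often short subsequences of sound tokens recur across recordings. This is a generic sketch of the idea, not WDP's or Google's actual analysis, and the token ids are invented.

```python
# Count recurring length-n runs (n-grams) of sound tokens across recordings.
from collections import Counter

def ngram_counts(token_sequences, n=3):
    """Count every length-n run of tokens across all recorded sequences."""
    counts = Counter()
    for seq in token_sequences:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts

# Hypothetical tokenized whistle/click sequences (ids are illustrative only).
recordings = [
    [5, 12, 7, 5, 12, 7, 30],
    [5, 12, 7, 9, 9, 30],
    [30, 5, 12, 7, 5, 12, 7],
]

for pattern, count in ngram_counts(recordings, n=3).most_common(3):
    print(pattern, count)   # (5, 12, 7) recurs most often, hinting at structure
```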
Google plans to release DolphinGemma as an open model this summer, allowing researchers worldwide to adapt it for studying other cetacean species. Fine-tuning will be needed for species with different vocal patterns, but the open release makes such adaptations possible.