New large language models promise “infinite” context length.


As artificial intelligence evolves, major tech companies including Microsoft, Google, and Meta are pioneering large language models (LLMs) with potentially unlimited context lengths. This advance could transform how models understand and process information, removing fixed context-window limits and broadening their utility across applications.

Meta’s introduction of MEGALODON represents a significant leap forward. This new neural architecture is designed to model sequences of unlimited length efficiently, sidestepping Transformer limitations such as the quadratic computational cost of attention in sequence length. With innovations like the complex exponential moving average (CEMA) component and a timestep normalization layer, MEGALODON is positioned to underpin future iterations of Meta’s AI models, starting with the anticipated Llama 3.
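To give a feel for the CEMA idea, here is a minimal one-dimensional sketch: each hidden state blends the current input with the previous state scaled by a complex-valued decay, so the recurrence can encode both damping (magnitude) and oscillation (phase). The parameter names and the final projection back to the reals are illustrative assumptions, not MEGALODON's actual formulation.

```python
import numpy as np

def complex_ema(x, alpha=0.3, decay_mag=0.9, decay_phase=0.5):
    """Toy 1-D complex exponential moving average (CEMA-style recurrence).

    alpha, decay_mag, decay_phase are hypothetical parameters chosen for
    illustration; the real architecture learns multi-dimensional versions.
    """
    delta = decay_mag * np.exp(1j * decay_phase)  # complex decay factor
    h = 0.0 + 0.0j
    out = np.empty(len(x), dtype=complex)
    for t, xt in enumerate(x):
        # blend current input with complex-decayed previous state
        h = alpha * xt + (1 - alpha) * delta * h
        out[t] = h
    return out.real  # project back to real values for downstream layers

states = complex_ema(np.ones(8))
```

Because the decay has a phase component, the smoothed states can oscillate rather than only decay monotonically, which is the intuition behind using complex rather than real moving averages.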

Google’s Infini-attention mechanism integrates a compressive memory into the standard attention framework, yielding a scalable model that can process input sequences of unprecedented length with bounded memory. It combines local masked attention over the current segment with long-term linear attention over the compressed memory, maintaining computational efficiency while extending context awareness.
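A rough sketch of the compressive-memory idea, with dimensions and the update rule simplified from the paper's description (the feature map, random Q/K/V, and small epsilon are assumptions for illustration): each segment's keys and values are folded into a fixed-size matrix, so memory cost stays constant no matter how long the total sequence grows.

```python
import numpy as np

def elu_plus_one(x):
    # positive feature map, as used in linear-attention formulations
    return np.where(x > 0, x + 1.0, np.exp(x))

def compressive_memory_pass(segment_lengths, d=4, seed=0):
    """Toy Infini-attention-style pass over a list of segments."""
    rng = np.random.default_rng(seed)
    M = np.zeros((d, d))   # compressive memory, fixed size
    z = np.zeros(d)        # normalization accumulator
    outputs = []
    for n in segment_lengths:
        Q = rng.standard_normal((n, d))
        K = rng.standard_normal((n, d))
        V = rng.standard_normal((n, d))
        sQ, sK = elu_plus_one(Q), elu_plus_one(K)
        # retrieve long-term context accumulated from earlier segments
        retrieved = (sQ @ M) / ((sQ @ z) + 1e-6)[:, None]
        outputs.append(retrieved)
        # fold the current segment into memory (linear-attention update)
        M += sK.T @ V
        z += sK.sum(axis=0)
    return M, z, outputs

M, z, outs = compressive_memory_pass([3, 3, 3])
```

The key property shown here is that `M` and `z` never grow with sequence length; in the full mechanism this retrieval is combined with ordinary masked attention within each segment.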

Feedback Attention Memory (FAM), another development from Google, introduces a feedback loop into the Transformer architecture. This loop lets the model attend back to its own latent representations, effectively creating a form of working memory that supports the processing of indefinitely long sequences.
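The feedback-loop idea can be caricatured as follows: the sequence is processed block by block, and a summary of each block's output is carried forward as extra context for the next block. Everything here (the mean-pooled summary, the stand-in "attention") is a deliberate simplification, not the actual TransformerFAM computation.

```python
import numpy as np

def fam_blocks(x, block_size=4):
    """Toy block-wise pass with a feedback summary vector.

    `fam` plays the role of the feedback/working-memory token: each
    block sees the previous block's summary, then refreshes it.
    """
    fam = np.zeros(x.shape[-1])            # feedback summary vector
    outputs = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # stand-in for attending over [feedback token, block tokens]
        context = np.vstack([fam, block])
        out = context.mean(axis=0, keepdims=True) + block
        outputs.append(out)
        fam = out.mean(axis=0)             # refreshed feedback for next block
    return np.vstack(outputs)

y = fam_blocks(np.ones((8, 2)))
```

The point of the sketch: per-block compute is constant, yet information still flows arbitrarily far forward through the recycled summary.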

Additionally, Microsoft’s LongRoPE, which extends rotary positional embeddings (RoPE), stretches the context window of LLMs to over 2 million tokens. This development, along with Microsoft’s Selective Language Modeling (SLM) technique, which focuses training on the most impactful tokens, aims to improve the models’ effectiveness across varied applications.
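The SLM idea can be sketched briefly: score each training token by how much its loss exceeds that of a reference model, and keep only the highest-scoring fraction in the training objective. The exact scoring and `keep_ratio` below are illustrative assumptions based on the published description, not Microsoft's precise recipe.

```python
import numpy as np

def select_tokens(train_losses, ref_losses, keep_ratio=0.5):
    """Return a boolean mask over tokens: True = include in the loss.

    Tokens whose training loss most exceeds the reference model's loss
    are considered the most 'impactful' and are kept; the rest are
    masked out of the language-modeling objective.
    """
    excess = np.asarray(train_losses) - np.asarray(ref_losses)
    k = max(1, int(len(excess) * keep_ratio))
    keep = np.argsort(excess)[-k:]          # indices of highest excess loss
    mask = np.zeros(len(excess), dtype=bool)
    mask[keep] = True
    return mask

mask = select_tokens([2.0, 0.5, 3.0, 1.0], [1.0, 0.4, 1.0, 0.9])
```

Here tokens 0 and 2 have the largest gap between training and reference loss, so only they would contribute to the gradient.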

Despite these advancements, managing such extensive inputs poses inherent challenges. Experts caution that simply increasing the token count does not by itself improve model performance; what matters is how effectively an LLM actually uses its extended context, as NVIDIA’s Jim Fan has emphasized, stressing practical application over theoretical capability.

To address this, NVIDIA has developed RULER, a benchmark designed to evaluate long-context models across a spectrum of tasks, helping to reveal how effectively new models use their extended context windows.

The move towards LLMs with infinite context lengths marks a significant milestone in AI development. It promises enhanced capabilities for complex problem-solving and decision-making applications, potentially transforming how we interact with technology. As these models become more refined and accessible, they will pave the way for more sophisticated AI applications, blurring the lines between human and machine cognition.


