Together AI, published Dec 17, 2025

Research POV: Yes, AGI Can Happen – A Computational Perspective

Published 12/17/2025

Authors

Together AI

Summary

Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order of magnitude in performance.

Is progress toward AGI hitting a wall? In the fast-moving world of AI, there is a growing debate about whether we are approaching the “limits of digital computation.” Some recent analysis suggests that hardware constraints and stalled GPU progress might bottleneck the road to generally useful AI.

Dan Fu, who leads our kernels research team, offers a different, more optimistic perspective in his latest post, "Yes, AGI Can Happen – A Computational Perspective." While acknowledging the real constraints we face, Dan argues that we are far from hitting a ceiling: today’s AI systems are nowhere near their theoretical limits. In his deep dive, he breaks down the numbers to show exactly where the "headroom" lies:

We are underutilizing current hardware: Today’s state-of-the-art training runs (like DeepSeek-V3 or Llama-4) often achieve only ~20% Model FLOPs Utilization (MFU), and inference utilization is often in the single digits. There is massive efficiency to be unlocked through better software-hardware co-design and innovations like FP4 training.

Models are a lagging indicator: The models we use today were trained on "old" hardware. The next generation of compute, massive clusters of 100k+ latest-generation GPUs, hasn’t even fully entered the equation yet.

Utility is already here: Even without future leaps, current models are already transforming complex workflows, such as writing high-performance GPU kernels with human-in-the-loop guidance.
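To make the ~20% utilization figure above concrete, here is a minimal sketch of how MFU (usually expanded as Model FLOPs Utilization) is commonly estimated: the FLOP/s a training run actually performs, divided by the peak FLOP/s of the hardware it runs on. The sketch assumes the widely used approximation of about 6 FLOPs per parameter per token for dense transformer training. That rule, the function names, and every number below are illustrative assumptions, not figures taken from Dan's post or from any real GPU datasheet.

```python
# Hypothetical back-of-the-envelope MFU estimate. All inputs are made-up
# round numbers chosen for illustration only.


def training_flops_per_token(n_params: float) -> float:
    """Approximate training FLOPs per token for a dense transformer.

    Uses the common 6 * N rule of thumb (forward plus backward pass),
    which ignores attention FLOPs. This is an assumption of the sketch.
    """
    return 6.0 * n_params


def mfu(n_params: float, tokens_per_second: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Achieved model FLOP/s divided by aggregate peak FLOP/s."""
    achieved = training_flops_per_token(n_params) * tokens_per_second
    peak = n_gpus * peak_flops_per_gpu
    return achieved / peak


def headroom(current_mfu: float, target_mfu: float) -> float:
    """Speedup available from software alone if utilization rose to target_mfu."""
    return target_mfu / current_mfu


# Hypothetical run: a 20B-parameter model processing 200k tokens/s
# on 120 accelerators, each with an assumed peak of 1e15 FLOP/s.
utilization = mfu(n_params=2e10, tokens_per_second=200_000,
                  n_gpus=120, peak_flops_per_gpu=1e15)
print(f"MFU: {utilization:.0%}")
print(f"Speedup if MFU doubled: {headroom(utilization, 2 * utilization):.1f}x")
```

Under these assumed inputs, raising utilization on the same hardware translates directly into proportionally more useful training throughput, which is the kind of headroom the post points to.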

If you are interested in the intersection of systems engineering, hardware efficiency, and the future of AI scaling, this is a must-read. Read Dan’s full analysis here.

Notability

notability 3.0/10

Opinion piece, no new release or traction.