Key Takeaways
1. Grok 4 now tops public AI model performance charts, ahead of competitors like ChatGPT and Google’s Gemini.
2. Grok 4 shows a 10x increase in reasoning capabilities compared to Grok 3, aided by xAI’s expansion to 200,000 GPUs.
3. Grok 4 scored 15.9% on the ARC-AGI-2 test, the highest result among publicly available models and ahead of solutions built specifically for the challenge.
4. Elon Musk claims Grok 4 surpasses all graduate students combined and has potential for discovering new technologies, despite some limitations in image recognition.
5. xAI is introducing a new premium subscription tier, SuperGrok Heavy, starting at $300 per month, above the $30 per month SuperGrok tier.
A little more than two years after its launch, xAI’s Grok has emerged as the top AI language model, overtaking OpenAI’s ChatGPT, Google’s Gemini, and DeepSeek, as well as models from Meta and Anthropic.
Grok 4’s Impressive Performance
Recent independent evaluations indicate that the latest version, Grok 4, has achieved the highest scores on public AI model performance charts. The reported 10x increase in reasoning capability from Grok 3 to Grok 4 is attributed to the AI compute clusters that xAI rapidly built out, doubling their size to 200,000 GPUs, with plans to reach one million.
The xAI team asked the creators of the challenging ARC-AGI performance test to evaluate its models, and the results were unexpected.
Key Findings from ARC-AGI
Firstly, the facts: Grok 4 is now the leading publicly available model on the ARC-AGI test, surpassing even the purpose-built solutions submitted on Kaggle. Secondly, ARC-AGI-2 is difficult for contemporary AI models. To perform well, a model needs to pick up a mini-skill from a task's training examples and then apply that skill at test time. The prior highest score was about 8% (achieved by Opus 4), and scores under 10% are often unreliable. Grok 4, however, scored 15.9%, a result that points to some level of fluid intelligence.
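The "learn a mini-skill from examples, then apply it" setup can be illustrated with a toy sketch. This is not how ARC-AGI-2 is scored or how any model solves it: the grids, the four candidate transformations, and the names `infer_rule` and `solve` are all hypothetical, chosen only to show the shape of the problem. Real tasks use far more varied rules than this fixed list.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

# Hypothetical candidate "mini-skills": simple grid transformations.
def identity(g: Grid) -> Grid:
    return [row[:] for row in g]

def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]

def flip_vertical(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]

def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]

CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}

def infer_rule(train_pairs: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate consistent with every training pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in train_pairs):
            return name
    return None

def solve(train_pairs: List[Tuple[Grid, Grid]], test_input: Grid) -> Optional[Grid]:
    """Infer a rule from the training pairs, then apply it to the test input."""
    name = infer_rule(train_pairs)
    return None if name is None else CANDIDATES[name](test_input)

# Two training examples that share one hidden rule (mirror each row).
train_pairs = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 4, 0]], [[0, 3, 3], [0, 4, 0]]),
]
test_input = [[5, 0, 0], [0, 6, 0]]

prediction = solve(train_pairs, test_input)
print(infer_rule(train_pairs), prediction)
```

The toy solver only works because the hidden rule happens to be in its candidate list. The difficulty described above comes from tasks whose rule is new to the solver and has to be worked out from the examples alone.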
Another independent evaluator, Artificial Analysis, reported that on its comprehensive suite of benchmarks Grok 4 attained an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, DeepSeek R1 0528 at 68, and Anthropic Claude 4 Opus at 64.
Elon Musk’s Bold Claims
In the presentation for Grok 4, Elon Musk proclaimed that xAI’s model now surpasses all graduate students across various fields combined. In his usual grandiose style, the Tesla CEO asserted that Grok 4 would be capable of discovering “new technologies” such as medical advancements or engineering breakthroughs within the next year.
Nonetheless, he acknowledged that Grok would still struggle with image recognition for at least another month. He also responded to the recent controversy over supremacist answers, stating, “when Grok goes far wrong, it’s usually because of something silly we did, like a poor system prompt or giving too much importance to biased sources.”
New Subscription Tiers
Musk also needs to promote Grok 4 as the company rolls out a paid premium tier for the first time. Named SuperGrok Heavy, it starts at $300 per month. It includes everything in the $30 per month SuperGrok tier, which provides initial access to Grok 4, and adds access to Grok 4 Heavy, increased rate limits, and early access to new features.