Tag: Grok 3

  • AI Showdown: Grok Impresses Mrwhosetheboss, ChatGPT Triumphs

    Key Takeaways

    1. Grok started strong, struggled for a stretch, then recovered to finish second behind ChatGPT.
    2. ChatGPT and Gemini had an advantage with a video generation feature not available to other models.
    3. In a real-world problem-solving task, Grok gave the most direct answer, while Perplexity got confused.
    4. In cake-making challenges, Grok correctly identified the odd item, while other models misidentified it.
    5. All models experienced “hallucinations,” confidently stating incorrect information during various tests.


    In a recent video, Mrwhosetheboss put several AI models to the test: Grok (Grok 3), Gemini (2.5 Pro), ChatGPT (GPT-4o), and Perplexity (Sonar Pro). Throughout the video, he expressed admiration for Grok’s performance. Grok started strong, struggled for a stretch, then recovered to finish in second place behind ChatGPT. Note that ChatGPT and Gemini had an advantage from a feature the other models lacked: video generation.

    Testing Real-World Problem Solving

    To start the evaluation, Mrwhosetheboss examined the AI models’ ability to solve real-world problems. He presented each model with the following prompt: “I drive a Honda Civic 2017, how many of the Aerolite 29″ Hard Shell (79x58x31cm) suitcases would I be able to fit in the boot?” Grok gave the most direct answer, stating “2”. ChatGPT and Gemini suggested that theoretically, it could fit 3, but realistically, it would be 2. On the other hand, Perplexity got confused and, after doing simple math, mistakenly concluded that it could fit “3 or 4” without considering the suitcase’s shape.
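
    The gap between the "3 or 4" and "2" answers can be illustrated with a rough sketch. It compares a volume-only estimate (the kind of simple math described above) with a count that respects the suitcase’s rigid shape. The suitcase dimensions come from the prompt; the boot dimensions are purely hypothetical placeholders, not real Honda Civic measurements, and the grid packing is a simplification that ignores wheel arches, sloped glass, and mixed orientations.

```python
from itertools import permutations

SUITCASE_CM = (79, 58, 31)          # dimensions quoted in the prompt
HYPOTHETICAL_BOOT_CM = (100, 65, 70)  # assumption: an idealized box, NOT real Civic data


def volume_litres(dims_cm):
    """Volume of a box given in centimetres, returned in litres."""
    width, height, depth = dims_cm
    return width * height * depth / 1000


def naive_count(space_litres, item_cm=SUITCASE_CM):
    """Volume-only estimate: treats the suitcase as if it were a liquid."""
    return int(space_litres // volume_litres(item_cm))


def grid_count(space_cm, item_cm=SUITCASE_CM):
    """Best count when all suitcases share one axis-aligned orientation."""
    best = 0
    for orientation in permutations(item_cm):
        fits = 1
        for space, size in zip(space_cm, orientation):
            fits *= space // size  # whole suitcases along this axis
        best = max(best, fits)
    return best


boot_litres = volume_litres(HYPOTHETICAL_BOOT_CM)
print("volume-only estimate:", naive_count(boot_litres))
print("shape-aware estimate:", grid_count(HYPOTHETICAL_BOOT_CM))
```

    With these placeholder dimensions, the volume-only estimate comes out higher than the shape-aware one, which is the same kind of overcount attributed to Perplexity. Real boot dimensions would be needed to reproduce the models’ actual answers.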

    Challenging Cake-Making Skills

    Next, Mrwhosetheboss asked the chatbots for cake-making advice. He included an image of five items, one of which was out of place for baking: a jar of dried porcini mushrooms. Most of the models fell for the ruse. ChatGPT misidentified it as a jar of ground mixed spice, Gemini thought it was crispy fried onions, and Perplexity guessed it was instant coffee. Grok, however, correctly recognized it as a jar of dried mushrooms from Waitrose.

    Universal Hallucinations

    Continuing with the testing, he challenged the AIs with math, product suggestions, accounting, language translations, logical reasoning, and more. A common issue across all the models was hallucination: each of them, at various points in the video, confidently discussed things that simply weren’t real. By the end, ChatGPT ranked first, with Grok in second place.


  • xAI Launches Grok 3 AI: Top Performance & Beta Reasoning Models

    Key Takeaways

    1. xAI has launched the Grok 3 series of advanced AI language models, outperforming competitors in standardized benchmarks.
    2. The models were developed on a supercomputer cluster with 100,000 Nvidia GPUs and come in standard and mini variants, each with a non-reasoning and a reasoning version.
    3. Grok 3 models have a one million token context window, allowing them to analyze large amounts of text for more accurate answers.
    4. The reasoning models break down complex queries methodically, excelling in math problems, coding tasks, and graduate-level questions.
    5. xAI plans to enhance Grok 3 further using a supercomputer cluster of 200,000 GPUs; the model is currently available to Premium and Premium+ users on X and Grok.com.


    Elon Musk’s xAI has introduced the Grok 3 series, a new set of advanced large language models that outperform other AI systems on standardized benchmarks.

    Technical Specifications

    The Grok 3 models were developed using the company’s Colossus supercomputer cluster, which features 100,000 Nvidia Hopper Tensor Core GPUs. There are two types of models released: standard and mini non-reasoning variants (Grok 3 beta and Grok 3 mini beta), along with their reasoning counterparts (Grok 3 beta (Think) and Grok 3 mini beta (Think)).

    Performance Insights

    The non-reasoning models have been shown to outperform previous leading AI models, including OpenAI’s GPT-4o and DeepSeek-V3. A significant factor in their success is the one million token context window, which lets the AI analyze vast amounts of text and generate more accurate answers from diverse sources. However, the Grok 3 beta models still achieve less than 50% accuracy on fact-seeking questions in the SimpleQA benchmark, meaning human jobs are safe, at least for now.

    Reasoning Abilities

    The reasoning models approach complex queries methodically and make the AI’s reasoning process visible. This allows the AI to deconstruct problems much as human experts do, solving smaller components and then merging those solutions into a comprehensive answer. By selecting the DeepSearch agent, users instruct Grok 3 to perform extensive searches across the internet and use code interpreters, culminating in reports that summarize its discoveries. Notably, the Grok 3 (Think) models tend to outperform other AI models at math problems, graduate-level multiple-choice questions, and coding tasks.

    xAI plans to keep refining Grok 3 for better performance in the coming months, leveraging a supercomputer cluster with 200,000 GPUs. Currently, Grok 3 is accessible to Premium and Premium+ users on X and Grok.com.



  • Elon Musk’s Grok 3 Enters AI Race: Overhyped or Competitive?

    Key Takeaways

    1. Claims of Superiority: Elon Musk asserts that Grok 3 is the “smartest artificial intelligence on Earth,” expected to excel in mathematics, science, and programming.

    2. Launch Challenges: Grok 3 faced scrutiny during its launch due to incorrect answers to simple questions, raising questions about its reliability.

    3. Competitive Edge: Despite claims of being “far ahead,” Grok 3’s performance lead over competitors like DeepSeek R1 and GPT-4o is only 1-2%.

    4. Resource Investment: Grok 3 was built with 200,000 H100 chips and 200 million training hours, so its slight edge raises concerns about the efficiency of such extensive computational resources.

    5. Infrastructure Growth and Funding: xAI is expanding its infrastructure with $10 billion in funding and plans for a large supercomputer cluster, aiming for significant advancements in AI capabilities.


    Elon Musk’s xAI has rolled out Grok 3, which Musk boldly claims is “the smartest artificial intelligence on Earth.” He says it’s a game changer that can outperform all other mainstream AIs in areas like mathematics, science, and programming. Musk also suggested that Grok 3 might help with SpaceX’s Mars mission and could lead to groundbreaking discoveries worthy of a Nobel Prize within three years.

    Launch Event Issues

    Even with its advanced features, Grok 3 encountered some issues during its launch. It struggled to answer a simple AI test question about which number is larger, “9.11” or “9.9,” which caused some users to doubt its reasoning skills. Furthermore, Grok 3 shared incorrect details about the game Path of Exile 2 during the event, raising alarms about the reliability of its answers.

    Competitive Analysis

    xAI stated that Grok 3 is “far ahead” of its rivals, but a closer look reveals a lead of merely 1-2% over models like DeepSeek R1 and GPT-4o. This ranking places it among the top AI systems, but the performance gap is minimal.

    Development and Training Details

    Grok 3 was built using 200,000 H100 chips and received 200 million hours of training. Yet, despite these substantial resources, it only slightly surpassed DeepSeek V3, which was trained on 2,000 H800 chips for a mere two months. These findings have sparked discussions about diminishing returns in AI scaling and whether such massive computational investments are efficient.
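
    The scale of that difference can be put into rough numbers. The sketch below reads the “200 million hours” as GPU-hours and takes “two months” as 61 days of continuous use on all 2,000 chips. Both readings are assumptions rather than figures from the source, and the comparison ignores the per-chip differences between H100 and H800 hardware.

```python
HOURS_PER_DAY = 24
ASSUMED_DAYS_IN_TWO_MONTHS = 61  # assumption: "two months" of round-the-clock training


def gpu_hours(chips, days):
    """Total GPU-hours if every chip runs continuously for the given days."""
    return chips * days * HOURS_PER_DAY


# DeepSeek V3: 2,000 H800 chips for about two months (figures from the article).
deepseek_v3_gpu_hours = gpu_hours(2_000, ASSUMED_DAYS_IN_TWO_MONTHS)

# Grok 3: the article's "200 million hours", read here as GPU-hours (assumption).
grok3_gpu_hours = 200_000_000

compute_ratio = grok3_gpu_hours / deepseek_v3_gpu_hours

print(f"DeepSeek V3: ~{deepseek_v3_gpu_hours:,} GPU-hours")
print(f"Grok 3:      ~{grok3_gpu_hours:,} GPU-hours")
print(f"Ratio:       ~{compute_ratio:.0f}x")
```

    Under these assumptions, Grok 3 used several dozen times more GPU-hours for a reported lead of 1-2%, which is the disparity behind the diminishing-returns discussion.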

    Subscription Availability

    Grok 3 can be accessed by X Premium+ subscribers, and xAI has unveiled a new subscription tier named SuperGrok for mobile app and Grok.com users. This aligns with the trend of AI companies providing tiered access to their models.

    New Features and Updates Ahead

    One of the standout features of Grok 3 is DeepSearch, a sophisticated tool aimed at improving search, research, and data analysis abilities. xAI has also revealed intentions to incorporate voice interaction in future updates, making user interactions more intuitive and friendly.

    Financial Backing and Infrastructure Growth

    xAI is ramping up its infrastructure by securing $10 billion in funding, potentially valuing the company at around $75 billion. The firm is working on a supercomputer cluster named “Colossus” in Memphis, Tennessee, expected to be one of the largest globally. Additionally, xAI is negotiating with Dell Technologies for a $5 billion purchase of servers using Nvidia GB200 chips.

    Musk’s Attempt to Acquire OpenAI

    In a significant move, Musk along with a group of investors proposed $97.4 billion to take over OpenAI’s nonprofit assets. However, OpenAI’s CEO Sam Altman and the board turned down this offer, highlighting the fierce competition and strategic maneuvering within the AI sector.

    Grok 3 represents a crucial advancement for xAI in the crowded AI landscape. Although it has introduced new features and secured substantial funding, its performance during the launch has raised concerns about its actual effectiveness. As xAI works on improving the model and enhancing its infrastructure, only time will tell if Grok 3 can fulfill its lofty goals.