Tag: AI benchmarks

  • DeepSeek Launches R1 Model with Enhanced AI and Reduced Hallucinations


    Key Takeaways

    1. DeepSeek-R1-0528 outperforms its predecessor; DeepSeek's models have been more cost-effective and quicker to train than rival models.
    2. The model improves across benchmarks but answers only 17% of questions correctly on the difficult Humanity’s Last Exam.
    3. Extended training and fine-tuning, rather than major technological breakthroughs, likely account for the better results.
    4. The new R1 model hallucinates less often, so it is less prone to giving incorrect or misleading information.
    5. The R1 model is open source, and distilled eight-billion-parameter versions can run on an Nvidia 4090 GPU with 24 GB of memory.


    DeepSeek has introduced the newest iteration of its R1 large language model, named DeepSeek-R1-0528. The firm entered the AI sector with the releases of V3 and R1, both of which ranked among the top ten AI models in performance while being more cost-effective and quicker to train than rival models from companies like OpenAI and Google.

    Performance Tests

    The updated R1 model was evaluated on a range of AI benchmarks.

    Compared with the initial release of R1, DeepSeek-R1-0528 performs better across all tests, although it answers only 17% of the questions correctly on the challenging Humanity’s Last Exam. Its main competitors also struggle on this test, and the gains in the latest R1 version are likely the result of extended training and fine-tuning rather than any major advance in AI technology. A key highlight of the new R1 is its reduced rate of AI hallucinations, which makes it less prone to giving incorrect or misleading information.

    Open-Source Availability

    Those who want to explore the open-source R1 model can run distilled versions with eight billion parameters on an Nvidia 4090 GPU with 24 GB of memory.
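
    As a rough sanity check on that hardware claim, the sketch below estimates how much memory the weights of an eight-billion-parameter model occupy at a few common numeric precisions. It is a back-of-the-envelope calculation, not a statement about how DeepSeek ships its distilled models: the function names and the precision table are illustrative assumptions, and the estimate covers weights only, ignoring the KV cache, activations, and framework overhead.

```python
# Weights-only memory estimate for a model with a given parameter count.
# Assumption: memory = parameters x bytes per parameter. Real usage is higher
# because of the KV cache, activations, and runtime overhead.

BYTES_PER_PARAM = {
    "fp32": 4.0,  # 32-bit floats
    "fp16": 2.0,  # 16-bit floats (also bf16)
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight memory in GB, using 1 GB = 10**9 bytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params: float, precision: str, gpu_memory_gb: float) -> bool:
    """True if the weights alone fit in the given GPU memory."""
    return weight_memory_gb(num_params, precision) <= gpu_memory_gb


if __name__ == "__main__":
    params = 8e9   # eight billion parameters, as stated in the article
    gpu_gb = 24.0  # memory of the GPU named in the article
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(params, precision)
        verdict = "fits" if fits(params, precision, gpu_gb) else "does not fit"
        print(f"{precision}: {size:.1f} GB of weights, {verdict} in {gpu_gb:.0f} GB")
```

    Under these assumptions, 16-bit weights take about 16 GB, which leaves some headroom on a 24 GB card, while 32-bit weights would not fit.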

    In summary, DeepSeek's latest R1 model improves on its predecessor while remaining affordable and efficient. More information is available through DeepSeek news, DeepSeek Chat, and the DeepSeek R1 repository on GitHub.


  • Google Unveils Powerful Gemini 2.0 Pro AI Features


    Google has rolled out access to its latest AI, the Gemini 2.0 Pro experimental model. It features a two-million-token input window, the largest of any Google AI to date, which lets it handle very large text inputs and the complicated prompts that come with them. Gemini 2.0 Pro can also browse the internet, run code, and generate code for applications.

    Performance Compared to Other Models

    Gemini 2.0 Pro surpasses Google's previous AI models across various standardized large language model benchmarks. Nevertheless, it does not yet match humans or the top-performing AIs in every category evaluated. For instance, on the LiveBench AI LLM benchmark, the experimental Gemini 2.0 Pro scores only 65.13, compared with 71.57 for DeepSeek R1 and 75.88 for OpenAI’s o3-mini in high mode.
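
    To make the size of those gaps concrete, the short sketch below ranks the three LiveBench scores quoted above and computes how far Gemini 2.0 Pro trails each of the other two models. The scores come from the article; the dictionary keys and helper names are illustrative assumptions.

```python
# LiveBench scores as quoted in the article (higher is better).
LIVEBENCH_SCORES = {
    "Gemini 2.0 Pro (experimental)": 65.13,
    "DeepSeek R1": 71.57,
    "OpenAI o3-mini (high)": 75.88,
}


def ranked(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap(scores: dict[str, float], model: str, other: str) -> float:
    """Points by which `other` leads `model`, rounded to two decimals."""
    return round(scores[other] - scores[model], 2)


if __name__ == "__main__":
    for name, score in ranked(LIVEBENCH_SCORES):
        print(f"{score:6.2f}  {name}")
    baseline = "Gemini 2.0 Pro (experimental)"
    for other in ("DeepSeek R1", "OpenAI o3-mini (high)"):
        print(f"{other} leads {baseline} by {gap(LIVEBENCH_SCORES, baseline, other)} points")
```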

    Human Evaluation and Security Measures

    Even so, when human evaluators rate AIs on their own prompts, Gemini 2.0 Pro ranks among the top two AIs in the world today, according to the OpenLM.ai Chatbot Arena Elo ranking. Gemini 2.0 Pro may also frustrate hackers: Google used self-training methods during development to reduce the chance of unsafe responses.
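
    For readers unfamiliar with Elo rankings, the sketch below shows the standard Elo expected-score formula, which converts a rating difference between two models into the probability that one is preferred over the other in a head-to-head comparison. It illustrates the general Elo idea only; how OpenLM.ai computes its leaderboard is not described here, and the ratings in the example are hypothetical placeholders, not real leaderboard values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B, between 0 and 1.

    A value of 0.5 means the two are evenly matched. A 400-point lead
    corresponds to roughly a 10-to-1 preference for the stronger side.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings, chosen only to illustrate the formula.
    model_a, model_b = 1380.0, 1350.0
    p = expected_score(model_a, model_b)
    print(f"Model A is expected to be preferred in {p:.1%} of comparisons")
```

    A small rating lead therefore translates into only a modest edge in pairwise preference, which is one reason leaderboard positions near the top can be close.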

    Subscription and Availability

    Gemini 2.0 Pro is available to all Google Gemini Advanced subscribers, at $19.99 per month, and to developers through Google AI Studio and Vertex AI. Users who want Gemini at their fingertips can download the Gemini app on their smartphones or buy a Google Pixel 9 Pro smartphone, which comes with Gemini integrated.
