Tag: AI model

  • Moonshot AI Launches Kimi K2: Free Alternative to DeepSeek

    Key Takeaways

    1. Moonshot AI launched Kimi K2, a free large language model (LLM) with one trillion parameters, under a modified MIT license.
    2. Kimi K2 ranks among the ten most powerful AI models globally, outperforming the notable AI model DeepSeek.
    3. The model utilizes a mixture-of-experts (MoE) architecture with a 128K context window and 384 experts for complex problem-solving and reasoning.
    4. Kimi K2 was trained using real and simulated environments, incorporating a self-assessment mechanism and the MuonClip optimizer for improved training stability.
    5. Users can access Kimi K2 for free through a chatbot and developers can purchase API access, while self-hosting the model for business applications requires significant hardware.


    Moonshot AI has unveiled Kimi K2, a free large language model (LLM) released under a modified MIT license. This LLM quickly secured a spot in the top ten most powerful AI models globally on the LMSys text arena leaderboard. Kimi K2 outperformed DeepSeek, another notable free AI that captured global attention for its capabilities and open licensing when it debuted at the end of 2024.

    Specifications of Kimi K2

    Kimi K2 has one trillion parameters (1T) and operates as a mixture-of-experts (MoE) model. It features a 128K context window and 384 experts, of which only a subset is activated for each token, so roughly 32 billion parameters are active at a time. The model was designed for AI agents focused on autonomous problem-solving, reasoning, and tool use, making it suitable for tackling complex challenges and researching solutions to high-level business issues.
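
    The expert-selection idea behind an MoE layer can be sketched in a few lines. This is a minimal illustration, not Moonshot AI's implementation: the expert count comes from the article, while `TOP_K`, the random router scores, and the softmax weighting are assumptions chosen for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


NUM_EXPERTS = 384  # from the article
TOP_K = 8          # assumption for illustration; the article does not state it

# Hypothetical router scores for a single token (a real router computes
# these from the token's hidden state).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = route_token(logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS

print(f"experts used for this token: {[index for index, _ in routing]}")
print(f"fraction of experts active: {active_fraction:.1%}")
```

    Because each token only passes through the selected experts, the compute per token tracks the active parameters rather than the full trillion.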

    Training Methodology

    Owing to a scarcity of real-world tool-use training data, Kimi K2 was developed using both real and simulated environments. The training process also incorporated a self-assessment mechanism, which enabled the AI to evaluate the adequacy of its own completed tasks during the training phase. Moreover, the MuonClip optimizer was created to counteract training stability problems associated with the Muon optimizer for neural networks, allowing Kimi K2 to be pre-trained swiftly on 15.5T tokens.

    Kimi K2 can be downloaded freely from Hugging Face, but running it in a business context requires at least 1TB of storage and a cluster of at least 16 Nvidia H20 or H200 GPUs. Home users can run distilled versions of DeepSeek on Nvidia GPUs with 12GB or more of memory while awaiting distilled versions of Kimi K2.


  • Discover DeepSeek V3.1 AI: Faster and Smarter for Free

    Key Takeaways

    1. DeepSeek-V3.1 is a new iteration of the DeepSeek model that first launched in December 2024 and ranks among the ten most powerful AI models globally.

    2. The model was trained using less computing power and at a lower cost, featuring a hybrid design that combines fast non-thinking and more deliberative thinking capabilities.

    3. DeepSeek-V3.1 is available for free under the open-source MIT license, but specific hardware requirements apply for different model sizes.

    4. Performance improvements include enhanced coding capabilities and better scores on several AI benchmarks compared to previous models.

    5. The model supports a 128K token window, and API access pricing will be adjusted post-September 5, 2025, while users can interact with the AI at no cost.


    DeepSeek has introduced DeepSeek-V3.1, a new iteration of its innovative AI model that was first launched in December 2024 and quickly became one of the top ten most powerful AI models globally.

    Training Breakthroughs

    The company amazed everyone by revealing how it trained this model using significantly less computing power and at a lower expense compared to rival models. This new version operates as a hybrid AI, blending a quicker non-thinking model recognized from DeepSeek V3 with a more deliberative thinking model that was characteristic of DeepSeek R1.

    Accessibility and Requirements

    DeepSeek-V3.1 can be downloaded for free under the open-source MIT license. Users who wish to try out the complete 671B DeepSeek-V3.1 model need a minimum of 720 GB of available storage (or 170 GB for the 1-bit quantized variant). The smallest quantized model requires a GPU with at least 24 GB of memory, such as the Nvidia 5090 with 32 GB.
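
    The storage figures follow roughly from the parameter count. The sketch below estimates the size of the weights alone at a few precisions. It ignores tokenizer files, metadata, and any layers kept at higher precision, so treat the numbers as a lower bound; the helper name is hypothetical.

```python
def weight_footprint_gb(num_params, bits_per_param):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 671e9  # total parameters, from the article

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label}: ~{weight_footprint_gb(PARAMS, bits):.0f} GB")

# Average bits per parameter implied by the 170 GB figure quoted above.
implied_bits = 170e9 * 8 / PARAMS
print(f"170 GB implies ~{implied_bits:.2f} bits per parameter on average")
```

    By this estimate, 8-bit weights for 671B parameters come close to the 720 GB quoted. The 170 GB variant works out to about two bits per parameter on average, which suggests the "1-bit" label describes the lowest precision used rather than the average. That reading is an inference, not something the article states.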

    Performance Enhancements

    According to results from the SWE-bench test, the updated DeepSeek-V3.1 model enhances the coding capabilities compared to the previous non-thinking V3 and thinking R1 models. It also achieves better scores across various AI benchmarks in thinking mode than the former R1 model, including xbench-DeepSearch, SimpleQA, and FRAMES AI benchmarks.

    The V3.1 AI features a 128K token window, and the API access pricing will be streamlined after September 5, 2025, reflecting its hybrid model. Users can engage with the DeepSeek-V3.1 AI at no cost.


  • AI Develops 4X Tougher Polymer to Reduce Plastic Pollution

    Key Takeaways

    1. Scientists developed a new method to enhance plastic toughness using an AI model.
    2. Adding weak links to a polymer’s structure can increase its overall strength by dissipating energy during cracks.
    3. A machine learning model was trained on 400 ferrocenes to identify the best candidates for improving polymer toughness.
    4. The new plastic created is four times tougher than traditional materials using standard crosslinkers.
    5. Increased toughness can lead to longer-lasting materials and potentially reduced plastic production over time.


    A team of scientists has developed a new method for enhancing the toughness of plastic, utilizing an AI model. The group, which includes researchers from MIT and Duke University, discovered an additive that significantly improves a polymer’s resistance to tearing.

    New Findings Published

    On August 1, the findings were shared in ACS Central Science. The key to this breakthrough is based on an unexpected finding from a prior study, which showed that adding weak links to a polymer’s chemical makeup can actually increase its overall strength. This happens because when a crack forms, it has to break through many of these weak bonds rather than ripping through a solid material, leading to greater energy dissipation. However, the researchers faced a major obstacle: selecting the ideal ‘weak link’ from a vast array of potential options.

    Machine Learning Approach

    To tackle this challenge, the team turned to a machine learning model. They trained this model using computational data from approximately 400 iron-containing compounds known as ferrocenes. The model then efficiently predicted the properties of thousands of additional ferrocenes, successfully identifying the best candidates for their research.
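
    The screening workflow described here (fit a model on a small labelled set, predict a much larger pool, keep the top-ranked candidates) can be sketched as follows. The article does not say which model or molecular descriptors the team used, so this example substitutes a k-nearest-neighbour regressor, two made-up numeric features, and a synthetic scoring function standing in for the expensive calculation.

```python
import random


def knn_predict(train, query, k=5):
    """Average the labels of the k training points nearest to the query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return sum(label for _, label in nearest) / len(nearest)


def synthetic_label(features):
    """Stand-in for an expensive simulation; purely illustrative."""
    return 2.0 * features[0] - features[1] ** 2


rng = random.Random(42)

# About 400 labelled examples, mirroring the size of the training set in the article.
labelled_features = [[rng.random(), rng.random()] for _ in range(400)]
labelled = [(x, synthetic_label(x)) for x in labelled_features]

# A larger pool of unlabelled candidates to screen.
candidates = [[rng.random(), rng.random()] for _ in range(2000)]

# Rank the pool by predicted score and keep a shortlist for follow-up.
ranked = sorted(candidates, key=lambda x: knn_predict(labelled, x), reverse=True)
shortlist = ranked[:10]

print("top candidate features:", [round(v, 3) for v in shortlist[0]])
```

    The point of the pattern is cost: the expensive calculation runs only on the small labelled set, and the cheap model does the ranking for everything else.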

    Enhanced Polymer Created

    Using one of the AI’s top selections, the researchers created a new type of plastic, resulting in a polymer that is four times tougher than those made with conventional crosslinkers.

    “By increasing the toughness of materials, it means they can last longer. This could lead to reduced plastic production over time,” said Ilia Kevlishvili, the lead author of the study.


  • Xiaomi Unveils MiMo-7B: First Open-Source LLM for Coding and Reasoning

    Key Takeaways

    1. Xiaomi has launched its first open-source AI system, MiMo-7B, designed for complex reasoning tasks and excelling in mathematics and code generation.
    2. MiMo-7B has 7 billion parameters and competes effectively with larger models from OpenAI and Alibaba, especially in mathematical reasoning and coding contests.
    3. The model’s training involved a comprehensive dataset of 200 billion reasoning tokens, using a multi-token prediction objective to enhance performance and reduce inference times.
    4. Post-training enhancements include unique algorithms for reinforcement learning and infrastructure improvements that significantly boost training and validation speeds.
    5. MiMo-7B is available in four public variants, with notable performance benchmarks in mathematics and coding, and it can be accessed on Hugging Face and GitHub.


    Xiaomi has quietly entered the large language model arena with its new MiMo-7B, marking its first open-source AI system for the public. Created by the recently formed Big Model Core Team, MiMo-7B is designed for complex reasoning tasks, and Xiaomi says it outperforms models from rivals like OpenAI and Alibaba in mathematical reasoning and code generation.

    Model Specifications

    As indicated by its name, MiMo-7B has 7 billion parameters. Even though it is much smaller than many leading LLMs, Xiaomi asserts that it competes on equal terms with larger models such as OpenAI’s o1-mini and Alibaba’s QwQ-32B-Preview, both of which are reasoning models.

    Xiaomi MiMo-7B surpasses OpenAI and Alibaba’s models in mathematics reasoning (AIME 24-25) and code contests (LiveCodeBench v5).

    Training Details

    The foundation of MiMo-7B is a rigorous pre-training schedule. Xiaomi claims to have created a comprehensive dataset consisting of 200 billion reasoning tokens and has provided the model with a total of 25 trillion tokens through three phases of training.

    Instead of the conventional next-token prediction, the company opted for a multi-token prediction objective, which it says reduces inference times without compromising the quality of the outputs.

    Post-Training Enhancements

    The post-training phase combines various reinforcement learning methods alongside infrastructure enhancements. Xiaomi developed a unique algorithm called Test Difficulty Driven Reward to mitigate the sparse reward challenges often seen in RL tasks involving intricate algorithms. Moreover, they introduced an Easy Data Re-Sampling technique to ensure stable training.

    On the infrastructure side, Xiaomi has created a Seamless Rollout system to minimize GPU downtime during both training and validation. According to its internal metrics, this results in a 2.29× increase in training speed and almost a 2× increase in validation speed. The rollout engine also supports inference methods like multi-token prediction in vLLM settings.

    Availability and Performance

    Now, MiMo-7B is open source with four public variants available:
    – Base: the unrefined, pre-trained model
    – SFT: a version refined with supervised data
    – RL-Zero: a variant enhanced through reinforcement learning starting from the base
    – RL: a more refined model based on SFT, claimed to offer the best accuracy

    Xiaomi has also shared benchmarks to support its claims, at least on paper. In mathematics, the MiMo-7B-RL variant is said to achieve 95.8% on MATH-500 and over 68% on the 2024 AIME dataset. Regarding code, it scores 57.8% on LiveCodeBench v5 and nearly 50% on version 6. Other general knowledge tasks like DROP, MMLU-Pro, and GPQA are also included, though scores hover in the mid-to-high 50s, which is respectable for a model with 7 billion parameters, yet not groundbreaking.

    MiMo-7B can now be accessed on Hugging Face under an open-source license, and all relevant documentation and model checkpoints are available on GitHub.


  • Samsung Unveils Second-Generation AI Model Gauss2

    Samsung presented the next version of its multimodal generative AI model, Gauss2, during the Samsung Developer Conference 2024. The model will come in three different sizes: Compact, Balanced, and Supreme.

    Compact Model Overview

    The Compact model is designed to be small, focusing on speed and efficiency, making it perfect for usage directly on devices by "maximizing the utilization of the device’s computing resources."

    Balanced and Supreme Models

    The Balanced model strikes a blend between performance and efficiency, tailored for various tasks that need consistency. On the other hand, the Supreme model is optimized for high performance, and Samsung claims it lowers "computational costs during training and inference processes while keeping both performance and efficiency at high levels."

    This model supports up to 14 languages, including a variety of programming languages. To enhance efficiency and performance, Samsung employs a "custom tokenizer" for the languages it supports. The company notes that the model’s processing speed "per hour is 1.5 to 3 times quicker," when compared to open-source generative AI models.

    In-House Coding Assistant

    According to Samsung, its internal coding assistant, ‘code.i,’ is currently being used by 60% of all software developers at the company.

    Samsung aims to leverage the Gauss2 model to boost productivity internally. The company "will keep extending the reach of its AI-based services throughout all product lines so users can enjoy a more convenient and pleasant daily life."

  • ISRO Unveils AI Model for Aircraft Tracking and Surveillance

    Researchers at the Space Applications Centre (SAC), part of the Indian Space Research Organisation (ISRO), have created an AI model that provides a full suite of features for monitoring airports and tracking aircraft, in addition to offering surveillance capabilities.

    Collaboration and Testing

    According to a report by The Times of India, this model was developed in partnership with an institute and has been tested in airports located in Ahmedabad, Mumbai, and Pune. The AI model makes use of the CosmiQ Works RarePlanes dataset, which is an open-source dataset for machine learning provided by CosmiQ and AI Reverie. This dataset includes both real-world and artificially created satellite images.

    Training Data Sources

    In addition to the RarePlanes dataset, the researchers trained the model using the Airbus Aircraft detection dataset and satellite imagery from Indian remote sensing.

    "A senior official mentioned to The Times of India that the deep learning models used were named YOLOv5 and YOLOv7, which are part of the You Only Look Once (YOLO) series, known for their applications in detecting aircraft within satellite images," the report stated.

    Accuracy and Future Enhancements

    The model achieved an accuracy of 94% for larger aircraft and 88% for smaller ones, with YOLOv7 proving to be the better performer in these evaluations. The research team is currently focused on enhancing the model’s features by integrating Synthetic Aperture Radar (SAR) satellite data, which can assist in detecting aircraft even during severe weather events.

  • Open Source AI Video Generator Pyramid Flow Now Online

    Already gaining traction in YouTube tutorial clips, Pyramid Flow is an AI system trained on freely available datasets amounting to about 10 million videos. This project is a collaborative effort between AI specialists from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications. Notably, Pyramid Flow is itself open-source. Licensed under the MIT License, it can produce high-resolution (768p) video content, though it particularly excels at 384p. Its developers claim that it can generate a five-second video in under a minute on an A100 GPU; the rest of the hardware setup is unspecified.

    Performance Insights

    In various situations, Pyramid Flow performs exceptionally well. Nevertheless, when handling certain text prompts, the output can be inadequate. Like many generative AI tools, there is a degree of unpredictability involved with this model. On the positive side, Pyramid Flow requires significantly less computational power compared to its rivals. Furthermore, since its code is open-source, those who are interested can implement it in local or cloud settings without any licensing concerns.

    Copyright Concerns

    While the AI team behind Pyramid Flow has provided a list of all datasets used for its training, they did not address potential copyright issues that could arise. Some content creators argue that using open-source materials to make virtual videos infringes on the rights of copyright owners. Nevertheless, Pyramid Flow might be beneficial for refining such content without needing to engage third parties.

    Pyramid Flow (on GitHub, via Tech Xplore)

  • Google Confirms Pixel 8 Exclusion from Gemini Nano Upgrade

    Google’s Pixel 8 and 8 Pro share similar hardware specifications, differing mainly in RAM capacity. Unfortunately, Google has disclosed disappointing news for Pixel 8 users. The standard model will not receive Gemini Nano, the smallest version of the company's AI model introduced in December 2023. This omission is disheartening for Pixel 8 owners who anticipated having the mobile-friendly LLM on their devices.

    Google Confirms Absence of Gemini Nano on Pixel 8

    In a recent episode of The Android Show on the Android Developers channel, Google officially confirmed the absence of Gemini Nano on the Pixel 8. Terence Zhang, a developer relations engineer at Google, cited hardware limitations as the reason behind this decision. While specifics regarding these limitations were not disclosed, it's worth noting that the primary distinction between the standard and pro versions of the Pixel 8 lies in the RAM capacity, with the former featuring 8GB and the latter boasting 12GB.

    Gemini Nano is currently operational on the Pixel 8 Pro and the Galaxy S24 series. This AI model provides functionalities such as Summarize in the Recorder app, Smart Reply in Gboard for apps such as WhatsApp, and offline photography enhancements. The exclusion of this AI model from the standard variant widens the gap between the Pixel 8 and 8 Pro, a factor potential buyers should consider when contemplating a Pixel 8 purchase.

    Expansion Plans for Gemini Nano

    Google has affirmed its commitment to expanding Gemini Nano's compatibility to more high-end devices. Recently, chipmaker MediaTek announced a collaboration with Google to optimize Gemini Nano for the Dimensity 9300 and 8300 chips. Furthermore, Google has outlined its initiative to incorporate Gemini into Android smartphones starting in 2025.

  • Google Introduces AI Cyber Defense Initiative in Response to Hacker Threat

    Google has announced a new effort to use artificial intelligence (AI) to combat cyber threats. Called the AI Cyber Defense Initiative, it aims to enhance internet security by staying ahead of hackers and cyberattacks. With the help of AI, Google hopes to simplify the process of identifying and preventing potential threats before they can cause significant damage.

    Training AI to Recognize Cyber Threats

    The challenge in maintaining digital security today is that hackers only need to find one vulnerability to exploit, while defenders must be flawless at all times. Google believes that AI can help address this issue by empowering security experts to identify and mitigate threats more efficiently. By training AI systems to detect the early signs of cyberattacks, Google aims to predict and prevent attacks before they can cause harm.

    Introducing Magika: A File-Type Identification Tool

    Google has developed a tool called Magika, which it is sharing with the cybersecurity community. Magika uses AI to identify file types, an early step in detecting malware, which refers to software designed to compromise or infiltrate systems. Knowing what a file really is helps security experts route it to the right scanner and distinguish between safe and harmful files, facilitating the early detection of potentially harmful software.

    A Collaborative Approach to Cybersecurity

    Google emphasizes that combating cyber threats requires a collective effort. They urge companies and governments to collaborate, share information, and leverage AI to enhance internet security for all users. By utilizing AI not just to respond to cyber threats but to prevent them, Google is taking a significant step towards creating a safer online environment. This initiative demonstrates the positive potential of AI in safeguarding the internet.

  • Nvidia Introduces Eos Supercomputer, Advancing the Frontiers of Artificial Intelligence

    Nvidia has unveiled Eos, a revolutionary supercomputer for data centers, at the Supercomputing 2023 trade show. This supercomputer, known as an “AI factory,” is designed to push the boundaries of artificial intelligence development. Eos represents a new era in AI acceleration and has been named after the Greek goddess of dawn.

    Impressive Performance

    Eos is powered by 576 Nvidia DGX H100 systems, which are integrated with Quantum-2 InfiniBand networking and specialized software. This impressive setup enables Eos to achieve a remarkable 18.4 exaflops of FP8 AI performance. It is a significant advancement from Nvidia’s previous supercomputing projects, SaturnV and Selene, showcasing the advanced DGX SuperPOD architecture. This architecture allows for the rapid scaling of AI data center solutions to meet high-performance demands.

    Hardware Configuration

    At the core of Eos are 4,608 H100 GPUs: each of the 576 DGX H100 systems contains eight H100 Tensor Core GPUs. This hardware configuration is specifically designed to handle extensive workloads, including training large language models, running AI recommenders, conducting large-scale analytics, performing quantum simulations, and more.
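
    The headline figures can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the article; the per-GPU value is derived from them and is not an official specification.

```python
DGX_SYSTEMS = 576         # from the article
GPUS_PER_SYSTEM = 8       # from the article
TOTAL_FP8_FLOPS = 18.4e18  # 18.4 exaflops, from the article

total_gpus = DGX_SYSTEMS * GPUS_PER_SYSTEM
per_gpu_pflops = TOTAL_FP8_FLOPS / total_gpus / 1e15

print(f"total GPUs: {total_gpus}")
print(f"implied FP8 throughput per GPU: ~{per_gpu_pflops:.2f} petaflops")
```

    The product matches the 4,608 GPUs cited, and the quoted total works out to roughly four petaflops of FP8 performance per GPU.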

    Optimized for AI Tasks

    Eos’s architecture is finely tuned for AI tasks that require ultra-low latency and high throughput in massive computing clusters. The supercomputer’s networking capabilities, with speeds reaching up to 400Gb/s, are crucial for handling the large datasets necessary for training AI models.

    Specialized Software Integration

    Eos also integrates specialized software to enhance AI development and deployment. Base Command handles AI workflow and cluster management and provides libraries for compute, storage, and network acceleration. AI Enterprise, a cloud-native platform, aims to expedite AI application development and positions itself as the “operating system” for enterprise-level AI. Eos’s capabilities have earned it the ninth position on the TOP500 list of the world’s fastest supercomputers.