Tag: Google DeepMind

  • Google Gemini 3.1 Pro Launches with Major Reasoning Upgrade

    Key Takeaways

    1. Google launched Gemini 3.1 Pro, the enhanced core intelligence behind the recent Gemini 3 Deep Think update, for complex reasoning tasks.
    2. Gemini 3.1 Pro achieved a verified score of 77.1% on the ARC-AGI-2 benchmark, significantly outperforming Gemini 3 Pro’s score of 31.1%.
    3. The model is designed for real-world applications, enhancing advanced reasoning in various workflows.
    4. Gemini 3.1 Pro is currently in preview, with plans for general availability and increased user limits for Google AI Pro and Ultra subscribers.
    5. Detailed specifications, such as pricing and availability, are not provided in the announcement; developers are directed to the official Gemini API and Vertex AI documentation for more information.


    Google has announced the launch of Gemini 3.1 Pro, which it describes as the “enhanced core intelligence” behind the recent update to Gemini 3 Deep Think. The new version is meant to provide a smarter foundation for tasks where a simple answer isn’t enough.

    Impressive Performance Metrics

    Google’s standout claim about Gemini 3.1 Pro is its progress on the ARC-AGI-2 benchmark, which is designed to assess a model’s capacity to solve entirely new logic patterns. According to Google, Gemini 3.1 Pro attained a verified score of 77.1% on ARC-AGI-2, which it says is “more than double” the reasoning performance of Gemini 3 Pro.

    To put this into perspective, the official benchmark table from Google DeepMind shows that Gemini 3 Pro scored 31.1% on ARC-AGI-2 (ARC Prize Verified).
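    As a quick sanity check on the “more than double” claim, the two scores quoted above can be compared directly. This is only illustrative arithmetic on the published percentages; the function and variable names are made up for this sketch.

```python
# Scores quoted in the article (percent correct on ARC-AGI-2).
GEMINI_3_1_PRO_SCORE = 77.1
GEMINI_3_PRO_SCORE = 31.1


def improvement_ratio(new_score: float, old_score: float) -> float:
    """Return how many times larger new_score is than old_score."""
    return new_score / old_score


ratio = improvement_ratio(GEMINI_3_1_PRO_SCORE, GEMINI_3_PRO_SCORE)
point_gain = GEMINI_3_1_PRO_SCORE - GEMINI_3_PRO_SCORE

print(f"ratio: {ratio:.2f}x, gain: {point_gain:.1f} percentage points")
```

    The ratio comes out at roughly 2.5, which is consistent with Google’s wording, though a benchmark score ratio is not the same thing as a general measure of reasoning ability.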

    Targeting Real-World Applications

    Google presents Gemini 3.1 Pro as a model intended for applying advanced reasoning in real-world workflows.

    Gemini 3.1 Pro is being released in preview, allowing Google to validate updates before it becomes “generally available soon.” On the consumer front, Google notes that the Gemini app will feature Gemini 3.1 Pro with increased limits for users subscribed to the Google AI Pro and Ultra plans, while in NotebookLM the model is available exclusively to Pro and Ultra users.

    Details and Documentation

    While Google’s announcement highlights the rollout and key performance metrics, it does not provide a comprehensive specification sheet that includes information like pricing, availability in various countries, or specific names for API models. For these precise details, Google recommends developers and businesses consult the official Gemini API and Vertex AI documentation.


  • Google Launches Gemini 2.5 Nano-Banana Flash Image for Better Edits

    Key Takeaways

    1. Character Consistency: Gemini 2.5 Flash Image maintains character appearance across various scenes, regardless of outfit or environment changes.

    2. Versatile Editing Capabilities: Users can merge images, apply natural language commands for modifications, and create multi-turn edits for continuous adjustments.

    3. Clear Pricing Structure: The cost for developers is $30 per million output tokens, with each image counted as 1,290 tokens, approximately $0.039 per image.

    4. Safety Features: Generated images contain a visible AI mark and an invisible SynthID digital watermark for verification and authenticity.

    5. Enhanced Image Quality: Initial previews rate Gemini 2.5 Flash Image as a top-tier editing solution that preserves details and supports diverse creative applications, including turning edited images into short videos.


    Google DeepMind has introduced Gemini 2.5 Flash Image, nicknamed “nano-banana,” for both the Gemini app and developers via the Gemini API, Google AI Studio, and Vertex AI. The update aims to resolve a common issue with AI image tools, in which small tweaks often result in drastic changes to the entire image. Google claims that this version offers better quality and control than its predecessors.

    Key Features of Gemini 2.5 Flash Image

    A standout feature of this release is its character consistency. Users can maintain the look of a person, pet, or product across various scenes, regardless of changes in outfits, hairstyles, time periods, or environments. The model can merge multiple images into a single one, implement specific modifications using natural language commands, and leverage Gemini’s extensive knowledge during both image creation and editing.

    Versatile Uses for Creators

    This tool enables users to position the same character in diverse settings, display a product from multiple perspectives, or ensure brand imagery remains uniform throughout marketing campaigns. The multi-turn editing function allows for continuous adjustments, like adding furniture and decor to create different room styles. You can also combine designs, transfer patterns from one image to another object, or integrate a person and a pet into a fresh scene.

    The pricing structure is clear for developers: Gemini 2.5 Flash Image costs $30 per million output tokens. Each image is counted as 1,290 output tokens, which works out to about $0.039 per image. Other input and output types follow the usual pricing for Gemini 2.5 Flash.
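    The per-image figure follows directly from the two numbers quoted above. The sketch below reproduces that arithmetic and extends it to a batch of images; the constant and function names are hypothetical, and it covers image output tokens only, not any input or text output charges.

```python
# Figures quoted in the article for Gemini 2.5 Flash Image.
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0
TOKENS_PER_IMAGE = 1290


def image_cost_usd(num_images: int) -> float:
    """Estimate the output-token cost in USD of generating num_images images."""
    total_tokens = num_images * TOKENS_PER_IMAGE
    return total_tokens * PRICE_PER_MILLION_OUTPUT_TOKENS_USD / 1_000_000


print(f"1 image:     ${image_cost_usd(1):.4f}")
print(f"1000 images: ${image_cost_usd(1000):.2f}")
```

    One image comes to $0.0387, which rounds to the $0.039 quoted, so a batch of 1,000 images would cost about $38.70 in output tokens under these assumptions.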

    Safety and Verification Features

    To ensure safety, all generated images feature a visible AI mark and an invisible SynthID digital watermark. Google asserts that SynthID remains detectable even after typical edits, which can aid in confirming the origins of images as synthetic media becomes increasingly challenging to identify.

    Google indicates that initial previews rate this model as a top-tier image editing solution. The built-in editing features of the Gemini app now preserve subtle details in your pictures. Users can upload images, request modifications, blend images with their pets, change backgrounds to try out new wallpapers, or insert themselves into various scenes. Furthermore, the edited image can be used in Gemini to create a short video.


  • Google DeepMind Genie 2: Real-Time 3D World Generator

    Google DeepMind, a research branch of Google focused on AI, has introduced Genie 2, a foundational world model capable of creating "action-controllable, playable 3D environments" for fast prototyping and training AI agents.

    Advanced Capabilities

    According to the company, Genie 2 builds on the abilities of its predecessor and can produce "a vast diversity of rich 3D worlds." It can simulate object interactions, character animation, physics, and non-player characters (NPCs) along with their animations and interactions. The model accepts both text and image prompts as input.

    Memory and Perspective

    Genie 2 is designed to remember elements of the world that aren’t visible to the player and can render them when they become visible again. This loosely resembles rendering optimizations used in games, such as Level of Detail (LOD), which adjusts the complexity of objects based on their distance from the camera, and culling, which skips objects outside the player’s Field Of View (FOV).
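    For readers unfamiliar with the game-rendering ideas mentioned above, here is a minimal sketch of how a conventional engine might pick a detail level by distance and test whether an object falls inside the camera’s field of view. It illustrates the traditional techniques only and says nothing about how Genie 2 works internally; the thresholds, function names, and the simplified 2D geometry are all assumptions made for this example.

```python
import math

# Hypothetical distance thresholds (in world units) for each detail level.
LOD_THRESHOLDS = [(10.0, "high"), (50.0, "medium")]


def select_lod(distance: float) -> str:
    """Pick a detail level: nearer objects get more detailed models."""
    for max_distance, level in LOD_THRESHOLDS:
        if distance <= max_distance:
            return level
    return "low"


def in_field_of_view(camera_pos, camera_dir, obj_pos, fov_degrees=90.0) -> bool:
    """2D check: is obj_pos inside the camera's horizontal view cone?

    camera_dir is assumed to be a non-zero direction vector.
    """
    to_obj = (obj_pos[0] - camera_pos[0], obj_pos[1] - camera_pos[1])
    distance = math.hypot(*to_obj)
    if distance == 0.0:
        return True  # The object sits exactly at the camera position.
    dir_length = math.hypot(*camera_dir)
    dot = to_obj[0] * camera_dir[0] + to_obj[1] * camera_dir[1]
    cos_angle = dot / (distance * dir_length)
    # Visible when the angle to the object is within half the field of view.
    return cos_angle >= math.cos(math.radians(fov_degrees / 2))


camera = (0.0, 0.0)
facing = (1.0, 0.0)
for obj in [(5.0, 1.0), (30.0, 5.0), (-20.0, 0.0)]:
    dist = math.hypot(obj[0] - camera[0], obj[1] - camera[1])
    print(obj, select_lod(dist), in_field_of_view(camera, facing, obj))
```

    In a traditional engine the object behind the camera would simply be skipped until it comes back into view, because its geometry is stored explicitly. A generative world model has no such stored scene, which is why remembering off-screen content is notable.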

    The model can create new content in real-time and keep a stable world "for up to a minute." It also offers the ability to render environments from various viewpoints, such as first-person, third-person, or isometric perspectives.

    Realistic Effects

    Additionally, it can produce sophisticated effects, including smoke, object interactions, fluid dynamics, gravity, and advanced lighting and reflections. DeepMind claims this model can facilitate the quick prototyping of fresh concepts and ideas. Users can also create and manage AI agents with straightforward prompts.

    Numerous companies are developing foundational world models that can simulate and build representations of environments. For instance, Decart’s Oasis allows users to engage with a real-time AI-generated version of Minecraft, while World Labs, the start-up founded by AI pioneer Fei-Fei Li, also offers a 3D world generator.

    Google DeepMind’s contributions are setting a new standard in the realm of AI and simulated environments.

  • Google DeepMind AI Creates Music and Sound for Silent Videos

    Google's DeepMind has unveiled a new AI tool capable of generating background music and sound effects for silent videos. This "video-to-audio" system aims to simplify the video editing process, especially for content creators.

    Currently under development, this technology offers some intriguing capabilities. Here’s an overview of the process:

    User Input

    Creators start by uploading their silent video and can include keywords or phrases to guide the AI in producing the appropriate soundscape. For instance, a silent video featuring someone walking in the dark might benefit from prompts such as “movies, horror films, music, tension, footsteps on concrete” to help the AI grasp the mood and context.

    AI in Action

    DeepMind’s AI model begins by breaking down the video to analyze its visuals. This visual data is then paired with the user-provided text prompts. Through a diffusion model, the AI processes this combined information iteratively, eventually creating background sounds that match the video content.

    Tailoring the Soundscape

    The model can generate different audio options for a single video, allowing creators to select the best match for their project. DeepMind’s system can also take into account the emotional tone of the prompt words. For example, prompts that emphasize “tension” might produce suspenseful background music, whereas prompts like “joyful celebration” could result in more upbeat sounds.

    Looking forward, DeepMind is continuously refining this technology. Future plans include enabling the AI to generate sounds automatically based solely on the video content, eliminating the need for user prompts. Additionally, they aim to enhance the system’s ability to synchronize generated dialogue with the characters’ lip movements in the video.

    This "video-to-audio" technology has the potential to transform video editing, particularly for creators who do not have access to professional audio tools or expertise.

  • Google DeepMind’s SIMA: Your New In-Game Teammate Training Guide

    Get ready for a novel gaming companion! Google DeepMind has unveiled SIMA, an AI agent being trained to serve as your in-game partner. Is this the true purpose of AI? It seems like a fitting role.

    SIMA, short for “Scalable, Instructable, Multiworld Agent,” is currently in the developmental stages but holds the promise of transforming our gaming experiences. Unlike conventional AI companions, SIMA transcends the typical NPC character archetype. This model is crafted to be a collaborative teammate, capable of comprehending your actions and adjusting its own responses accordingly. Envision having a cooperative partner in Borderlands who allows you to loot first before claiming items themselves. The prospect is undeniably exciting.

    AI Gaming Evolution

    To achieve this level of interaction, SIMA leverages a blend of natural language processing and image recognition. This fusion enables it to perceive the 3D gaming environment and react to your commands and movements. Google has collaborated with eight game developers, including studios responsible for popular titles like No Man’s Sky and Valheim, to train this AI teammate.

    Training and Future Prospects

    Through these partnerships, SIMA is mastering the basics of gameplay, from simple tasks like turning or climbing to navigating menus and maps. While more intricate activities such as collecting resources and building camps currently lie beyond its capabilities, Google anticipates a significant expansion of SIMA’s skill set in the near future. Gamers may soon find themselves with a Google AI companion ready to fill that elusive third slot in their Apex Legends lobby.

  • DeepMind AI Surpasses Boundaries: Unveils 2 Million Unimaginable New Materials

    In a groundbreaking achievement, Google DeepMind, a subsidiary of Alphabet (GOOGL), has harnessed the power of artificial intelligence (AI) to predict the structures of over 2 million new materials. This milestone, published in the prestigious science journal Nature, holds the potential to reshape various industries by significantly improving the production of batteries, solar panels, and computer chips.

    Revolutionizing Material Discovery with AI

    Traditionally, discovering and synthesizing new materials has been costly and slow, often taking a decade or more. DeepMind’s AI, trained on data from the Materials Project, an international research group founded in 2011, has predicted the structures of more than 2 million hypothetical materials, nearly 400,000 of which are stable enough to be candidates for laboratory synthesis. This breakthrough is expected to shorten the typical 10- to 20-year timeline for material development.

    The Power of GNoME Deep Learning Tool

    GNoME, DeepMind’s deep learning tool, identified 2.2 million new inorganic crystals, of which 380,000 were flagged as the most stable and therefore the most promising candidates for experimental research. The discoveries include 52,000 new layered compounds similar to graphene and 528 potential lithium-ion conductors, 25 times more than previous studies had found.

    Sharing Data for Further Advancements

    To facilitate further advancements, DeepMind is taking a collaborative approach by sharing its vast dataset with the research community through the Next Gen Materials Project. The data encompasses all of GNoME’s discoveries and predictions, providing researchers with free access to explore and experiment with the newfound treasure trove of material structures.

    Transforming Material Synthesis with AI

    A significant leap towards efficient material synthesis has been made possible through DeepMind’s collaboration with the Berkeley Lab, resulting in the creation of a robotic laboratory. This autonomous lab successfully synthesized 41 of the newly discovered materials, demonstrating the transformative potential of AI in experimental synthesis.

    Ekin Dogus Cubuk, a research scientist at DeepMind, expressed hope for substantial improvements in experimentation, autonomous synthesis, and machine learning models. The shared optimism extends to Kristin Persson, director of the Materials Project, who emphasized the need to shrink timelines for material development. Despite industry tendencies to be cautious about cost increases, the collaborative breakthroughs driven by AI may reshape dynamics, potentially reducing the time it takes for new materials to become cost-effective.

    The Future of AI and Material Science

    The convergence of AI and material science showcased by DeepMind’s achievement represents a radical acceleration in technological development. The impact on industries such as energy storage, electronics, and beyond is poised to unlock untold paths of innovation.